Masking data in a logger with Python

Every developer has used a logger at some point; it accompanies us from the moment we learn to program. Yet most of the time we don't pay enough attention to the information that ends up in a log. In this post I'll talk about JSON objects, which are very common and can carry all kinds of information: about a book, about users, or about bank accounts. When the object describes a book, you don't have to think much about what the log shows, because nothing in it compromises the user. With banking information such as credit cards, or with user data, you have to be very careful, because a careless log line can compromise the security of both the company and the user.

Today I've brought an example of how to mask information in a JSON object in Python, using two libraries: pydash and jsonpath-ng. The logic is simple and can be carried over to any other language you work with.

I'll start with a short explanation of the libraries. The first, pydash, is based on lodash, a JavaScript library with a wide range of features such as object validation, function composition, and manipulation of values in data structures. I don't want to go too deep here, because it's a very extensive library with a great variety of utilities. The second, jsonpath-ng, does the heaviest work: it uses path expressions to navigate through the elements of a JSON object. With it we can find the paths where the data we want to mask lives, and then replace that data.

In this case we won't use Python's print function; we'll use the logging module, since it lets us configure how the information we want to print is displayed.

import logging
logging.basicConfig(format='[%(levelname)s] : %(message)s', level=logging.INFO)
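
As a minimal sketch of this setup, the snippet below configures the same format and logs a dictionary as indented JSON. The logger name `custom_logger`, the helper `render`, and the sample book payload are illustrative assumptions, not part of the original project.

```python
import json
import logging

# Same format string as above: "[LEVEL] : message".
logging.basicConfig(format="[%(levelname)s] : %(message)s", level=logging.INFO)
logger = logging.getLogger("custom_logger")  # hypothetical logger name


def render(label: str, payload: dict) -> str:
    """Return the label followed by the payload as indented JSON."""
    return f"{label} {json.dumps(payload, indent=2)}"


# Harmless data: nothing in a book record needs masking.
logger.info(render("Test Data", {"title": "Dune", "pages": 412}))
```

Building the message in a separate function keeps the formatting step easy to test without capturing log output.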

Next we define our identifiers, so that each kind of information can be masked in a different way. For example, we may want to mask passwords or API keys with asterisks and credit card information with XXX:

{
    'password': 'p****d',
    'api_key': 'AH**==',
    'cvv': 'XXX'
}

In this way we create groups of information and treat each group differently. Here an enumeration was created to mask credit card information, dates, and user information.

from enum import Enum


class MaskTargetsKeys(Enum):
    CREDIT_CARD = "creditCard"
    CVV = "cvv"
    BIN = "bin"
    EXP_DATE = "exp_date"
    SECRET_KEY = "secret_key"

Then we define the expressions we are going to search for in order to mask the information. For this we create a class that holds the identifier of the type of information and the expression used to perform the search.

class MaskTargets:

    def __init__(self, target: str, expression: str):
        self.target = target
        self.expression = expression

The expressions allow us to evaluate the object and obtain every path that matches the rule we establish. Each expression here is designed to find any match for cvv, expiryMonth, expiryYear, bin, password, secret_key, cardNumber and card.number. The jsonpath-ng library searches the root of the object as well as deeper nodes and returns every place where it found a match.
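
To make the "root and deeper nodes" behaviour concrete, here is a dependency-free sketch of what a recursive-descent expression such as `$..cvv` looks for. It is not how jsonpath-ng is implemented, and it only handles a single key name (so not a two-step expression like `$..card.number`); the function `find_paths` and the sample payload are assumptions made for illustration.

```python
from typing import Any, Iterator


def find_paths(node: Any, key: str, prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every dict entry named `key`, at any depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            path = f"{prefix}.{name}" if prefix else name
            if name == key:
                yield path
            # Keep descending: the same key may appear again further down.
            yield from find_paths(value, key, path)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_paths(value, key, f"{prefix}[{index}]")


payload = {"cvv": "123", "card": {"number": "4242", "cvv": "999"}}
print(list(find_paths(payload, "cvv")))  # ['cvv', 'card.cvv']
```

Both the top-level `cvv` and the one nested under `card` are reported, which is the property the masking step relies on.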

def __set_masked_keys(self) -> None:
    self.__mask_targets = []
    credit_card_key = MaskTargetsKeys.CREDIT_CARD.value
    cvv_key = MaskTargetsKeys.CVV.value
    exp_date_key = MaskTargetsKeys.EXP_DATE.value
    bin_key = MaskTargetsKeys.BIN.value
    secret_key = MaskTargetsKeys.SECRET_KEY.value
    self.__mask_targets.append(MaskTargets(credit_card_key, "$..card.number"))
    self.__mask_targets.append(MaskTargets(credit_card_key, "$..cardNumber"))
    self.__mask_targets.append(MaskTargets(cvv_key, "$..cvv"))
    self.__mask_targets.append(MaskTargets(exp_date_key, "$..expiryMonth"))
    self.__mask_targets.append(MaskTargets(exp_date_key, "$..expiryYear"))
    self.__mask_targets.append(MaskTargets(bin_key, "$..bin"))
    self.__mask_targets.append(MaskTargets(secret_key, "$..password"))
    self.__mask_targets.append(MaskTargets(secret_key, "$..secret_key"))
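
If the list of targets grows, the repeated appends can get noisy. One possible alternative, sketched below, keeps the same eight (group, expression) pairs in a table and builds the list in one pass. The names `MaskTarget` (a dataclass variant of `MaskTargets`), `_TARGET_TABLE` and `build_mask_targets` are hypothetical and do not appear in the original project.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class MaskTargetsKeys(Enum):
    CREDIT_CARD = "creditCard"
    CVV = "cvv"
    BIN = "bin"
    EXP_DATE = "exp_date"
    SECRET_KEY = "secret_key"


@dataclass(frozen=True)
class MaskTarget:  # hypothetical dataclass variant of MaskTargets
    target: str
    expression: str


# Same (group, JSONPath expression) pairs as the appends above.
_TARGET_TABLE = [
    (MaskTargetsKeys.CREDIT_CARD, "$..card.number"),
    (MaskTargetsKeys.CREDIT_CARD, "$..cardNumber"),
    (MaskTargetsKeys.CVV, "$..cvv"),
    (MaskTargetsKeys.EXP_DATE, "$..expiryMonth"),
    (MaskTargetsKeys.EXP_DATE, "$..expiryYear"),
    (MaskTargetsKeys.BIN, "$..bin"),
    (MaskTargetsKeys.SECRET_KEY, "$..password"),
    (MaskTargetsKeys.SECRET_KEY, "$..secret_key"),
]


def build_mask_targets() -> List[MaskTarget]:
    """Turn the table into a list of MaskTarget objects."""
    return [MaskTarget(group.value, expr) for group, expr in _TARGET_TABLE]
```

Adding a new sensitive field then means adding one row to the table rather than another append call.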

As mentioned, we defined an enumeration to group the types of information with their respective expressions. Now we evaluate each item and collect every path for which the expression produced a match.

def __clear_sensitive_data(self, metadata: dict) -> dict:
    # Requires: import pydash; from jsonpath_ng import parse
    for item in self.__mask_targets:
        json_path_expression = parse(item.expression)
        # Dotted path of every match, e.g. "card.number".
        matches = [str(match.full_path) for match
                   in json_path_expression.find(metadata)]
        for match in matches:
            target_value = pydash.get(metadata, match, default="")
            # Pick the masking rule that belongs to the item's group.
            if item.target == MaskTargetsKeys.CREDIT_CARD.value:
                mask_data = CustomLogger.__mask_credit_data(self, target_value)
                pydash.set_(metadata, match, mask_data)
            if item.target == MaskTargetsKeys.BIN.value:
                mask_data = CustomLogger.__mask_bin_data(self, target_value)
                pydash.set_(metadata, match, mask_data)
            if item.target == MaskTargetsKeys.EXP_DATE.value:
                pydash.set_(metadata, match, "XX")
            if item.target == MaskTargetsKeys.CVV.value:
                pydash.set_(metadata, match, "XXX")
            if item.target == MaskTargetsKeys.SECRET_KEY.value:
                mask_data = CustomLogger.__mask_secret_data(self, target_value)
                pydash.set_(metadata, match, mask_data)
    return metadata
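
The chain of `if` checks can also be written as a lookup table from group name to masking function. The sketch below shows that idea with the standard library only, so it handles top-level keys rather than JSONPath matches. It also works on a deep copy, whereas the method above masks the dictionary it receives in place. `MASKERS` and `mask_flat` are hypothetical names, and the two maskers are just the fixed replacements used above for dates and CVV.

```python
import copy
from typing import Callable, Dict

# Group name -> masking function (only the two fixed replacements here).
MASKERS: Dict[str, Callable[[str], str]] = {
    "exp_date": lambda value: "XX",
    "cvv": lambda value: "XXX",
}


def mask_flat(metadata: dict, groups: Dict[str, str]) -> dict:
    """Return a masked copy; `groups` maps a top-level key to its group."""
    masked = copy.deepcopy(metadata)  # leave the caller's dict untouched
    for key, group in groups.items():
        if key in masked:
            masked[key] = MASKERS[group](str(masked[key]))
    return masked


original = {"cvv": "123", "expiryMonth": "08", "name": "Ada"}
masked = mask_flat(original, {"cvv": "cvv", "expiryMonth": "exp_date"})
print(masked)    # {'cvv': 'XXX', 'expiryMonth': 'XX', 'name': 'Ada'}
print(original)  # {'cvv': '123', 'expiryMonth': '08', 'name': 'Ada'}
```

Whether to copy or mask in place is a design choice: copying costs memory on large payloads, but it avoids surprising the caller when the same object is used after logging.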

Once we have the paths where the information must be masked, we replace the values using pydash. Dates, for example, are replaced with two X characters, while secrets such as passwords are partially hidden by a private method that replaces part of the value with asterisks.

def __mask_secret_data(self, data: str) -> str:
    return data[:2] + '*********' + data[-1:]
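
One thing to watch with this slicing is very short values: for a two-character secret such as "ab", the first two characters plus the last one reveal the whole string. A possible guard, written here as a standalone function with the hypothetical name `mask_secret`, hides the value entirely when it has three characters or fewer.

```python
def mask_secret(data: str) -> str:
    """Keep the first two and last characters, unless the value is too short."""
    filler = "*" * 9
    # With three characters or fewer, the visible edges would expose everything.
    if len(data) <= 3:
        return filler
    return data[:2] + filler + data[-1:]


print(mask_secret("pa55word1234"))  # pa*********4
print(mask_secret("ab"))            # *********
```

The fixed run of nine asterisks also hides the original length, which the method above already does.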

Finally, the information is returned with the sensitive data masked and printed on our console.

[INFO] : Test Data {
    "secret_key": "AS*********=",
    "cardNumber": "131231XXXXXX4123",
    "password": "pa*********4",
    "bin": "453634XXX",
    "card": {
        "number": "424242XXXXXX4242",
        "bin": "453634XXX",
        "expiryYear": "XX",
        "expiryMonth": "XX",
        "cvv": "XXX"
    }
}
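
The card numbers in this output keep their first six and last four digits and replace the middle with X. The method that does this is not shown in the article, so the function below is only a guess at an equivalent rule, inferred from the output; the name `mask_card_number` and the handling of spaces, dashes and short inputs are assumptions.

```python
def mask_card_number(number: str) -> str:
    """Keep the first six and last four digits; replace the middle with X."""
    digits = number.replace(" ", "").replace("-", "")
    # Too short to keep ten characters visible: hide everything.
    if len(digits) <= 10:
        return "X" * len(digits)
    return digits[:6] + "X" * (len(digits) - 10) + digits[-4:]


print(mask_card_number("4242424242424242"))  # 424242XXXXXX4242
```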

As I mentioned before, we can mask each group however we want, using different symbols or replacing the whole value with a custom text.

You can see the project's code here: https://github.com/ridouku/custom-logger-pyhton

You can read about lodash here (pydash offers the same kind of functionality): https://lodash.com

JSONPath syntax information: https://github.com/json-path/JsonPath

Questions? Comments? Contact me at ridouku@gmail.com

Thanks.