Data breaches and ransomware attacks regularly expose sensitive customer or business data. Organizations risk compliance penalties, legal costs, and a loss of customer trust from these incidents. With so much at stake, CISOs and data governance teams need to figure out how to best protect sensitive data from prying adversaries.
Over the data lifecycle, organizations have a number of options to keep sensitive information safe and private. Encryption, tokenization, and data masking are three of the most commonly used methods to protect data. This article explains data masking, tokenization, and encryption, outlines their respective use cases, and recommends the choice that most consistently achieves the goal of securing your sensitive data.
What is data masking?
Data masking conceals sensitive information in a dataset or data source by modifying the data so that it is of no value to unauthorized intruders or malicious insiders. Typically, data masking works by substituting the sensitive information in a data source with a different value that appears authentic. Other methods include shuffling the values in a column of data or applying variance to date and number fields.
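To make those techniques concrete, here is a minimal sketch in Python using pandas. The table and column names are hypothetical, and real masking tools layer referential integrity and repeatability on top of this basic idea.

```python
import numpy as np
import pandas as pd

# Hypothetical customer table standing in for a production extract.
df = pd.DataFrame({
    "name": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "ssn": ["123-45-6789", "987-65-4321", "555-12-3456"],
    "salary": [98000, 87000, 105000],
})

rng = np.random.default_rng()

# Substitution: replace each SSN with a fake value in the same format.
df["ssn"] = [
    f"{rng.integers(100, 999)}-{rng.integers(10, 99)}-{rng.integers(1000, 9999)}"
    for _ in df.index
]

# Shuffling: permute a column so values stay realistic but lose their link to rows.
df["name"] = rng.permutation(df["name"].to_numpy())

# Variance: perturb numeric fields by up to +/-10% so aggregates remain plausible.
df["salary"] = (df["salary"] * rng.uniform(0.9, 1.1, len(df))).round(-2).astype(int)

print(df)  # masked copy safe to hand to dev, test, or QA teams
```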
Protecting data in non-production environments
By far the most common use case for data masking is protecting sensitive information in non-production environments. Exposing personally identifiable information (PII) or protected health information (PHI) in test or development systems creates real compliance risk: regulations mandate that sensitive customer data be protected at all times.
By copying data and obfuscating sensitive information while retaining its properties, data masking enables test, dev, and QA teams to carry out their normal operations without ever seeing the genuine underlying sensitive data. Since contractors or offshore workers may access these non-production datasets, data masking ensures that even if the data ends up in an unsecured location, nobody gets access to genuine sensitive information.
Combat insider threats
The need to mask data in non-production environments is understandable given the increasing prevalence of insider attacks and breaches. In 2021, prosecutors charged a Ubiquiti developer with stealing data from the company's AWS and GitHub accounts.
A large number of employees may have access to non-production data, including analysts, developers, and testing teams. Data masking reduces data breach risks from either malicious or unintentional activities carried out by insiders.
What is tokenization?
Tokenization is a data security practice that replaces sensitive data elements with randomly generated unique identifiers (tokens) in the same format as the original data. For example, a genuine 16-digit credit card number is replaced by a randomly generated 16-digit token. The sensitive information is securely stored on a token server and matched with its token when needed by a user or application.
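As an illustration, here is a toy token vault in Python. A production system would use a hardened, access-controlled vault service rather than an in-memory dictionary, but the shape of the idea is the same.

```python
import secrets

class TokenVault:
    """Toy token vault: maps format-preserving random tokens to real values."""

    def __init__(self):
        self._vault = {}  # token -> original value, held only by the vault

    def tokenize(self, card_number: str) -> str:
        # Generate a random 16-digit token in the same format as a card number,
        # retrying on the (unlikely) chance of a collision.
        while True:
            token = "".join(str(secrets.randbelow(10)) for _ in range(16))
            if token not in self._vault:
                self._vault[token] = card_number
                return token

    def detokenize(self, token: str):
        # Only the vault can map a token back to the real value.
        return self._vault.get(token)

vault = TokenVault()
token = vault.tokenize("4111111111111111")
print(token)                    # e.g. '8302915546671093', worthless to a thief
print(vault.detokenize(token))  # '4111111111111111', recoverable only via the vault
```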
Payment processing systems
If someone sees a token during a transaction, the token has no meaning, which preserves the privacy of sensitive data. It’s for this reason that the use of tokens is particularly widespread in payment processing systems.
When a customer pays for an item at a store with a mobile app, the app transmits a token, not the customer's actual credit card details. From the point of sale, the retailer's acquiring bank checks with the token server whether the token maps to a valid card, and if so, the transaction completes. The token is printed on the receipt and retailers can see it, but since it is just a randomly generated number, it doesn't compromise any sensitive card information.
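A sketch of that validation step, with a hypothetical in-memory vault and card list standing in for the token service and the issuer:

```python
# Hypothetical token service state: the token alone reveals nothing; only the
# service can resolve it to a real card number.
VAULT = {"8302915546671093": "4111111111111111"}  # token -> card number
VALID_CARDS = {"4111111111111111"}                # issuer's view, illustrative

def authorize(token: str, amount_cents: int) -> bool:
    card = VAULT.get(token)  # detokenize inside the trusted service
    return card in VALID_CARDS and amount_cents > 0

print(authorize("8302915546671093", 1999))  # True: token resolves to a valid card
print(authorize("0000000000000000", 1999))  # False: unknown token, no card exposed
```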
Reducing PCI compliance scope
There’s no getting around the fact that regulatory compliance introduces heavy operational and cost burdens for businesses. PCI DSS is an industry security standard requiring all companies that process, store, or transmit cardholder data to implement specific security controls. Tokenization limits PCI scope because actual cardholder payment data is never stored in your systems. Tokens contain no actual cardholder information, so the storage and retention controls mandated for this data aren’t necessary.
What is encryption?
Encryption is one of the oldest and strongest ways to protect sensitive information: an algorithm encodes the data into an unreadable form (ciphertext), and only someone with the correct secret key can restore and read the data in its original form.
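For example, here is a minimal sketch of authenticated symmetric encryption using AES-GCM from the Python cryptography library:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # the secret: whoever holds it can decrypt
aesgcm = AESGCM(key)

plaintext = b"Customer SSN: 123-45-6789"
nonce = os.urandom(12)  # AES-GCM requires a unique 96-bit nonce per encryption

ciphertext = aesgcm.encrypt(nonce, plaintext, None)  # unreadable without the key
print(ciphertext.hex())

recovered = aesgcm.decrypt(nonce, ciphertext, None)  # only key holders get this far
assert recovered == plaintext
```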
Unstructured data
Much of the sensitive information generated, collected, and stored by businesses today is found in unstructured data sources. These sources include emails, PDF files, images, Word documents, Excel spreadsheets, and more. Estimates put the percentage of enterprise data that is unstructured at 80%, and this figure continues to increase.
Encryption is a proven method for protecting unstructured data sources and it works at the scale needed to protect sensitive information in these files both at rest and in transit.
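As a sketch of what file-level protection looks like, the following encrypts an unstructured file with Fernet (authenticated symmetric encryption) from the Python cryptography library. The file name is hypothetical and assumed to exist.

```python
from pathlib import Path
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # store securely; losing it means losing the data
f = Fernet(key)

# Encrypt an unstructured file (PDF, DOCX, image, ...) at rest.
path = Path("q3_financials.pdf")  # hypothetical file
encrypted = f.encrypt(path.read_bytes())
path.with_name(path.name + ".enc").write_bytes(encrypted)

# Later, an authorized holder of the key recovers the original bytes.
original = f.decrypt(encrypted)
assert original == path.read_bytes()
```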
External breach prevention
Despite large investments in perimeter security defenses, intruders manage to find a way into networks all the time. Breaches are seen as so inevitable that CISOs are adopting an “assume breach” mentality and looking for solutions that add defense in depth.
Encryption is the ideal solution for securing data even when other defenses get breached. Since attackers don’t have the key, and brute-forcing a strong encryption algorithm is computationally infeasible, encryption prevents data breaches even when every other control fails. It achieves this while still allowing authorized users to access sensitive data sources.
A properly implemented encryption system provides watertight data security at all times and in all places, both for data at rest and in motion, across its full lifecycle. In a world where there is no shortage of threat actors seeking to gain access to sensitive data at any possible weak point across your environment, proper encryption (i.e. encryption that isn’t tied to user credentials, identity and access mechanisms, or centralized key storage) secures data even if your other defenses and controls get breached. According to NSA whistleblower Edward Snowden, “Properly implemented strong crypto systems are one of the few things that you can rely on”.
Data masking is only suitable for protecting non-production copies of datasets or data sources because it permanently changes the underlying data. Encryption, by contrast, can protect entire production data sources or datasets while still ensuring authorized parties can access the information.
Meanwhile, tokenization has a limited scope of use cases, largely restricted to structured data such as credit cardholder information, bank account numbers, or Social Security numbers. Furthermore, achieving thorough security in a tokenization system still requires end-to-end encryption to protect data in transit, and the token vaults or servers that store mappings between original data and tokens are attractive targets for cyber attacks.
Closing Thoughts
While data masking and tokenization have their own relevant use cases, encryption is the more all-encompassing strategy for protecting sensitive data at scale. Implementation and usability concerns deter some organizations from adopting encryption even though it’s the strongest data security method available and mandated by a growing number of regulations governing sensitive information. Common concerns range from key management to procuring solutions that work across a hybrid IT environment while not disrupting existing user workflows.
About Us
Atakama is a file-level encryption solution that secures sensitive data files in a simplified way. Instead of requiring users to enter a password or code, Atakama automatically encrypts each file with a unique key that is split into shards and distributed to devices already in the possession of authorized users. This allows authorized users to decrypt files via the Atakama mobile app on their smartphones. Atakama works the same whether files are on-prem or in the cloud, so there is no need for different point solutions.
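Atakama's exact protocol isn't detailed here, but the general idea of splitting a key into shards can be sketched with Shamir's secret sharing, in which any k of n shards reconstruct the key and fewer reveal nothing. All parameters below are illustrative.

```python
import secrets

P = 2**521 - 1  # a Mersenne prime comfortably larger than a 256-bit key

def split(secret: int, n: int, k: int):
    """Split `secret` into n shards; any k of them reconstruct it."""
    # Random polynomial of degree k-1 with the secret as its constant term.
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shards):
    """Recover the secret by Lagrange interpolation at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(shards):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shards):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

key = secrets.randbits(256)        # e.g. a per-file encryption key
shards = split(key, n=5, k=3)      # distribute to 5 devices; any 3 suffice
assert combine(shards[:3]) == key  # 3 shards recover the key
assert combine(shards[2:5]) == key # any 3 will do; fewer reveal nothing
```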
Contact us today to start securing all your sensitive files.