How to Securely Store Social Security Numbers
Do you need to store sensitive data like a social security number? In this post, we’ll go over your options for storing this kind of sensitive data, the pros and cons of each approach, and all the requirements and features you should be aware of before tackling this problem.
Imagine that you’re working at a fintech company that needs to collect your customers' social security numbers (SSNs) in order to run credit checks. Your engineering team has been tasked with figuring out how to securely store these SSNs. How do you solve this problem?
Storing sensitive information like a customer’s SSN is best avoided, but there are times when sensitive information is required to complete mission-critical workflows like customer credit checks. This isn’t a situation where you, as an engineer, can afford to guess at what feels “secure enough”. It’s important that you take every step to protect your customers’ data privacy, so you can meet regulatory requirements and avoid potential data breaches while using sensitive data in your applications and workflows.
With that in mind, let’s take a look at our requirements.
Storing SSNs: What Are Our Requirements?
In the fintech scenario described above, we know we need to use SSNs, but what are our other requirements?
We can safely assume the following essential, P0 requirements:
- Searchability: To check for duplicates, SSNs need to be searchable but not viewable.
- Last Four Digits: The last four digits of an SSN need to be viewable by customers and support teams.
- Third Party Readability: The plaintext value of an SSN must be readable by the third party credit checking service.
- Access Control: No person or service, outside of the credit checking service, should be able to access the encrypted or plaintext values of an SSN. Access should only be available from known servers.
What Are the Best Secure Storage Options?
Let’s begin by exploring our options for storing SSNs. On one extreme of the spectrum of security and privacy, we could choose to not store SSNs at all; on the other, we could store the data as plaintext. Neither of these is a reasonable option. We know that we need sensitive data for our business to function, and although treating it as non-sensitive and storing it as plaintext yields maximum utility, doing so would put our customers’ data at risk.
This leaves us with a couple of options: We could hash the data, or we could encrypt it.
Storage Option 1: Hashing
Our first option is to use a strong one-way hashing algorithm like SHA-512 to make the data unreadable and then store only the hashed value in the database. We should use a secret “salt value” (or key) as part of the hashing construction. SSNs are 9 digits long, so the maximum search space is 10^9 (or smaller, if you exclude non-existent SSNs). Without a secret salt value, an attacker who knows which hashing algorithm you are using and gains access to the stored customer data could brute-force every plaintext SSN by hashing each candidate and comparing the results.
Introducing a salt value helps to protect against a brute-force attack, but it does introduce a new problem. You need to securely store the salt. Assuming you are able to secure the salt value, let’s see if this strategy satisfies our requirements.
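To make the brute-force risk concrete, here is a minimal sketch using only Python's standard library. It assumes HMAC-SHA-512 as the keyed construction (the post only says "secret salt value (or key)", so HMAC is our choice for illustration), and the function names and sample values are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def unsalted_hash(ssn: str) -> str:
    """Plain SHA-512 of the SSN digits: deterministic, but guessable."""
    return hashlib.sha512(ssn.encode("ascii")).hexdigest()


def keyed_hash(ssn: str, secret: bytes) -> str:
    """HMAC-SHA-512 of the SSN digits under a secret key (assumed construction)."""
    return hmac.new(secret, ssn.encode("ascii"), hashlib.sha512).hexdigest()


def brute_force(target_hash: str, candidates: Iterable[int]) -> Optional[str]:
    """Hash every candidate 9-digit value and compare it to a leaked hash.

    This only works against the unsalted hash, because the attacker can
    compute it without knowing any secret.
    """
    for number in candidates:
        guess = f"{number:09d}"
        if unsalted_hash(guess) == target_hash:
            return guess
    return None
```

Calling `brute_force(unsalted_hash("000012345"), range(100_000))` recovers the input after a small scan, and the full 10^9 space is well within reach of commodity hardware. The same search against `keyed_hash` output fails unless the attacker also has the secret.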
Searchability
Since the hashing algorithm is deterministic, the same SSN hashed repeatedly will yield the same value, making our dataset searchable.
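As a rough sketch of how a duplicate check could work, the example below uses the keyed hash as a lookup key. The in-memory dictionary stands in for an indexed database column, and the class name, secret, and normalization rules are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Dict

# Assumption: in a real system this secret comes from a key manager, not source code.
DEMO_SECRET = b"demo-only-secret"


def ssn_fingerprint(ssn: str, secret: bytes = DEMO_SECRET) -> str:
    """Normalize the SSN to 9 digits, then compute a keyed hash of it."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    if len(digits) != 9:
        raise ValueError("an SSN must contain exactly 9 digits")
    return hmac.new(secret, digits.encode("ascii"), hashlib.sha512).hexdigest()


class CustomerStore:
    """Hypothetical store that keeps only fingerprints, never plaintext SSNs."""

    def __init__(self) -> None:
        self._customer_by_fingerprint: Dict[str, str] = {}

    def add(self, customer_id: str, ssn: str) -> bool:
        """Return False if this SSN is already on file, True if it was added."""
        fingerprint = ssn_fingerprint(ssn)
        if fingerprint in self._customer_by_fingerprint:
            return False
        self._customer_by_fingerprint[fingerprint] = customer_id
        return True
```

Normalizing before hashing matters here: "000-12-3456" and "000123456" must produce the same fingerprint, or the duplicate check will miss them.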
Last Four Digits
We can’t get the last four digits of the SSN from the hashed value. However, as a workaround, we could store the last four digits as plaintext (or even encrypted) in a separate column. We can use this column to display the last four digits to our customers and support agents.
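The separate-column workaround might look something like the sketch below. The helper names and the display format are hypothetical; the point is that the last four digits are extracted once, at write time, before the full value is hashed.

```python
def last_four(ssn: str) -> str:
    """Extract the last four digits to store in their own column."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    if len(digits) != 9:
        raise ValueError("an SSN must contain exactly 9 digits")
    return digits[-4:]


def masked_display(last4: str) -> str:
    """Render the stored last four digits for customers and support agents."""
    return f"***-**-{last4}"
```

For example, `masked_display(last_four("000-12-3456"))` produces `***-**-3456` without ever reading the full SSN back from storage.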
Third Party Readability
Unfortunately, we can’t satisfy this requirement with hashing. Once we hash the values, we can’t retrieve the original text.
Access Control
This requirement is only partially satisfied: no one can access the plaintext value, but that includes the credit checking service, so we can’t grant plaintext access to any user or service that legitimately needs it.
Given the limitations and downsides of hashing, let’s take a look at our second option, encryption.
Storage Option 2: Encryption
We need to make the SSNs unreadable, but still searchable, so we could use a deterministic symmetric encryption algorithm like AES-SIV. Even if an attacker knows which algorithm we are using, they wouldn’t be able to use a brute-force attack as long as our encryption key stays secret.
Searchability
With deterministic encryption, we can index the column and satisfy our searchability requirement.
Last Four Digits
To get the last four digits, we could decrypt a stored SSN and return the last four digits. A better (and more secure) approach is to store the last four digits in a second column, so that we don’t have to decrypt the entire number just to retrieve the last four digits.
Third Party Readability
Before sending data to a third party credit check service, we can decrypt the SSN. We’ll then need to encrypt the SSN during transit using a secure TLS or mTLS network connection.
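On the client side, the transport settings could be sketched with Python's standard `ssl` module as shown below. The function name and parameters are hypothetical. Supplying a client certificate and key is what turns ordinary TLS into mTLS, and the file paths would be specific to your deployment.

```python
import ssl
from typing import Optional


def build_client_context(
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a TLS context that verifies the server and can present a client cert."""
    # The default context verifies the server certificate and hostname.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # For mTLS, present our own certificate so the server can authenticate us.
    if client_cert and client_key:
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context
```

The returned context can be passed to an HTTPS client from the standard library, for example `http.client.HTTPSConnection(host, context=context)`.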
Access Control
Encryption alone doesn’t address the access control requirement. Most databases allow you to restrict table-level access to specific accounts. You could grant access only to a special account used to retrieve the encrypted SSNs prior to sending them to a third party credit checking service. Additionally, you’d want to lock down network access so that even if the credentials were leaked, they could only be used from known servers within your network.
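Network restrictions are usually enforced in firewalls or database configuration, but the idea of "known servers only" can be sketched at the application level too. The networks below are made-up examples, and the function name is hypothetical.

```python
import ipaddress

# Assumption: example internal ranges; replace with your real server addresses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.20.0.0/24"),   # hypothetical application subnet
    ipaddress.ip_network("10.20.5.17/32"),  # hypothetical credit-check worker
]


def is_known_server(address: str) -> bool:
    """Return True only if the caller's address falls inside an allowed network."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Anything that is not a valid IP address is rejected outright.
        return False
    return any(ip in network for network in ALLOWED_NETWORKS)
```

A check like this is a second layer, not a replacement for network-level controls: leaked credentials used from an unknown address would still be refused.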
Other Things to Consider
If we combine strong deterministic encryption, a hardened network, and locked-down table-level access, we have the start of a workable solution. However, we still need to securely store the encryption key.
Secure key storage gets complicated and a full deep-dive on solving this problem is outside of the scope of this post. You could split the encryption key and store it in different ways across different systems, making it harder for a single point of failure to result in a key leak. Additionally, there are cloud-based key managers that you could use.
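One simple way to split a key is a two-share XOR scheme, sketched below with Python's standard library. This is only one possible approach, chosen here for illustration. Both shares are required to rebuild the key, and either share alone reveals nothing about it.

```python
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key into two shares that are each useless alone."""
    share_a = secrets.token_bytes(len(key))                # uniformly random
    share_b = bytes(k ^ a for k, a in zip(key, share_a))   # key XOR share_a
    return share_a, share_b


def combine_shares(share_a: bytes, share_b: bytes) -> bytes:
    """XOR the two shares back together to recover the original key."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must be the same length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))
```

Each share would then live in a different system, so that compromising one system is not enough to recover the key. The trade-off is availability: losing either share means losing the key.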
Ideally, you’ll rotate the credentials used to access the sensitive data on a regular basis. You’ll also need to build robust logging to track when records are accessed, creating an audit trail. In the event of a data breach, you need to have a system in place to rotate your encryption key, re-encrypt all data, and rotate access credentials. Finally, to prevent a potential ransomware attack, you need to be able to recover the original data even if someone gains access and encrypts everything with a key you don’t know.
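As a bare-bones illustration of an audit trail, the sketch below records every read attempt, including failed ones. The in-memory dictionary and list are hypothetical stand-ins for the encrypted SSN column and an append-only audit table.

```python
import datetime
from typing import Dict, List, Optional

# Hypothetical stand-ins for real storage.
encrypted_ssns: Dict[str, bytes] = {"cust-1": b"<ciphertext>"}
audit_trail: List[dict] = []


def read_encrypted_ssn(record_id: str, actor: str) -> Optional[bytes]:
    """Fetch a stored ciphertext and log who asked for it and when."""
    value = encrypted_ssns.get(record_id)
    audit_trail.append(
        {
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "actor": actor,
            "record_id": record_id,
            "found": value is not None,
        }
    )
    return value
```

Routing every read through one function like this is what makes the log trustworthy. In a real system, the audit records would also need to be stored where the accounts being audited cannot modify them.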
As you can see, the requirements to build a secure, holistic solution for storing and using SSNs quickly balloon into a large undertaking. You could eventually end up spending more time designing and implementing a secure SSN storage solution than working on your company’s core applications.
So, what can you do?
Safely Storing Social Security Numbers with Skyflow
With Skyflow’s PII Data Privacy Vault, you can house sensitive data like customer SSNs within an encrypted zero trust vault. Encryption keys are managed and rotated automatically or programmatically based on your preferences. Alternatively, you can bring your own key or connect your Skyflow vault directly to AWS Key Management Service (KMS).
Skyflow’s vault is purpose-built for storing sensitive data and supports built-in data types for common user PII types like SSNs, giving you robust encryption, redaction, masking, and tokenization out-of-the-box. These column-based preferences let you control the readability of sensitive data, from fully-redacted to plaintext.
Access to tables, columns, and rows is controlled by Skyflow’s Data Governance Engine. This lets you control who sees what, and the format in which they see it. For example, you could create a policy where only California-based customer support agents can see the last four digits of a customer’s SSN if the customer is also based in California, while customer support agents located elsewhere see a fully-redacted value. For more details and examples, check out our documentation.
In a scenario where you need to communicate with a third party, like running credit checks, Skyflow Connections lets you call the third party service without the plaintext value ever hitting your servers.
These features let you safely and securely store SSNs and many other types of sensitive data, and can be integrated into your core application through a simple REST API.
Final Thoughts
In this article, we presented different options for securely storing and using SSNs. Through a combination of encryption and a few other security measures, we can begin to tackle this problem. However, once you add the features a holistic solution needs, like key rotation, backups, and logging, the complexity quickly grows into something that takes time away from developing the other essential features of your application.
We created Skyflow to help solve problems like this. With Skyflow, you get a complete and robust solution out-of-the-box that you can easily integrate with your infrastructure using APIs or SDKs, so you can get back to building your product. You can also use Skyflow to protect sensitive data in Large Language Models (LLMs), so you can harness the potential of generative AI without sacrificing data privacy.
Sign up for our Quickstart environment and give it a try for yourself.