How to Securely Store Social Security Numbers
Does your company need to store sensitive data like a social security number? In this post, we’ll go over methods for storing this kind of sensitive data, the pros and cons of each approach, and all the requirements to be aware of before tackling this problem.
The first answer to the question “how best to securely store social security numbers (SSNs)” should always be: Don’t store them.
But many companies, such as those in health care, banking, and fintech, need to collect their customers’ SSNs, and their engineering teams are tasked with figuring out how to store those SSNs securely.
We’ll say it again: Storing sensitive information like a customer’s SSN is best avoided (see: UnitedHealth confirms 190 million Americans affected by Change Healthcare breach), but there are times when sensitive information is required to complete mission-critical workflows like customer credit checks.
This isn’t a situation where the engineer can afford to guess at what feels “secure enough” (see: The Slow-Burn Nightmare of the National Public Data Breach). It’s important that you take every step to protect your customers’ data privacy, so you can meet regulatory requirements and avoid potential data breaches while using sensitive data in your applications and workflows.
With that in mind, let’s take a look at our requirements.
Storing SSNs: What Are the Requirements?
In the fintech scenario described above, we know we need to use SSNs, but what are our other requirements?
We can safely assume the following essential, P0 requirements:
- Searchability: To check for duplicates, SSNs need to be searchable but not viewable.
- Last Four Digits: The last four digits of an SSN need to be viewable by customers and support teams.
- Third Party Readability: The plaintext value of an SSN must be readable by the third party credit checking service.
- Access Control: No person or service, outside of the credit checking service, should be able to access the encrypted or plaintext values of an SSN. Access should only be available from known servers.
What Are the Best Secure Storage Options?
Let’s begin by exploring our options for storing SSNs. On one extreme of the spectrum of security and privacy, we could choose to not store SSNs at all; on the other, we could store the data as plaintext. Neither is a reasonable option. We know that our business needs this sensitive data to function. And although treating it as non-sensitive and storing it as plaintext yields maximum utility, doing so would put our customers’ data at risk.
This leaves us with a couple of options: We could hash the data, or we could encrypt it.
Hashing SSNs with SHA-512 and salts
Our first option is to use a strong one-way hashing algorithm like SHA-512 to make the data unreadable, and then store only the hashed value in the database. We should use a secret “salt value” (or key) as part of the hashing construction. An SSN is 9 digits long, so the search space is at most 10^9 values (fewer, if you exclude SSNs that are never issued). Without a secret salt, an attacker who knows which hashing algorithm you use and gains access to the stored customer data could hash every candidate SSN and match the results against your records, recovering all of the plaintext SSNs by brute force.
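To make the size of that search space concrete, here is a minimal Python sketch of the attack on an unsalted hash. It assumes the attacker already knows the first five digits and only has to guess the last four (10^4 candidates), so it finishes instantly; the same loop over all 10^9 candidates is slower but well within reach of commodity hardware. The function names are illustrative, and the sample SSN uses area number 000, which is never issued.

```python
import hashlib
from typing import Optional


def unsalted_digest(ssn_digits: str) -> str:
    """SHA-512 of the bare 9-digit SSN, with no secret involved."""
    return hashlib.sha512(ssn_digits.encode("ascii")).hexdigest()


def brute_force_suffix(target_digest: str, known_prefix: str) -> Optional[str]:
    """Try every 4-digit suffix for a known 5-digit prefix.

    Returns the matching 9-digit SSN, or None if no candidate matches.
    """
    for n in range(10_000):
        candidate = f"{known_prefix}{n:04d}"
        if unsalted_digest(candidate) == target_digest:
            return candidate
    return None


# A digest "leaked" from a database that stored unsalted hashes.
leaked = unsalted_digest("000123456")
recovered = brute_force_suffix(leaked, "00012")
```

Because nothing secret goes into the digest, anyone holding the leaked value can run this loop.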
Introducing a salt value helps to protect against a brute-force attack, but it does introduce a new problem. You need to securely store the salt. Assuming you are able to secure the salt value, let’s see if this strategy satisfies our requirements.
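One common way to fold a secret value into a hash is HMAC. The sketch below uses HMAC-SHA-512 from Python’s standard library as a stand-in for the “secret salt” construction described above; this is our choice for illustration, not the only valid construction. The function names are hypothetical, and the key is generated inline only for the example; in practice it would be loaded from wherever you store secrets.

```python
import hashlib
import hmac
import secrets


def normalize_ssn(raw: str) -> str:
    """Strip separators so '000-12-3456' and '000123456' hash identically."""
    digits = "".join(ch for ch in raw if ch in "0123456789")
    if len(digits) != 9:
        raise ValueError("an SSN must contain exactly 9 digits")
    return digits


def ssn_digest(raw_ssn: str, secret_key: bytes) -> str:
    """Keyed, deterministic digest of an SSN (128 hex characters)."""
    message = normalize_ssn(raw_ssn).encode("ascii")
    return hmac.new(secret_key, message, hashlib.sha512).hexdigest()


# Assumption: a real key would come from a secrets manager, not be generated here.
example_key = secrets.token_bytes(32)
digest = ssn_digest("000-12-3456", example_key)
```

Without the key, an attacker who steals the digests cannot run the candidate-by-candidate search, which is why protecting the key becomes the central problem.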
Searchability
Since the hashing algorithm is deterministic, the same SSN hashed repeatedly will yield the same value, making our dataset searchable.
Last Four Digits
We can’t get the last four digits of the SSN from the hashed value. However, as a workaround, we could store the last four digits as plaintext (or even encrypted) in a separate column. We can use this column to display the last four digits to our customers and support agents.
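The two-column layout can be sketched with Python’s built-in sqlite3 module. The table and column names below are assumptions made for illustration. The digest column carries a UNIQUE constraint, so the database itself rejects duplicate SSNs, and the display path reads only the last-four column.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical schema: a keyed digest for lookups, last four digits for display."""
    conn.execute(
        """
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            ssn_digest TEXT NOT NULL UNIQUE,
            ssn_last4 TEXT NOT NULL
        )
        """
    )


def add_customer(conn: sqlite3.Connection, ssn_digest: str, ssn_last4: str) -> bool:
    """Insert a customer; return False if the same SSN digest already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customers (ssn_digest, ssn_last4) VALUES (?, ?)",
                (ssn_digest, ssn_last4),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def masked_ssn(conn: sqlite3.Connection, customer_id: int) -> str:
    """Build the value shown to customers and support agents."""
    row = conn.execute(
        "SELECT ssn_last4 FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no customer with id {customer_id}")
    return f"***-**-{row[0]}"
```

The digest passed to `add_customer` is assumed to be computed by the application with its secret key before the row reaches the database.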
Third Party Readability
Unfortunately, we can’t satisfy this requirement with hashing. Once we hash the values, we can’t retrieve the original text.
Access Control
This requirement is only partially satisfied. No one can access the plaintext value, but that also means we can’t grant plaintext access to the one service that needs it: the credit checking service.
Given the limitations and downsides of hashing, let’s take a look at our second option, encryption.
Encryption of SSNs with AES-SIV (maintains searchability)
We need to make the SSNs unreadable, but still searchable, so we could use a deterministic symmetric encryption algorithm like AES-SIV. Even if an attacker knows which algorithm we are using, they wouldn’t be able to use a brute-force attack as long as our encryption key stays secret.
Searchability
With deterministic encryption, we can index the column and satisfy our searchability requirement.
Last Four Digits
To get the last four digits, we could decrypt a stored SSN and return the last four digits. A better (and more secure) approach is to store the last four digits in a second column, so that we don’t have to decrypt the entire number just to retrieve the last four digits.
Third Party Readability
Before sending data to a third party credit check service, we can decrypt the SSN. We’ll then need to protect the plaintext SSN in transit by sending it over a secure TLS or mTLS network connection.
Access Control
Encryption alone doesn’t address the access control requirement. Most databases allow you to restrict table-level access to specific accounts. You could grant access only to a special account used to retrieve the encrypted SSNs prior to sending them to a third party credit checking service. Additionally, you’d want to lock down network access so that even if the credentials were leaked, they could only be used from known servers within your network.
Other Things to Consider
If we combine strong deterministic encryption, harden the network, and lock down table-level access, we have the start of a workable solution. However, we still need to securely store the encryption key.
Secure key storage gets complicated and a full deep-dive on solving this problem is outside of the scope of this post (you can read more at Encryption Key Management and its Role in Modern Data Privacy). You could split the encryption key and store it in different ways across different systems, making it harder for a single point of failure to result in a key leak. There are also cloud-based key managers that you could use.
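One simple way to picture key splitting is a two-share XOR scheme: one share is random bytes, the other is the key XORed with those bytes, and each share would be stored in a different system. The sketch below is an illustration under that assumption, not a full key management design; the function names are hypothetical.

```python
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key into two shares; either share alone reveals nothing about the key."""
    share_a = secrets.token_bytes(len(key))
    share_b = bytes(k ^ a for k, a in zip(key, share_a))
    return share_a, share_b


def combine_shares(share_a: bytes, share_b: bytes) -> bytes:
    """Rebuild the original key by XORing the two shares together."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must be the same length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))


original_key = secrets.token_bytes(32)
share_a, share_b = split_key(original_key)
rebuilt_key = combine_shares(share_a, share_b)
```

Both shares are required to rebuild the key, so losing either one makes the encrypted data unrecoverable; threshold schemes relax that requirement at the cost of more complexity.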
Ideally, you’ll rotate the credentials used to access the sensitive data on a regular basis. You’ll also need to build robust logging to track when records are accessed, creating an audit trail. In the event of a data breach, you need to have a system in place to rotate your encryption key, re-encrypt all data, and rotate access credentials. Finally, to prevent a potential ransomware attack, you need to be able to recover the original data even if someone gains access and encrypts everything with a key you don’t know.
The requirements to build a secure, holistic solution for storing and using SSNs quickly balloon. You could eventually end up spending more time designing and implementing a secure SSN storage solution than working on your company’s core applications.
So, what can you do?
Safely Store SSNs with a Data Privacy Vault
With Skyflow’s PII Data Privacy Vault, you can house sensitive data like customer SSNs within an encrypted zero trust vault. Encryption keys are managed and rotated automatically or programmatically based on your preferences. Alternatively, you can bring your own key or connect your Skyflow vault directly to AWS Key Management Service (AWS KMS).
Skyflow’s vault is purpose-built for storing sensitive data and supports built-in data types for common user PII types like SSNs, giving you robust encryption, redaction, masking, and tokenization out-of-the-box. These column-based preferences let you control the readability of sensitive data, from fully-redacted to plaintext.

Access to tables, columns, and rows is controlled by Skyflow’s Data Governance Engine. This lets you control who sees what, and the format in which they see it. For example, you could create a policy where only California-based customer support agents can see the last four digits of a customer’s SSN if the customer is also based in California, while customer support agents located elsewhere see a fully-redacted value.
>> For more details and examples, check out our data governance documentation.
In a scenario where you need to communicate with a third party, like running credit checks, Skyflow Connections lets you call the third party service without the plaintext value ever hitting your servers.
These features let you safely and securely store SSNs and many other types of sensitive data, and can be integrated into your core application through a simple REST API.
Final Thoughts
By securely storing SSNs with encryption in combination with a few other security measures, we can begin to tackle this problem. However, to have a holistic solution for storing and using SSNs with features like key rotation, backups, and logging, the complexity quickly starts to grow into something that takes engineering teams away from their core responsibilities.
We created Skyflow to help solve problems like this. Skyflow provides a complete and robust solution out-of-the-box that can integrate with existing infrastructure using APIs or SDKs. Skyflow can also be used to protect sensitive data in Large Language Models (LLMs), unlocking the potential of generative AI without sacrificing data privacy.