What Is a Data Privacy Vault?
A data privacy vault is a technology that isolates, secures, and tightly controls access to manage, monitor, and use sensitive data. In this post, I’ll take a deep dive into how a data privacy vault helps you ensure the privacy of sensitive data without sacrificing data utility.
Sensitive data is essential to nearly every modern business, regardless of whether your company uses healthcare data (PHI), payment card data (PCI), or other personally identifiable information (PII) about your customers.
From the largest enterprise to the smallest startup, businesses use sensitive data for essential functions like analytics, processing transactions, and ID verification. Sensitive data is a valuable asset that companies need to use for critical workflows, but they also need to secure it from theft or misuse to protect their reputations and maintain compliance with laws, regulations, and industry standards like PCI DSS.
Whether it’s a name, a credit card number, or an email address, you need to protect sensitive customer data to maintain customer trust. You also need to protect sensitive data to comply with various regulations and laws in the US and internationally, including GDPR, CCPA, HIPAA, PCI DSS, and many others.
Data privacy laws and regulations vary widely, but they generally require that sensitive data be used only for approved purposes and that data breaches be handled appropriately (often at a high cost to the business suffering the breach).
But wait, wasn’t data supposed to be “the new oil?”
If Data Is the New Oil, a Cleanup Is Needed
You’ve probably heard the phrase “data is the new oil”, and this idea does capture certain aspects of the value of sensitive data: that it’s valuable, and that your business probably can’t function without it. But it also loses some of the nuances of sensitive data: while it can be an asset, it’s also a liability – just ask one of the dozens of organizations that have experienced a data breach this year.
So, if we want to think of data as similar to oil, part of this imperfect metaphor should include consideration of how to avoid “leaks” – just like how petroleum engineers seek to avoid leaks in tanker ships and other oil infrastructure. And when it comes to sensitive data, cleanup is urgently needed in many organizations to prevent sensitive data from “leaking” (i.e., being used for unauthorized purposes).
So how can you manage the liability of handling sensitive data, while using it as an asset to drive important workflows like analytics? The solution is to treat sensitive data differently from the rest of the data that your business handles by centralizing it in a data privacy vault.
This approach was pioneered by companies like Apple and Netflix to handle their sensitive customer data, but as we’ll see below, the requirements for a data privacy vault are complex and go beyond what most organizations have the resources or expertise to build for themselves.
What Is a Data Privacy Vault?
Above, I define a data privacy vault as “a technology that isolates, secures, and tightly controls access to manage, monitor, and use sensitive data”. There’s a lot to unpack in this definition, so let’s take a closer look at each of these aspects of a data privacy vault:
- Isolate: When you isolate (or centralize) sensitive data in a data privacy vault, you avoid one of the major issues with data security: sensitive data sprawl. Data sprawl occurs when sensitive data like names or Social Security numbers are replicated from one system to another, increasing the amount of infrastructure that’s subject to regulatory compliance and increasing the attack surface for malicious hackers to exploit. At its worst, data sprawl allows any user or service with access to any part of your infrastructure to access sensitive data.
- Secure: To secure sensitive data, you need a combination of encryption and tokenization. Encryption protects the sensitive data that’s isolated in a data privacy vault, while tokenization allows you to provide stand-in “tokens” that correspond to this sensitive data and that can be used throughout your infrastructure because tokens have no exploitable value. Using tokenization this way secures sensitive data because it helps to eliminate sensitive data sprawl.
- Tightly control access: With sensitive data isolated in a data privacy vault, access is controlled using a combination of zero trust architecture and role-based and attribute-based access controls (RBAC and ABAC). These access controls ensure that only the minimum amount of data required for business-critical workflows is available to your users and services.
- Manage: Managing sensitive data means that you have control over what type of data you protect in your data privacy vault, and also how that data is displayed. For example, you could configure your vault to only store two types of data (such as name and date of birth), and you could configure these types of data to be partially redacted. This means that if you only need the year of birth for your workflows, that’s the only part of the date of birth field that’s available to your workflows.
- Monitor: To monitor how data is used you need a robust audit logging capability. Audit logs let you examine how sensitive data is used – which data elements, by which users or services, and at what time – so you can detect the misuse of sensitive data and catch any suspicious activity early.
- Use: The ability to use data while protecting it is essential. A well-designed data privacy vault lets you access and update sensitive data elements as needed for authorized workflows. It also lets you run operations like comparison or exact match on sensitive data without decrypting it, so you can complete workflows like credit checks or ID verification while leaving data safely encrypted. A data privacy vault also lets you safely de-identify datasets for analytics, so you can make data-driven business decisions without exposing valuable PII, PHI, or PCI to all users of your data warehouse.
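To make several of these capabilities more concrete, here is a deliberately tiny, in-memory sketch in Python. It is not how Skyflow or any production vault is implemented: the `ToyVault` class, the `POLICIES` table, the role names, and the token format are all hypothetical, and encryption at rest is omitted entirely (values sit in memory as plaintext). It shows random tokens standing in for sensitive values, deny-by-default role-based access, partial redaction of a date of birth to its year, and an audit log entry for every access attempt.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical access policies: for each role, the fields it may read and how
# each field is displayed. Anything not listed is denied.
POLICIES = {
    "support": {"name": "plain", "dob": "plain"},
    "analytics": {"dob": "year"},
}


def redact(value: str, mode: str) -> str:
    """Apply a display rule to a stored value."""
    if mode == "plain":
        return value
    if mode == "year":
        return value[:4]  # ISO date "1990-04-12" -> "1990"
    raise ValueError(f"unknown redaction mode: {mode}")


@dataclass
class ToyVault:
    # token -> (field name, value). A real vault would encrypt values at rest;
    # this sketch keeps them in memory as plaintext.
    _store: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def tokenize(self, field_name: str, value: str) -> str:
        # Random tokens carry no information about the value they stand in for.
        token = "tok_" + secrets.token_hex(8)
        self._store[token] = (field_name, value)
        return token

    def detokenize(self, token: str, role: str) -> str:
        field_name, value = self._store[token]
        mode = POLICIES.get(role, {}).get(field_name)
        # Log every attempt, allowed or not: which role, which field, and when.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "field": field_name,
            "allowed": mode is not None,
        })
        if mode is None:
            raise PermissionError(f"role {role!r} may not read {field_name!r}")
        return redact(value, mode)


vault = ToyVault()
dob_token = vault.tokenize("dob", "1990-04-12")
print(vault.detokenize(dob_token, "analytics"))  # 1990
print(vault.detokenize(dob_token, "support"))    # 1990-04-12
```

Only the token ever needs to leave the vault; which part of the value a caller sees is decided by policy at read time, and every read leaves a trace in `audit_log`.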
These capabilities are extensive, and complex to build and test. While some organizations could build these capabilities in-house, most will benefit more from focusing on their core products and services and buying a data privacy vault, following the maxim “never write code, unless you must”.
Now that we’ve looked at the features of an effective data privacy vault, we’ll look at how these capabilities help you to meet the challenges of protecting data privacy.
What Are the Challenges of Protecting Data Privacy?
If you collect and store sensitive customer data, you have a number of challenges to consider:
- Compliance: Failing a compliance requirement can lead to a loss of business, cause you to incur fines, and result in bad press. A data privacy vault makes compliance with laws and regulations like GDPR, CCPA, HIPAA, and PCI DSS easier by reducing the number of systems that store and process sensitive data.
- Data Security: Storing valuable customer data makes you a target for malicious hackers. A data privacy vault makes the sensitive data sought by hackers much harder to access because it’s isolated in a zero trust data privacy vault rather than replicated across your infrastructure and systems.
- Data Residency: Many governments have restrictions on where PII and PHI can reside, as codified in laws like LGPD (Brazil) and DCIA (Canada). A well-designed data privacy vault lets you not only centralize PII but also restrict it to a specific region.
- Secure Analytics: You need to use data to help drive business decisions, without giving your analysts unrestricted access to PII. Using the tokenization capabilities of a data privacy vault with de-identification, you can both enable analytics and protect sensitive data.
- Data Usability and Secure Data Sharing: You want to make sure that sensitive data is usable across organizational silos while respecting data privacy, so you need granular access control policies. And in some cases, you might need to share sensitive data with other trusted third parties, so you’ll need APIs that allow for secure third-party connections.
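The Secure Analytics point rests on one property: tokens can stand in for PII in the warehouse and still support counts, group-bys, and joins. One way to get that property, used here purely for illustration, is deterministic tokenization with a keyed hash (HMAC-SHA256 from Python’s standard library). This is a technique chosen for the sketch, not necessarily what any particular vault does; the key, the `deidentify` helper, and the sample rows are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key that lives only inside the vault; analysts never see it.
VAULT_KEY = b"demo-key-kept-inside-the-vault"


def deidentify(email: str) -> str:
    # Keyed HMAC-SHA256: the same (normalized) email always yields the same
    # token, but without the key the token can't be recomputed or reversed.
    normalized = email.strip().lower().encode()
    digest = hmac.new(VAULT_KEY, normalized, hashlib.sha256)
    return "tok_" + digest.hexdigest()[:16]


orders = [
    {"email": "ana@example.com", "amount": 30},
    {"email": "bo@example.com", "amount": 12},
    {"email": "Ana@example.com", "amount": 8},
]

# What lands in the warehouse: tokens instead of email addresses.
warehouse_rows = [
    {"customer": deidentify(o["email"]), "amount": o["amount"]} for o in orders
]

# Analysts can still count orders per customer without seeing any email.
orders_per_customer = Counter(row["customer"] for row in warehouse_rows)
print(sorted(orders_per_customer.values()))  # [1, 2]
```

The trade-off of deterministic tokens is that equal inputs produce equal tokens: that is exactly what makes the count work, but it also reveals which rows belong to the same customer.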
How a Data Privacy Vault Fits Your Data Protection Strategy
An effective data protection strategy should have the following components, each of which is enabled by the features of a well-designed data privacy vault:
- Identify Sensitive Data: Establish which data elements have the highest value and require special treatment from the standpoint of security, privacy, and compliance risk.
- Isolate: Remove sensitive data from your infrastructure and systems and isolate it in a data privacy vault, replacing sensitive data elements outside of the vault with tokens. Then, let authorized users and services exchange those tokens for the underlying sensitive data elements (detokenization) as needed, subject to strict zero trust controls that you configure in your data privacy vault.
- Protect: Store sensitive data in a vault that respects data residency requirements and that utilizes sophisticated access controls and encryption techniques to protect the sensitive data that it contains.
- Harness: An effective data privacy vault allows authorized users and services to continue to work with sensitive data as needed to support critical business workflows. And, it provides audit logs that track which users or services access which sensitive data elements, so you can verify that sensitive data isn’t being used by unauthorized parties or for unapproved purposes.
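The residency requirement in the Protect step can be pictured as routing: each value is stored in, and served only from, the regional deployment it belongs to. The sketch below is hypothetical throughout (the region codes, the region prefix in tokens, and the rule that plaintext is served only to callers in the same region); real residency rules depend on the specific law and on how the vault is deployed.

```python
import secrets

# Hypothetical regional deployments. In practice each would be a separate vault
# (storage, keys, compute) running in that jurisdiction, not a dict.
REGIONAL_STORES: dict[str, dict[str, str]] = {"br": {}, "ca": {}, "eu": {}}


def store_in_region(region: str, value: str) -> str:
    if region not in REGIONAL_STORES:
        raise ValueError(f"no vault deployed for region {region!r}")
    # The region prefix lets any holder of the token route reads correctly.
    token = f"{region}_{secrets.token_hex(8)}"
    REGIONAL_STORES[region][token] = value
    return token


def read_from_region(token: str, caller_region: str) -> str:
    region = token.split("_", 1)[0]
    # Hypothetical residency rule: plaintext is served only within its region.
    if caller_region != region:
        raise PermissionError(f"{region!r} data cannot be read from {caller_region!r}")
    return REGIONAL_STORES[region][token]


token = store_in_region("br", "Maria Silva")
print(read_from_region(token, "br"))  # Maria Silva
```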
Where Does a Data Privacy Vault Fit in Your Stack?
Consider the infrastructure diagram below. Data is collected in the frontend application and sent to downstream services and storage through the API Gateway.
As discussed above, this type of infrastructure becomes a compliance headache and a data security nightmare when PII like someone’s phone number is collected and pushed downstream where it ends up in your logs, database, data warehouse, and beyond. You can see how storing a phone number outside of a data privacy vault contributes to sensitive data sprawl below:
When you use a data privacy vault, the vault becomes your single source of truth for all PII. Because downstream systems hold only tokens, this effectively de-scopes your existing infrastructure from most of the responsibilities of data privacy, data security, and compliance.
Let’s take a closer look at how this works, expanding on the previous example. In the modified infrastructure shown below, sensitive data is sent to the vault whenever the application frontend collects that data. The vault exchanges sensitive data for tokens, which you can safely store downstream and use for analytics and machine learning, as follows:
In situations where you need to give trusted third-party services access to sensitive data, the vault can pass this data to the trusted third party securely.
For example, let’s say that you need to send an SMS to a customer via Twilio. To do this, you would start by isolating that customer’s phone number in your vault and storing a token representing that number instead of storing a plaintext phone number outside of your vault.
To send an SMS to a customer, you call the vault with the phone number token, and the vault automatically passes the corresponding plaintext phone number to Twilio’s SMS API. Then, the vault relays Twilio’s API response back to you. This accomplishes the same thing as storing a plaintext phone number in a database and then calling Twilio’s SMS API directly, but provides a much higher degree of data privacy and security.
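The proxy pattern from the SMS example can be sketched as a vault-side function that resolves the token, calls the third party, and relays the response. This is an illustration under stated assumptions: `send_via_vault` and `fake_sms_api` are hypothetical, and `fake_sms_api` is a stand-in that records its input rather than a call to Twilio’s actual client library or REST API.

```python
import secrets
from typing import Callable

# Stand-in for the vault's storage: token -> plaintext phone number.
_vault: dict[str, str] = {}


def tokenize(value: str) -> str:
    token = "tok_" + secrets.token_hex(8)
    _vault[token] = value
    return token


def send_via_vault(phone_token: str, body: str,
                   sms_api: Callable[[str, str], dict]) -> dict:
    """Hypothetical vault-side proxy: detokenize inside the vault, call the
    third-party API with the plaintext number, and relay its response."""
    phone_number = _vault[phone_token]  # plaintext exists only inside this call
    return sms_api(phone_number, body)


sent_to: list[str] = []


def fake_sms_api(to: str, body: str) -> dict:
    """Stand-in for the SMS provider; not Twilio's real client or API."""
    sent_to.append(to)
    return {"status": "queued"}


# The application stores and sends only the token.
phone_token = tokenize("+15555550100")
print(send_via_vault(phone_token, "Your order has shipped", fake_sms_api))
# {'status': 'queued'}
```

Passing `sms_api` in as a parameter keeps the sketch runnable offline, and it mirrors the idea that the application never calls the provider with a plaintext number itself.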
You can see how this works in the following architecture diagram:
When it comes to data privacy, an ideal solution de-identifies sensitive data as early in the data lifecycle as possible and then re-identifies it as late as possible. The data privacy vault architecture makes this kind of solution much easier to implement, often with just a few API calls.
Conclusion
Every company that handles sensitive data needs a data protection strategy, and the right tools to execute that strategy. To learn more about how Skyflow Data Privacy Vault can give your organization and your customers best-of-breed data privacy, check out our white paper, What Is a Data Privacy Vault? Why Do You Need It? or sign up to try Skyflow.