Store and Securely Process Sensitive Documents in the Cloud

April 7, 2023

Sensitive documents, such as contracts or wills, are critically important for any business to protect – not only because they contain sensitive data but also because they can be irreplaceable. So, how can you improve the privacy and security of such documents when storing them in the cloud, without sacrificing the ability to process them?

Software development is not the only process that has benefitted from the migration of on-premise resources to cloud infrastructure. The same sort of convenience that cloud development platforms have provided to engineers benefits consumers as well. Similar to the ease with which a developer can spin up an S3 bucket with AWS, consumers can use the cloud to seamlessly expand their digital storage for documents, photos, and other files. Moreover, they can handle tasks like payments, appointment scheduling, and identity verification without ever stepping into a bank or mailing a document hardcopy.

But, while the cloud offers convenience, remote digital storage is confronted with the same challenges as on-premise physical storage when it comes to handling sensitive documents such as driver’s licenses, passports, and social security cards. Chief among those challenges is how to keep sensitive documents, whether virtual or real, private and secure – yet accessible by authorized parties.

In this post, we’ll look at why sensitive documents require special treatment to preserve privacy and security, and the challenges and risks that companies face when storing and processing sensitive documents in the cloud. We’ll also look at how using data privacy vault APIs addresses these challenges and risks to make it easy to store and securely process sensitive documents in the cloud.

Not All Documents Are Created Equal

Some documents are more important than others. A bad actor can wreak far more havoc with your loan application (which contains sensitive data such as your social security number) than with a receipt for a financial transaction. That’s why you might store a physical loan application (along with a passport, social security card, and birth certificate) somewhere safe – like in a locked filing cabinet. On the other hand, you might store a receipt that you’re saving to file expenses with later in an envelope that’s in plain sight. For a receipt, an envelope in plain sight will suffice, and maybe even serve as a reminder for you to actually fill out and submit that expense report.

Whether locked away in a filing cabinet or stored electronically, sensitive documents need to be accessible when you need them, and you probably want different degrees of control over who can get their hands on them.

Storing documents in the cloud has to address the same needs and concerns. 

Challenges When Storing Sensitive Documents in the Cloud

To understand the challenges of storing digital documents, let’s examine a typical fintech application, like loan issuance at a neobank. Our paper loan application is replaced by an online form that includes an uploaded image of your passport.

To support an application that uses electronic documents, the backend relies on three primary services:

User Profile Service

First, you need a user profile service that writes to, updates, and retrieves user data from a database and uses a cloud object storage solution such as AWS S3 to store uploaded profile pictures. Each user row in the database has a column pointing to the S3 location of a profile picture. To ensure even the most basic privacy for each user’s data, corresponding governance policies explicitly grant access between the different cloud resources.

Know Your Client Service

Next, you need a Know Your Client (KYC) service that includes the following services and protocols: Customer Identification Program (CIP), Customer Due Diligence (CDD), and Enhanced Due Diligence (EDD). These three components of KYC are used to verify identity, evaluate risk, and protect against threats like infiltration and money laundering.

To comply with KYC requirements, the service needs sensitive documents such as driver’s licenses, social security cards, and passports to perform all of these checks. The KYC service relies on infrastructure similar to the user profile service. However, because the documents uploaded to the KYC service’s object storage are much more sensitive, this service requires its own set of access control policies.

Loan Processing Service

Finally, you need a loan processing service that runs on a system much like the KYC service, but with another collection of policies to govern access to equally sensitive data and documents. This service manages and automates loan application and approval workflows like verifying identity and collecting documents such as tax returns and income statements. All of these workflows require the upload of sensitive documents.

The example loan issuance architecture described above looks like the following:

An Example Loan Issuance Systems Architecture Using S3 for Object Storage

So, what are the drawbacks of this architecture?

The Vulnerability of Documents in the Cloud

There are several vulnerabilities in such a loan issuance architecture, but the most significant are data sprawl and decentralized data governance. Data sprawl is an issue because sensitive data is replicated throughout the system, and every additional copy increases the surface area of exposure and the opportunity for lapses in access controls. Governance is an issue because relying on cloud object storage for document files adds complexity and opens up even more points of risk and liability: you don’t have a single, centralized way to govern sensitive data.

This architecture requires the database tables to keep some sort of index of where objects live in the S3 buckets, which can slow the performance of searches and downloads. In addition, more individual pieces of infrastructure mean authoring more governance policies, each requiring more explicit roles, responsibilities, and authorization. All it takes is one inadequate policy to undermine the whole system, and when it comes to the cloud, policies are notoriously easy to misconfigure.

Moreover, the complexity and redundancy of this architecture create maintenance and management challenges, and developers are usually stuck relying on a variety of fragmented, cumbersome SDKs to interact with their cloud providers.

An Illustration of How Misconfigured Policies Can Expose a System to Attack

A Better Way to Securely Store Documents in the Cloud

When you consider how you should securely store physical documents, a solution like a locked filing cabinet comes to mind. You can keep all of your important documents in a filing cabinet, neatly organized and protected with a lock and key. You can choose to share its contents, or not. You can do something similar in the cloud with a data privacy vault.

Segmentation is a core tenet of zero-trust architecture that recommends cordoning off more sensitive components. With a data privacy vault, PII and other sensitive information can be isolated from less sensitive data and each service in an application can have its own vault and schemas within one account. 

A data privacy vault can store unstructured data, such as a passport image file, as a column alongside structured data such as a name, date of birth, and email address. Now a single row can correspond to one user and all of the services using this data can index on a column, such as an email address, to join tables. And with encrypted operations from Skyflow, you can perform secure joins without ever decrypting the underlying data.

Better yet, with a user’s most sensitive data isolated in one place, the accompanying access policies can be reduced to just one configuration for structured and unstructured data alike. With Skyflow, you can author policies that control access even down to the row level. This eliminates the need for an additional storage solution like S3, which can help reduce overall cloud spending as well as simplify governance!
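
To make the idea of a single, row-level policy set more concrete, here is a toy evaluator. It is only a sketch: the `Policy` structure, the role names, and the column names are hypothetical and do not reflect Skyflow's actual policy language.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which role may read which columns of which rows."""
    role: str
    columns: frozenset
    row_filter: Callable[[dict], bool]


def read_row(policies: list, role: str, row: dict) -> dict:
    """Return only the columns of `row` that `role` is allowed to read."""
    allowed = set()
    for policy in policies:
        # A policy applies only if the role matches AND the row passes its filter.
        if policy.role == role and policy.row_filter(row):
            allowed |= policy.columns
    return {name: value for name, value in row.items() if name in allowed}


# One configuration covers structured fields and the file column alike.
POLICIES = [
    Policy("kyc_analyst", frozenset({"name", "passport_file"}),
           lambda row: row["country"] == "US"),
    Policy("support_agent", frozenset({"name", "email"}), lambda row: True),
]

ROW = {"name": "Jane Doe", "email": "jane@example.com",
       "country": "US", "passport_file": b"<image bytes>"}
```

In this sketch, `read_row(POLICIES, "support_agent", ROW)` returns only the name and email, and a `kyc_analyst` sees the passport file only for rows that pass the row filter.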

Overall, using a data privacy vault simplifies management of where and how sensitive data is stored, who can access the data, and the maintenance and management of access control policies for this data. Instead of replicating sensitive information across multiple different kinds of storage methods and risking greater exposure to unauthorized use, you lock away the most important data in a vault where it is both easier to govern and easier to safely utilize.

With a data privacy vault, our example neobank can manage the flow of sensitive data in a simpler and more streamlined pattern, as follows:

A Loan Issuance Systems Architecture Using a Skyflow Data Privacy Vault

One API to Protect Them All

To download and view a document that's associated with a row in a database table and stored as an object within a system that relies on a cloud storage service such as AWS S3, a developer has to use the cloud provider’s SDK to first retrieve the entry from the database, then parse out the file location column, and then use this path to find and download the file from its bucket.
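
The three-step lookup described above can be sketched as follows. This is a minimal illustration rather than production code: an in-memory SQLite table stands in for the user database, a plain dictionary stands in for the object store (a real service would call the cloud provider's SDK here), and the table, column, and bucket names are all hypothetical.

```python
import sqlite3
from urllib.parse import urlparse

# Hypothetical stand-in for an object store: maps (bucket, key) to file bytes.
FAKE_BUCKETS = {
    ("kyc-documents", "passports/user-1.png"): b"<passport image bytes>",
}


def make_database() -> sqlite3.Connection:
    """Create a toy user table whose rows point at a file location."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, passport_url TEXT)"
    )
    conn.execute(
        "INSERT INTO users (id, email, passport_url) VALUES (?, ?, ?)",
        (1, "jane@example.com", "s3://kyc-documents/passports/user-1.png"),
    )
    return conn


def download_passport(conn: sqlite3.Connection, user_id: int) -> bytes:
    # Step 1: retrieve the entry from the database.
    row = conn.execute(
        "SELECT passport_url FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no user with id {user_id}")
    # Step 2: parse the bucket and object key out of the file location column.
    location = urlparse(row[0])
    bucket, key = location.netloc, location.path.lstrip("/")
    # Step 3: use that path to fetch the file from its bucket.
    return FAKE_BUCKETS[(bucket, key)]
```

Note that the database and the bucket are governed separately here: a caller needs read access to both, and each has its own policy that can drift out of sync with the other.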

Skyflow Data Privacy Vault provides an API that enables lightweight and uniform development for sensitive data workflows, and that handles the business logic required to associate an S3 URL with a specific row. With a few simple calls to Skyflow APIs, a loan issuance service can upload and download a sensitive file associated with a specific user.

In this example, we have a vault schema that includes a column for a user’s passport image file. Skyflow Data Privacy Vault natively supports file data types. In other words, a user’s passport is stored as just another column in a row, not merely as a path to a file in an S3 bucket. This file data type can be redacted or masked, encrypted, and tokenized the same as any other data in the vault, and access is controlled through a single set of policies.
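
As a rough illustration of what masking and tokenization mean for a single value, the sketch below masks all but the last few characters of a string and swaps a value for a random token kept in a lookup table. The function names and the in-memory token store are hypothetical; a vault performs these operations internally and under access control.

```python
import secrets

# Hypothetical token store: token -> original value. In a vault, this
# mapping lives inside the vault and is never exposed to callers.
_TOKENS: dict = {}


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    visible = max(visible, 0)
    if len(value) <= visible:
        # Too short to reveal anything safely: mask the whole value.
        return "*" * len(value)
    hidden = len(value) - visible
    return "*" * hidden + value[hidden:]


def tokenize(value: str) -> str:
    """Swap a sensitive value for a random token with no relation to it."""
    token = secrets.token_hex(16)
    _TOKENS[token] = value
    return token


def detokenize(token: str) -> str:
    """Recover the original value; only authorized callers should reach this."""
    return _TOKENS[token]
```

For example, `mask("123-45-6789")` yields `*******6789`, while `tokenize("123-45-6789")` yields a random 32-character hex string that reveals nothing about the original value.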

Because the file is associated with a vault entry, the unique Skyflow ID assigned to that row is the only attribute needed to upload and download the passport image file.

An Example of a KYC Vault Schema with Image Files

Below is an example of the API call to upload a document:
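
As a minimal sketch of the shape such an upload request might take, the code below builds (but does not send) a multipart POST that attaches a file to one row, identified by its Skyflow ID. The host, the route, and the use of a bearer token are hypothetical placeholders and are not taken from Skyflow's API reference, which should be consulted for the actual endpoints.

```python
import urllib.request
import uuid

# Hypothetical placeholder, not a real Skyflow URL.
VAULT_URL = "https://vault.example.com/v1/vaults/VAULT_ID"


def build_upload_request(table: str, skyflow_id: str, column: str,
                         filename: str, content: bytes,
                         token: str) -> urllib.request.Request:
    """Build (without sending) a multipart POST attaching a file to one row."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="{column}"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        # Hypothetical route: the row's Skyflow ID is the only identifier used.
        url=f"{VAULT_URL}/{table}/{skyflow_id}/files",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that because the endpoint is a placeholder.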

In order to download a document, you would use the following example API call:
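
A download request might be sketched along the following lines. As before, this only builds the request without sending it, and the host, the route, and the query parameter names are hypothetical placeholders rather than documented Skyflow API details.

```python
import urllib.request
from urllib.parse import urlencode

# Hypothetical placeholder, not a real Skyflow URL.
VAULT_URL = "https://vault.example.com/v1/vaults/VAULT_ID"


def build_download_request(table: str, skyflow_id: str, column: str,
                           token: str) -> urllib.request.Request:
    """Build (without sending) a GET for one file column of one row."""
    # Hypothetical query parameters: which column to fetch, and a flag asking
    # for a download link to the file instead of inline content.
    query = urlencode({"fields": column, "downloadURL": "true"})
    return urllib.request.Request(
        url=f"{VAULT_URL}/{table}/{skyflow_id}?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )
```

The point of the sketch is that the caller supplies only the row's Skyflow ID and a column name; there is no separate bucket path to look up first.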

The resulting response would look something like this:

Operate on Documents that Never Leave the Vault

To further reduce the risk of exposure, Skyflow offers secure workflows. With secure workflows, downstream services or infrastructure never have to interact with sensitive data and documents. Skyflow Functions lets developers write code that executes within the secure vault environment. This code can involve extracting and processing sensitive information located in files as well as retrieving and operating on data so only the result of a function is sent to a downstream first- or third-party service.
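
The pattern of running code next to the data and releasing only the result can be illustrated with a toy model. The `ToyVault` class and the `passport_is_valid` function below are hypothetical and say nothing about how Skyflow Functions are actually written or deployed; they only show the data flow.

```python
from datetime import date
from typing import Any, Callable


class ToyVault:
    """Hypothetical in-memory vault: rows stay private, functions run inside."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def insert(self, skyflow_id: str, row: dict) -> None:
        self._rows[skyflow_id] = dict(row)

    def run(self, skyflow_id: str, function: Callable[[dict], Any]) -> Any:
        """Execute `function` against the row and return only its result."""
        return function(self._rows[skyflow_id])


def passport_is_valid(row: dict, today: date) -> bool:
    # Only this boolean leaves the vault, never the expiry date or the file.
    return row["passport_expiry"] >= today


vault = ToyVault()
vault.insert("abc-123", {"name": "Jane Doe", "passport_expiry": date(2030, 1, 1)})
result = vault.run("abc-123", lambda row: passport_is_valid(row, date(2023, 4, 7)))
```

Here the downstream loan service receives `result`, a single boolean, and never handles the passport data itself.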

And in cases where you really do need to securely share unredacted sensitive documents with third-party services, you can integrate with those services directly from your vault using Skyflow Connections.

Give Skyflow a Try

Equipped with privacy-preserving features such as encryption, tokenization, masking, and fine-grained data governance, Skyflow empowers startups and enterprises alike to meet privacy requirements and achieve compliance with a simple yet powerful API.

To learn about data privacy vaults and how they can fit into your tech stack, read more from Skyflow’s Field CTO, Manish Ahluwalia.
