Skyflow Data Types: Where Data Privacy Meets Usability

March 14, 2022

Many workflows only require partial information. You might only need a phone number’s area code or the year from a date of birth to verify a customer’s location or age. But creating a service that can run privacy-preserving computation on data that’s only partially decrypted is highly complex. That’s why we created over 50 different data types native to the Skyflow Data Privacy Vault: so you can easily make sensitive data safe and useful while preserving privacy.

For every sensitive field that you store, you have to answer questions like these:

  • What values should a given field accept, and what validation is required?
  • How should this data be redacted by default?
  • How should this data be protected, in terms of tokenization and which operations are allowed without decryption?

Multiply these questions by the number of roles, systems, and use cases in a typical organization, and the complexity becomes staggering. We created more than 50 different Skyflow data types to reduce this complexity and make applying field-level data privacy settings effortless. When you select a Skyflow data type during schema creation, industry-standard privacy and security rules are applied to your data automatically.

What Are Skyflow Data Types?

When designing Skyflow data types, we looked at the most commonly used sensitive data types and created data privacy logic for each of them to support industry-standard workflows. A Skyflow data type dictates how your Data Privacy Vault handles that type of underlying data by default.

Phone numbers are a good example of a Skyflow data type: they are an important piece of PII to protect, but they offer too much business value to lock away where they can never be accessed. Let’s say that you want customers to see part of a phone number when showing where message alerts or authentication requests are sent, but not the full number (which could expose sensitive data to an onlooker). Skyflow’s phone number data type is preconfigured with masking that shows only the last four digits of a phone number by default. This lets customers confirm which of their numbers is being used, while keeping the full number private.
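To make that default concrete, here is a minimal Python sketch of last-four masking. It is not Skyflow’s implementation or API: the function name mask_phone, the X mask character, and the choice to leave punctuation in place are all assumptions made for illustration.

```python
def mask_phone(phone: str, visible: int = 4, mask_char: str = "X") -> str:
    """Mask every digit except the last `visible` ones, leaving punctuation alone.

    Hypothetical helper for illustration only; not a Skyflow function.
    """
    total_digits = sum(ch.isdigit() for ch in phone)
    to_mask = max(total_digits - visible, 0)  # how many leading digits to hide

    masked = []
    for ch in phone:
        if ch.isdigit() and to_mask > 0:
            masked.append(mask_char)
            to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


print(mask_phone("(415) 555-0132"))  # (XXX) XXX-0132
```

A customer looking at the masked value can still tell which number is on file, but someone looking over their shoulder learns only the last four digits.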

Each Skyflow data type is set according to the identifiability and sensitivity of the underlying data. So, in addition to configuring input validation and a default tokenization policy, Skyflow data types let you choose which encrypted operations are allowed by default and set default redaction rules.

Settings for the ssn Data Type in Skyflow Studio

Skyflow’s wide range of data types includes common types like name, phone_number, and email_address that are used across every business, as well as others such as ssn, income, and gender that are more specific to certain kinds of businesses, such as insurance.

How Can I Use Skyflow Data Types?

Consider a business that collects a lot of email addresses, and also has to carefully manage SSNs.
To start with, you can use the email_address data type to remove all of the guesswork from validating, using, and storing email addresses. Features of this data type include:

  • Input validation: Only valid email addresses can go into this field, so abc123@gmail.com is accepted, but 123!@gmail.c isn’t, as shown below:
Input Validation Settings for the email_address Data Type
  • Masking: The default redaction rule makes an email address like johndoe@acme.com appear as ***@acme.com, so you can reference an email address without having to fully decrypt it first.
  • Encrypted “exact match” support: Skyflow supports certain operations on encrypted data, but only those that make sense. Because it makes no sense to numerically aggregate or compare email addresses, the default logic for this data type is set to only allow “exact match” queries to run on encrypted email fields stored in Skyflow.
  • Format-preserving tokens: Skyflow supports a variety of approaches to tokenization, including the use of format-preserving tokens so that tools that expect a string formatted as an email address can use tokenized values. The email_address data type is configured to use format-preserving tokens, so an email address like johndoe@gmail.com would be tokenized into a string like bwe09f@fg7d8.com for storage in your systems.
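The behaviors in this list can be approximated locally with a short Python sketch. Everything here is an assumption made for illustration: the deliberately simple regular expression, the helper names, the demo key, and the HMAC-based token are stand-ins, not Skyflow’s validation rules, API, or tokenization scheme.

```python
import hashlib
import hmac
import re

# Deliberately simple pattern for illustration; real email validation is more involved.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")

# Hypothetical demo key; a real system would manage keys very differently.
TOKEN_KEY = b"demo-key-not-for-production"


def is_valid_email(value: str) -> bool:
    """Input validation: accept only strings shaped like an email address."""
    return EMAIL_RE.fullmatch(value) is not None


def mask_email(value: str) -> str:
    """Masking: hide the local part and keep the domain."""
    if not is_valid_email(value):
        raise ValueError("not a valid email address")
    _, _, domain = value.partition("@")
    return "***@" + domain


def toy_email_token(value: str, key: bytes = TOKEN_KEY) -> str:
    """Toy format-preserving token: a keyed hash reshaped to look like an email.

    One-way and for illustration only; a real vault can map tokens back to values.
    """
    digest = hmac.new(key, value.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{digest[:6]}@{digest[6:11]}.com"


print(is_valid_email("abc123@gmail.com"))  # True
print(is_valid_email("123!@gmail.c"))      # False
print(mask_email("johndoe@acme.com"))      # ***@acme.com
print(is_valid_email(toy_email_token("johndoe@gmail.com")))  # True
```

The last line shows the point of a format-preserving token: the token itself passes the same validation as a real address, so downstream tools that expect an email-shaped string keep working.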

Similarly, you can use the ssn data type to help you manage SSNs:

  • Input validation: Only values in the correct format (for example, 123-12-1234) are accepted.
  • Masking: Only the last four digits are shown by default.
  • Encrypted “exact match” support: The only operation permitted on encrypted SSNs is exact match.
  • Format-preserving tokens: This data type also uses format-preserving tokens, so a real SSN would be tokenized into a string like 456-78-9876 for storage in your systems.
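As a rough illustration of how validation, masking, and exact match can work together for SSNs, here is a Python sketch. It uses a keyed hash as a lookup key, which is a generic technique chosen for the example and not a description of how Skyflow runs operations on encrypted data. The helper names, the demo key, and the in-memory dictionary are all hypothetical.

```python
import hashlib
import hmac
import re

SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")

# Hypothetical demo key; never hard-code a real key like this.
INDEX_KEY = b"demo-key-not-for-production"


def is_valid_ssn(value: str) -> bool:
    """Input validation: accept only the NNN-NN-NNNN shape."""
    return SSN_RE.fullmatch(value) is not None


def mask_ssn(value: str) -> str:
    """Masking: show only the last four digits."""
    if not is_valid_ssn(value):
        raise ValueError("SSN must look like 123-12-1234")
    return "XXX-XX-" + value[-4:]


def match_key(value: str) -> str:
    """Keyed hash used for equality lookups, so plaintext is never compared."""
    return hmac.new(INDEX_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# The lookup table stores match keys, not SSNs.
records = {match_key("123-12-1234"): "customer-1"}


def find_customer(ssn: str):
    """Exact match: return the customer ID for this SSN, or None."""
    return records.get(match_key(ssn))


print(mask_ssn("123-12-1234"))       # XXX-XX-1234
print(find_customer("123-12-1234"))  # customer-1
print(find_customer("123-12-1235"))  # None
```

Equality is the only question this structure can answer. Range comparisons or aggregation over the stored keys would be meaningless, which mirrors why exact match is the only operation that makes sense for this data type.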

What If the Skyflow Data Types Aren’t Enough?

You’re probably wondering what to do if your business uses a unique type of sensitive data or uses a common type of sensitive data in a unique way. Don’t worry: you can either modify an existing Skyflow data type to meet your needs, or start from scratch with a basic type that offers total flexibility.

For example, let’s say that you’re using Skyflow to protect customer data for an online pharmacy. For verification purposes, you need your team to be able to see your customers’ birth month and day from their date_of_birth field. At the same time, you want to keep the number of prescriptions that each customer has on file private, but still use that data for anonymized business analytics.

For customer verification, you can start by modifying the date_of_birth data type. The redaction rule for the existing date_of_birth data type is Redacted, so the value isn’t visible by default. To address your use case, all that you need to do is change Redacted to Masked and set month and day to plain text. It only takes a couple of minutes.

Example of a Modified date_of_birth Data Type
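The effect of that change can be sketched in a few lines of Python. This is an illustration of the resulting behavior, not Skyflow configuration: the function name mask_dob, the ISO YYYY-MM-DD input format, and the X mask character are assumptions.

```python
from datetime import date


def mask_dob(dob: str) -> str:
    """Hide the birth year while leaving month and day readable.

    Hypothetical helper; assumes ISO dates such as 1985-07-04.
    """
    parsed = date.fromisoformat(dob)  # raises ValueError for malformed dates
    return f"XXXX-{parsed.month:02d}-{parsed.day:02d}"


print(mask_dob("1985-07-04"))  # XXXX-07-04
```

Your team can confirm a customer’s birthday for verification, while the year, which is the part that reveals age, stays hidden.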

To track prescriptions on file, you can start with a basic numeric field like int32. Then, change the name for your new field to something like total_rxs and make a few other changes to tailor it to your needs. New data types created from the int32 type are already fully masked, so you’ll just need to configure support for encrypted operations like aggregation, and then choose tokenization settings. In a matter of minutes, you’re ready to track prescriptions by customer while keeping this data private and enabling anonymized aggregate analytics.

Example of a Skyflow Data Type Created to Track Prescriptions per Customer
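To show what “aggregate only” access means in practice, here is a small Python sketch in which callers receive summary statistics and never an individual customer’s count. It works on plain integers rather than encrypted values, and the function name, the returned fields, and the minimum group size of 5 are all assumptions made for this example rather than Skyflow behavior.

```python
from statistics import mean


def aggregate_rx_counts(counts, min_group=5):
    """Return summary statistics for total_rxs, or None if the group is too small.

    Hypothetical helper: suppressing small groups is an added assumption that
    keeps a summary from effectively describing one or two customers.
    """
    if len(counts) < min_group:
        return None
    return {
        "customers": len(counts),
        "total": sum(counts),
        "average": mean(counts),
    }


print(aggregate_rx_counts([1, 2, 3, 4, 5]))  # {'customers': 5, 'total': 15, 'average': 3}
print(aggregate_rx_counts([7, 2]))           # None
```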

Check out our video on Skyflow data types to see how easy it is to create and modify your own data types for sensitive data.

Give Skyflow a Try

At Skyflow, we believe in making the data privacy journey easy. We want to set you on the path of best practices from the very beginning, yet still keep the data model completely flexible, so you can bring your own data model and business needs to your Skyflow deployment. You can pick a Skyflow data type to get started, modify an existing data type to fit your needs, or simply create your own from scratch, so you can give your customers effective data privacy.

Did I mention that after you have picked all the data types you need, you can export them as a Postman collection and start using them right away? It doesn’t get any easier than that.

To learn more about how Skyflow Data Privacy Vault can help you, contact us.

