Keeping the PII data characteristics of the test data sets with Random Key

Randomkey.io
3 min read · Oct 27, 2020


There is a whole meta-world beyond strings and numbers

Generating fake PII data is not an easy task. Standard random data generators usually support strings, numbers, and dates; the specialized ones add some US-centric data types, such as names, locations, and Social Security Numbers. Attention to detail is not the forte of mock data applications, as they tend to focus on the technical view of the data rather than on what that data represents. If you're a non-US company, or if you cater to customers across the globe, creating test data sets that realistically represent your customer base usually turns into a corporate DIY project, because the commercial solutions lack the detail you require. Random Key is here to power such PII data generation initiatives.

Random Key provides two REST APIs: the Random Data API and the Mock API.

An example of a Random Data API request and response

The Random Data API is great for batch generation of data of a specific type: for example, addresses in Italy, or National Insurance Numbers in the UK. The API can generate up to 10,000 data points per request.
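
Because of the 10,000-point cap, larger data sets have to be split across several requests. The sketch below only plans the per-request counts; it makes no network calls, and the function name `plan_batches` is a hypothetical helper introduced for illustration, not part of Random Key.

```python
MAX_POINTS_PER_REQUEST = 10_000  # per-request limit stated in this article


def plan_batches(total: int, limit: int = MAX_POINTS_PER_REQUEST) -> list[int]:
    """Split `total` data points into per-request counts, each at most `limit`."""
    if total < 0 or limit <= 0:
        raise ValueError("total must be >= 0 and limit must be > 0")
    full_batches, remainder = divmod(total, limit)
    # Full-size requests first, then one smaller request for whatever is left.
    return [limit] * full_batches + ([remainder] if remainder else [])


if __name__ == "__main__":
    # 25,000 addresses would need three requests.
    print(plan_batches(25_000))  # [10000, 10000, 5000]
```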

An example of a Mock Data API request and response

The Mock Data API has more built-in logic: it will base its response on the JSON schema you send to it. Are you after generating whole user profiles that contain elements such as names, addresses, dates of birth, credit card numbers, and more identifiers, in a way that is specific to your application? Then the Mock API is the way to go. It supports any level of complexity: various levels of nesting, arrays, nulls, or even generating data from the sets you provide to it. It all works without any interface and requires no installation on your side: it’s a simple REST request and response architecture.
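
To make the nesting, arrays, nulls, and value sets mentioned above concrete, here is a minimal sketch of a user-profile schema built in Python. It uses only standard JSON Schema keywords; the field names are hypothetical, and the article does not describe how Random Key maps a field to one of its PII data types, so that part is deliberately left out.

```python
import json

# Hypothetical user-profile schema using standard JSON Schema keywords only.
profile_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date_of_birth": {"type": "string", "format": "date"},
        # Nested object
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
            },
        },
        # Array of values
        "phone_numbers": {
            "type": "array",
            "items": {"type": "string"},
            "maxItems": 3,
        },
        # Nullable field
        "middle_name": {"type": ["string", "null"]},
        # Value drawn from a set you provide
        "plan": {"enum": ["free", "pro", "enterprise"]},
    },
    "required": ["name", "address"],
}

# Serialized form that would go into the body of a REST request.
payload = json.dumps(profile_schema, indent=2)

if __name__ == "__main__":
    print(payload)
```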

Whichever of the APIs suits your use case better, both provide a similar choice of data types that you can produce. On the regional side of things, the current (as of October 2020) list of supported countries includes France, Germany, the Netherlands, Belgium, Italy, the UK, the US, and Russia. You can generate names (first and last, or full; gender-specific), addresses (real locations, including the street and building number, city, and any appropriate administrative units), and phone numbers (with or without the trunk, IDD code, and the country prefix). The data, while random, is always realistic, as it preserves the characteristics of the original data. To give a few examples: Spanish mobile phone numbers will always have 9 digits; National Insurance Numbers will only fall within the ranges approved by the HMRC specification; Credit Card Numbers will include a valid Luhn check digit (unless you specify otherwise). As the Credit Card Number example suggests, there are also other, non-regional endpoints to choose from.
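
As an illustration of what a valid Luhn check digit means, the following is a minimal sketch of the standard Luhn algorithm. It is not Random Key's implementation, and the function names are hypothetical.

```python
import random


def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for index, char in enumerate(reversed(payload)):
        digit = int(char)
        if index % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return (10 - total % 10) % 10


def is_luhn_valid(number: str) -> bool:
    """Check that the last digit of `number` is its Luhn check digit."""
    if not (number.isascii() and number.isdigit()) or len(number) < 2:
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])


def random_luhn_number(length: int = 16) -> str:
    """Build a random digit string of `length` digits that passes the Luhn check."""
    payload = "".join(random.choice("0123456789") for _ in range(length - 1))
    return payload + str(luhn_check_digit(payload))


if __name__ == "__main__":
    print(is_luhn_valid("4111111111111111"))  # True (a well-known test number)
    print(is_luhn_valid(random_luhn_number()))  # True
```

Note that passing the Luhn check only makes a number structurally plausible; it says nothing about whether the number belongs to a real card.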

Generating realistic yet fake data at that level of detail is a requirement in data privacy applications, where PII fields need to be obfuscated without reducing the analytical value of the dataset, so that the data can be monetized or transferred.

Registering with Random Key is as simple as sending your email address to the /register URL, or requesting the authentication key via our website. You can follow either of our tutorials to get started: Generate Test Data with Postman and Random Key or Create Fictional Customer Datasets with Python & Random Key. By default you're given 1,000 free requests; you can get another 10,000 requests if you fill out our mini-survey.

Random Key is currently investing in the app's future: as our Twitter account shows, we are constantly releasing new endpoints, regions, and data types to cover the use cases our service should handle. At the same time, we are looking into a database product for static data masking and a command-line utility for random data generation. If your company is interested in those capabilities and is willing to share its requirements specification, we are looking for early adopters of those new products.

As a final note, if you like our API but would rather get direct access to our building datasets, consider acquiring them from our website.


Written by Randomkey.io

We are the team behind Randomkey, a developer’s toolkit for data privacy.
