How-To: Generate Personal Data with Randomkey

Randomkey.io
7 min read · Mar 21, 2020

Time to get hands-on with the newly released fictional data generators

There is a new cool kid in the API town

Randomkey is a European maker of two APIs: a Random Data Generator and a Test Data Generator. The services support static data masking use cases by producing fictional personal data on demand. The former yields a new random value on every request, while the latter generates random data that stays consistent for a given input, which preserves referential integrity.

This how-to is a first look at both APIs. The tutorial covers generating your own authorization key, sending a sample Random Data API request, and translating an input to a random but repeatable output with the Test Data API.

Prerequisites

If you’d like to follow this guide, make sure to have the below requirements in place.

  • You’ll need a REST client. If you don’t have one already, go get Postman.
  • You need to register with Randomkey (only your email address is required). Upon registration you are allocated 1,000 free requests to the API.

Go to https://randomkey.io/random-key to generate your authorization key for the app. Without the key, Randomkey won’t serve your requests or create the data structures for your user profile.

Randomkey requires your email address to unlock its full capabilities

Randomkey will send the authorization token to your email address. You can now start using the app!

A sample post-registration email

Sending your first request

Let’s try out the app and make sure you can connect to the Randomkey API. Start up Postman or your REST client of choice, and create a new POST request. To start, we will generate a random location in the US. Set the URL to Randomkey’s location endpoint, https://random.api.randomkey.io/v1/location, and fill out the request headers: put your authorization key in the auth header and set Content-Type to application/json. In the request body, specify the region and the number of records to return. Take a look:

URL: https://random.api.randomkey.io/v1/location

HEADERS
auth: eb3337dc23f1fca33ceb90bfd7f2450d
Content-Type: application/json

BODY
{
"region": "us",
"records": 1
}

The request is ready to be sent

The Random Data API does not require any further settings, so go ahead and hit Send.

New York, New York was returned by the API

Success! The app returned a random incorporated location in the US, its name and the state it belongs to, along with a ZIP code associated with the location. Send the request again, and another place will be generated in return.
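
If you would rather script the call than click through Postman, the same request can be sketched in Python with the standard library only. The URL, the auth header, and the body fields come from the example above; the function names, the placeholder key, and the assumption that the response is JSON are mine, so treat this as a starting point rather than an official client.

```python
import json
import urllib.request

RANDOM_LOCATION_URL = "https://random.api.randomkey.io/v1/location"


def build_location_request(auth_key, region="us", records=1):
    """Build (but do not send) the POST request described above."""
    body = json.dumps({"region": region, "records": records}).encode("utf-8")
    return urllib.request.Request(
        RANDOM_LOCATION_URL,
        data=body,
        headers={"auth": auth_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the reply.

    Assumption: the API answers with a JSON document. Needs network
    access and a valid key.
    """
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (replace the placeholder with the key from your email):
#   print(send(build_location_request("YOUR_AUTH_KEY")))
```

Keeping the request builder separate from the network call makes it easy to inspect exactly what will be sent before spending one of your free requests.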

Degrees of randomness

Now that we are all set and comfortable with the app, a few words about Randomkey’s API family. The endpoint called in the first example belongs to Randomkey’s Random Data API. Other endpoints cover first names (female/male), last names, national insurance numbers, social security numbers, and various date and numeric values. The mechanism is the same for all of them: call the chosen endpoint and you get a random data point of the requested class. Call the service for male first names in the US region and you will get a ‘Mark’ or a ‘Khaliyl’, each one of the 60,000 names available for the endpoint. The numeric and date endpoints require a little more information, usually the range you’d like the value to fall within (a date example is covered further below).

Sample French first name returned by the API

The Test Data API is Random Data’s more elaborate sibling. You can think of the API as a data translator: you send it an input, and in return you get another value. Let’s take the US male name example again. In my case, sending ‘Daniel’ yields ‘Ejay’ in the response. Should I hit Send again, the response will remain unchanged: RK guarantees you will always get the same value for the same input. This ensures that all your applications connecting to Randomkey receive a uniform response and maintain referential integrity between the data sets. What’s more, your authorization key decides the translations: these are uniquely (and, yes, randomly) generated for every registered user. Most likely, your ‘Daniel’ will bring a different name in response (there is a 1 in 60,000 chance that you will also see Ejay, though).

Sample output of the Test Data API

The choice between the Random Data and Test Data APIs boils down to whether you need to keep the referential integrity of your masked data, or whether you’re happy with values that are unrelated from one request to the next. The Random Data service draws a fresh random value on every call, while the Test Data generator produces a consistent output for the same input. Read on to discover some more examples of both APIs.
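
To make the difference concrete, here is a small offline sketch of the two behaviours. It is not Randomkey’s implementation: the keyed hash (HMAC), the tiny name pool, and both function names are assumptions chosen only to illustrate "a fresh draw every call" versus "the same answer for the same key and input".

```python
import hashlib
import hmac
import random

# Stand-in pool; the real endpoint is described as having 60,000 names.
NAMES = ["Mark", "Khaliyl", "Daniel", "Ejay", "Louis", "Austin"]


def random_pick(pool):
    """Random Data style: an independent draw on every call."""
    return random.choice(pool)


def consistent_pick(user_key, value, pool):
    """Test Data style: the same (key, value) pair always selects the same entry.

    Assumption: a keyed hash stands in for the per-user translation tables.
    """
    digest = hmac.new(
        user_key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256
    ).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]


print(random_pick(NAMES))                          # may differ on every run
print(consistent_pick("key-A", "Daniel", NAMES))   # stable for key-A
print(consistent_pick("key-B", "Daniel", NAMES))   # may differ from key-A's result
```

Unlike a real translation table, this toy version can map a name to itself or map two inputs to the same output; it only demonstrates the repeatability property.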

More random data

Some endpoints of the Random Data API can simply be called to return a value, but others require user input, specifically the date and time services and the numeric services. It makes sense: perhaps your application only accepts users born before 2002, and generating younger users would violate the checks set on the database. Similarly, years before 1900 are rarely used and might break things.

A sample request to the date endpoint might carry the following data:

URL: https://random.api.randomkey.io/v1/location

HEADERS
auth: eb3337dc23f1fca33ceb90bfd7f2450d
Content-Type: application/json

BODY
{
"min": "10-Jul-2010",
"max": "10-Aug-2012",
"format": "%d-%b-%Y",
"records": 1
}

Sample date generated by the Random Data API

Note that the format must match the min and max values you provide, and it also determines the format of the returned date. The format codes follow Python’s strftime conventions. Here %d is the day of the month with a leading zero (the first day of the month is written as 01), %b is the abbreviated month name (e.g. Jul), and %Y is the four-digit year. The delimiter between the elements (a hyphen in this example) is part of the format string.
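
Because the format codes are the same ones Python uses, you can check a request body locally before sending it. The sketch below parses min and max with the given format and rejects a mismatch or an inverted range. The helper name is hypothetical, and the field names are taken from the sample body above.

```python
from datetime import datetime


def build_date_body(min_date, max_date, fmt, records=1):
    """Validate the range locally and return the request body as a dict."""
    # strptime raises ValueError if a value does not match the format.
    start = datetime.strptime(min_date, fmt)
    end = datetime.strptime(max_date, fmt)
    if start > end:
        raise ValueError("min must not be later than max")
    return {"min": min_date, "max": max_date, "format": fmt, "records": records}


body = build_date_body("10-Jul-2010", "10-Aug-2012", "%d-%b-%Y")
print(body)

# The same codes drive formatting: %d pads the day, %b abbreviates the month.
print(datetime(2010, 7, 1).strftime("%d-%b-%Y"))  # 01-Jul-2010 in the default C locale
```

A body such as min "2010-07-10" with format "%d-%b-%Y" fails this check with a ValueError, which is cheaper to discover locally than through a rejected request.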

Achieving a realistic random

Just as with the date example, you most likely want your data to be random, but not so random that your test apps stop being realistic. The randomness can be controlled by specifying the range or the region of the target value.

Personal information such as names, locations, and document numbers is tied to regional endpoints. Hence, if your users are mostly French or Polish, you don’t have to populate your test data with North American names, which often lack the special characters found in other alphabets.

Currently, only the US and UK regional endpoints are supported

Achieving referential integrity

Sometimes randomness is required within the data, but consistency is a must between requests. In other words: we might want to anonymise our data set in a uniform way across the organisation. For instance, Austin, Texas should consistently translate to another, random location, like Minneapolis, Minnesota. Or a certain Social Security Number should always yield the same equivalent (as much as we don’t care what that equivalent is). This is especially useful for applications that combine data from multiple sources and rely on integrity checks.

RK’s Test Data generator has been built for that purpose. Every user generates their own data tables so that the randomness per account is assured and RK users don’t share their data sets.

As an example, let’s imagine you need to generate an alternative set of Social Security Numbers, since original numbers should never ever be used for testing. You need those SSNs to match the SSA specification and to pass the database check. The numbers also have to match across the tables to verify the integrity within the database.

Post a test SSN to the app to return its assigned random value:

URL: https://test.api.randomkey.io/v1/id/ssn

HEADERS
auth: eb3337dc23f1fca33ceb90bfd7f2450d
Content-Type: application/json

BODY
{
"id": "741965201"
}

Test Data API will always produce the same value for the same input

Every time you hit Send, the same output will be generated.
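
To see why this matters across tables, here is a minimal sketch of masking the same SSN column in two tables. The translate function is a stand-in for a call to the Test Data SSN endpoint (it just reverses the digits and is not real API output), and the table contents, column names, and helper name are hypothetical.

```python
def mask_column(rows, column, translate, cache):
    """Return copies of rows with `column` replaced by its translated value."""
    masked = []
    for row in rows:
        original = row[column]
        if original not in cache:
            # One lookup per distinct value keeps the request count low.
            cache[original] = translate(original)
        masked.append({**row, column: cache[original]})
    return masked


def fake_translate(ssn):
    """Stand-in for the API call: deterministic, but NOT real Randomkey output."""
    return ssn[::-1]


customers = [{"ssn": "741965201", "customer_id": 1}]
payments = [
    {"ssn": "741965201", "amount": 10},
    {"ssn": "123456789", "amount": 5},
]

cache = {}
masked_customers = mask_column(customers, "ssn", fake_translate, cache)
masked_payments = mask_column(payments, "ssn", fake_translate, cache)

# The join key still matches after masking, so integrity checks keep passing.
print(masked_customers[0]["ssn"] == masked_payments[0]["ssn"])  # True
```

Since the service returns the same value for the same input, the local cache is not needed for correctness. It only avoids repeating requests for values you have already translated.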

The same process can be followed for any other Test Data API endpoint, be it names, locations, or dates. See for yourself how easy it is to generate a US location dataset:

  1. Send a POST request to https://test.api.randomkey.io/v1/location
    Provide a location in the request body. You can use your own address (RK hashes your data, so it is never stored in the clear) or a sample location, such as Louis Armstrong Park in New Orleans, Louisiana:
    {
    "city": "New Orleans",
    "state": "Louisiana",
    "zip": "70116",
    "region": "us"
    }
  2. Receive the location from the API. In my case I got Long Lake, Minnesota.

Sample translation of a location with Test Data API

Note how as soon as we misspell the city name as “New Orlens”, the output of the function changes:

A misspelled input will yield a different output

Summary

Randomkey lets you choose: keep your data fully random, or random to an extent, by restricting the range of returned values or their region, or by setting up consistency so that a certain data point always translates to the same equivalent. Use RK’s capabilities to your benefit by adjusting the extent of randomness.

We look forward to your feedback! Let us know in the comments how you found your experience with the app and what we can improve on. Randomkey is an indie bootstrapped project and welcomes all community engagement. Thank you for reading!

Randomkey.io

We are the team behind Randomkey, a developer’s toolkit for data privacy.