How-To: Generate Personal Data with Randomkey
Time to get hands-on with the newly released fictional data generators
Randomkey is a European maker of two APIs: a Random Data Generator and a Test Data Generator. The services support static data masking use cases by producing fictional personal data on demand. The former yields a new random value on every request, while the latter generates random data that stays consistent for a given input, which preserves referential integrity.
This how-to is a first look at both APIs. The tutorial covers generating your own authorization key, sending a sample Random Data API request, and translating an input to a random but unique output with the Test Data API.
Prerequisites
If you’d like to follow this guide, make sure to have the below requirements in place.
- You’ll need a REST client. If you don’t have one already, go get Postman.
- You need to register with Randomkey (only your email address is required). Upon registration you are allocated 1000 free requests to the API.
Go to https://randomkey.io/random-key to generate your authorization key for the app. Without the key, Randomkey won’t serve your requests or create the data structures for your user profile.
Randomkey will send the authorization token to your email address. You can now start using the app!
Sending your first request
Let’s try out the app and make sure you can connect to the Randomkey API. Start up Postman or your REST client of choice, and create a new POST request. To start, we will generate a random location in the US. Set the URL to Randomkey’s location endpoint, https://random.api.randomkey.io/v1/location, and fill out the request headers by providing your authorization token and setting the Content-Type to application/json. In the request body, specify the number of records to return and the region. Take a look:
URL: https://random.api.randomkey.io/v1/location
HEADERS
auth: eb3337dc23f1fca33ceb90bfd7f2450d
Content-Type: application/json
BODY
{
  "region": "us",
  "records": 1
}
The Random Data API does not require any further settings, so go ahead and hit Send.
Success! The app returned a random incorporated location in the US, its name and the state it belongs to, along with a ZIP code associated with the location. Send the request again, and another place will be generated in return.
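If you would rather script the call than click through a REST client, the request above can be sketched in Python with the standard library only. The URL, the `auth` header and the body fields come from the example above; the helper names `build_request` and `send` are hypothetical, the token is a placeholder, and the assumption that the reply is JSON is mine, since the response format is not documented here.

```python
import json
import urllib.request

RANDOM_LOCATION_URL = "https://random.api.randomkey.io/v1/location"


def build_request(url: str, token: str, body: dict) -> urllib.request.Request:
    """Build a POST request carrying the auth and Content-Type headers."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"auth": token, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send the request and decode the reply (assumes the API answers in JSON)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request(
    RANDOM_LOCATION_URL,
    token="YOUR_AUTHORIZATION_KEY",  # placeholder: paste the key from your email
    body={"region": "us", "records": 1},
)
# Uncomment to call the live API; this spends one of your free requests.
# print(send(request))
```

Keeping the request construction separate from the sending step makes it easy to inspect the headers and body before any of the free quota is used.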
Degrees of randomness
Now that we are all set and comfortable with the app, a few words about Randomkey’s API family. The endpoint called in the first example belongs to Randomkey’s Random Data API. Others include first name (female/male), last name, national insurance number, social security number, and various date and numeric endpoints. The mechanism is the same for all supported methods: call the chosen RK endpoint and you will get a random data point within the requested class. Call the service for male first names in the US region and you will get a ‘Mark’ or a ‘Khaliyl’, each one of the 60,000 names available for the endpoint. The numeric and date endpoints require some more information, usually the range you’d like the value to fall within (a date example is covered in the next section).
The Test Data API is Random Data’s more elaborate sibling. You can think of the API as a data translator: you send it an input, and in return you get another value. Take the US male name example again. In my case, sending ‘Daniel’ yields ‘Ejay’ in the response. Should I hit Send again, the response will remain unchanged: RK guarantees you will always get the same value for the same input. This ensures that all your applications connecting to Randomkey receive a uniform response and maintain referential integrity between the data sets. What’s more, your authorization key decides the translations: these are uniquely (and yes, randomly) generated for every registered user. Most likely, your ‘Daniel’ will bring a different name in response (there is a 1 in 60,000 chance that you will also see Ejay, though).
The choice between the Random Data and Test Data APIs boils down to whether you need to keep the referential integrity of your masked data, or whether you’re happy with it being purely coincidental. The Random Data service draws a new random value on every request, while the Test Data generator produces a consistent output for the same input. Read on to discover some more examples of both APIs.
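The difference between the two behaviours can be illustrated locally. The sketch below is not Randomkey’s implementation, which is not described in this article. It is a stand-in that uses a keyed hash (HMAC) to show what "same key and same input give the same output" means, next to a plain random draw. The name pool reuses the names mentioned above, and the account key is a made-up placeholder.

```python
import hashlib
import hmac
import random

NAMES = ["Mark", "Khaliyl", "Ejay", "Daniel"]  # tiny stand-in for a real name pool


def random_name() -> str:
    """Random Data style: a fresh draw on every call."""
    return random.choice(NAMES)


def translated_name(account_key: str, original: str) -> str:
    """Test Data style: the same key and input always select the same name."""
    digest = hmac.new(account_key.encode(), original.encode(), hashlib.sha256).digest()
    return NAMES[int.from_bytes(digest[:8], "big") % len(NAMES)]


first = translated_name("key-of-user-a", "Daniel")   # hypothetical account key
second = translated_name("key-of-user-a", "Daniel")
print(first == second)  # True: repeated calls agree
print(random_name())    # may differ from run to run
```

Because the account key feeds into the hash, a different key can map ‘Daniel’ to a different name. Unlike the service described above, this toy mapping makes no attempt to keep outputs unique across inputs.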
More random data
Some endpoints of the Random Data API can simply be called to return a value, but others require user input, specifically the time and date services and the numeric services. It makes sense: perhaps your application only ever accepts users born before 2002, and generating younger users would violate the checks set on the database. Similarly, years before 1900 are rarely used within applications and might break things.
A sample request to the date endpoint might carry the following data:
URL: https://random.api.randomkey.io/v1/location
HEADERS
auth: eb3337dc23f1fca33ceb90bfd7f2450d
Content-Type: application/json
BODY
{
  "min": "10-Jul-2010",
  "max": "10-Aug-2012",
  "format": "%d-%b-%Y",
  "records": 1
}
Note how the format provided has to match the input data and will decide the format of the returned date. The date format codes follow Python’s strftime conventions. Here %d stands for the day of the month with a leading zero (e.g. the first day of the month is written as 01), %b is the abbreviated month name (e.g. Jul), and %Y is the 4-digit year. You also specify the delimiter between the elements, a hyphen in this example.
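To check locally that your bounds and format string agree before sending them, Python’s own `datetime` module can parse and format the same codes. The sketch below is a local illustration only: `random_date_between` is a hypothetical helper that mimics a range-limited draw, and treating both bounds as inclusive is my assumption, not documented API behaviour. Note that `%b` is locale dependent; the example assumes the default English month abbreviations.

```python
import random
from datetime import datetime, timedelta

FORMAT = "%d-%b-%Y"


def random_date_between(min_text: str, max_text: str, fmt: str = FORMAT) -> str:
    """Parse both bounds with fmt, pick a random day in range, format it back."""
    start = datetime.strptime(min_text, fmt)  # raises ValueError on a mismatch
    end = datetime.strptime(max_text, fmt)
    if end < start:
        raise ValueError("max must not be earlier than min")
    offset = random.randint(0, (end - start).days)  # inclusive on both ends
    return (start + timedelta(days=offset)).strftime(fmt)


print(datetime.strptime("10-Jul-2010", FORMAT).date())  # 2010-07-10
print(random_date_between("10-Jul-2010", "10-Aug-2012"))
```

If the bounds do not match the format, for example "2010-07-10" with "%d-%b-%Y", `strptime` raises a `ValueError`, which is a cheap way to catch a mismatch before it costs a request.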
Achieving a realistic random
Just as with the date example, you will most likely want your data random, but not so random that your test apps stop being realistic. The randomness can be controlled by specifying the range or the region of the target value.
Personal information such as names, locations, and document numbers is tied to regional endpoints. Hence, if your users are mostly French or Polish, you will not populate your test data with North American names, which often lack the special characters found in other alphabets.
Achieving referential integrity
Sometimes randomness is required within the data, but consistency is a must between requests. In other words: we might want to anonymise our data set in a uniform way across the organisation. For instance, Austin, Texas should consistently translate to another, random location, like Minneapolis, Minnesota. Or a certain Social Security Number should always yield the same equivalent (even though we don’t care what that equivalent is). This is especially useful for applications that combine data from multiple sources and rely on integrity checks.
RK’s Test Data generator has been built for that purpose. Every user generates their own data tables so that the randomness per account is assured and RK users don’t share their data sets.
As an example, let’s imagine you need to generate an alternative set of Social Security Numbers, since original numbers should never ever be used for testing. You need those SSNs to match the SSA specification and to pass the database check. The numbers also have to match across the tables to verify the integrity within the database.
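As a rough idea of what a structural check on an SSN can look like, here is a small Python sketch. It covers only the widely published structural rules (nine digits; area number not 000, 666, or in the 900 range; group not 00; serial not 0000). Which checks Randomkey or your own database actually apply is not stated in this article, so treat `looks_like_valid_ssn` as a hypothetical helper.

```python
import re

SSN_PATTERN = re.compile(r"(\d{3})-?(\d{2})-?(\d{4})")


def looks_like_valid_ssn(ssn: str) -> bool:
    """Structural check only: says nothing about whether the number was issued."""
    match = SSN_PATTERN.fullmatch(ssn)
    if match is None:
        return False
    area, group, serial = match.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


print(looks_like_valid_ssn("741965201"))  # True
print(looks_like_valid_ssn("000965201"))  # False
```

Running a check like this over both the original and the masked column is a quick way to confirm that masking did not introduce values your database would reject.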
Post a test SSN to the app to return its assigned random value:
URL: https://random.api.randomkey.io/v1/id/ssn
HEADERS
auth: eb3337dc23f1fca33ceb90bfd7f2450d
Content-Type: application/json
BODY
{
  "id": "741965201"
}
Every time you hit Send, the same output will be generated.
The same process can be followed for any other Test Data API endpoint, be it names, locations, or dates. See for yourself how easy it is to generate a US location dataset:
- Send a POST request to https://test.api.randomkey.io/v1/location
- Provide a location in the request body: your own address (RK hashes your data; it is never stored in the clear) or a sample location, such as Louis Armstrong Park in New Orleans, Louisiana:
{
  "city": "New Orleans",
  "state": "Louisiana",
  "zip": "70116",
  "region": "us"
}
- Receive the location from the API. In my case I got Long Lake, Minnesota.
Note how the output changes as soon as we misspell the city name as “New Orlens”.
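To see why this consistency matters once tables are joined, consider the minimal Python sketch below. The `translate` argument stands in for a call to a Test Data endpoint; `fake_translate` is a local placeholder that simply reverses the digits and is not how Randomkey maps values. The `users` and `payments` tables are made up for illustration. Caching each translation locally also avoids spending a request on a value that has already been masked.

```python
from typing import Callable, Dict, List


def mask_column(rows: List[dict], column: str,
                translate: Callable[[str], str],
                cache: Dict[str, str]) -> List[dict]:
    """Return copies of rows with `column` replaced by its translated value."""
    masked = []
    for row in rows:
        original = row[column]
        if original not in cache:  # translate each distinct value only once
            cache[original] = translate(original)
        masked.append({**row, column: cache[original]})
    return masked


def fake_translate(ssn: str) -> str:
    """Local stand-in for the API call: NOT Randomkey's mapping."""
    return ssn[::-1]


users = [{"name": "Daniel", "ssn": "741965201"}]
payments = [{"amount": 10, "ssn": "741965201"}, {"amount": 5, "ssn": "741965201"}]

cache: Dict[str, str] = {}
masked_users = mask_column(users, "ssn", fake_translate, cache)
masked_payments = mask_column(payments, "ssn", fake_translate, cache)

# Every payment still joins to its user, although the original SSN is gone.
joined = [p for p in masked_payments if p["ssn"] == masked_users[0]["ssn"]]
print(len(joined))  # 2
```

Had each row been masked with an independent random value instead, the join between the two tables would no longer find any matches.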
Summary
Randomkey lets you choose: keep your data fully random, or random to an extent, by manipulating the range of returned values or their region, or by setting up consistency per request so that a certain data point always translates to a set equivalent. Use RK’s capabilities to your benefit by adjusting the extent of randomness.
We look forward to your feedback! Let us know in the comments how you found your experience with the app and what we can improve on. Randomkey is an indie bootstrapped project and welcomes all community engagement. Thank you for reading!