We Need to Talk About Test Data Privacy
Organisations are endangering their users’ identity by using production data in software development
Test data management tends to go under the radar when data security initiatives are implemented. Test data is key in software development, as it provides assurance that an application works as intended. More often than not, production data, that is, real users’ information, is used for this purpose, because it is technically challenging to create a realistic but fictional representation of the original data. A fake data set has to display the same characteristics as the information it is based on, including syntactic and semantic coherence, language, uniqueness of data points, and referential integrity between data sets. As there are few test data generators on the market and they come with a high price tag, in-house development becomes the go-to solution. However, a custom test data management application requires development resources and has to be maintained like any other piece of production software. Because this is complicated and costly, many organisations resort to the only alternative: ignoring the problem.
The strategy works well, until it doesn’t. Granting testers unlimited access to production data exposes customers’ sensitive information internally. Such information can be used to impersonate a person, as it commonly includes their name, address, date of birth, social security number and credit card number. As a by-product, the ethical responsibility shifts from the company’s board to contracted developers. It’s fertile ground for data leaks. Breaches happen when no one is looking, and they are caused by negligence often enough to be worrying. Verizon reports that internal actors account for one third of all security incidents.
Software security has been put in the spotlight following a series of international laws centred on guaranteeing citizens’ privacy on the internet. The GDPR is the most notable European initiative, requiring all organisations to ensure security when processing EU citizens’ information. Data protection has to be built into the software (privacy by design and by default). Organisations are asked to adopt good data security practices and protect their customers’ identities throughout the whole software development life cycle: from data capture and processing to data monetisation (re-selling). Testing, often forgotten, is just another part of that process.
Improving the system requires systemic and programmatic changes. Companies need to put an end to laissez-faire testing practices and take ownership of data privacy across the whole organisation. On the software front, new test data management tools have to emerge to help with the problem. Randomkey.io is one such project: it’s a REST API that produces test data sets. The data is realistic but fictional, regional where needed, and preserves referential integrity between data sets by respecting geographical hierarchies and by always returning the same output for a given input value. Randomkey is now in its beta phase and will be made generally available in the first months of 2020.
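Returning the same output for the same input is what allows masked tables to keep joining correctly. The sketch below illustrates that general idea under stated assumptions: it uses a keyed hash (HMAC-SHA256) as the deterministic mapping, which is a common pseudonymisation technique but not necessarily how Randomkey works. It is not Randomkey’s API, and the secret key, name pools and table layout are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept outside the test environment.
SECRET_KEY = b"replace-with-a-secret-key"

# Hypothetical pools of fictional values, for illustration only.
FIRST_NAMES = ["Alice", "Bruno", "Chloe", "Dario", "Elena", "Farid", "Greta", "Hugo"]
LAST_NAMES = ["Keller", "Moreau", "Novak", "Rossi", "Silva", "Weber", "Yilmaz", "Zhang"]


def _digest(field: str, value: str) -> bytes:
    """Keyed hash of a real value; the field name keeps namespaces separate."""
    message = f"{field}:{value}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def fake_customer_id(real_id: str) -> str:
    """Deterministically map a real customer ID to a fictional one."""
    return "C" + _digest("customer_id", real_id).hex()[:12]


def fake_name(real_name: str) -> str:
    """Deterministically map a real name to a fictional name from the pools."""
    d = _digest("name", real_name)
    first = FIRST_NAMES[d[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[d[1] % len(LAST_NAMES)]
    return f"{first} {last}"


def mask_tables(customers, orders):
    """Mask two related tables; orders reference customers via customer_id."""
    masked_customers = [
        {"customer_id": fake_customer_id(c["customer_id"]), "name": fake_name(c["name"])}
        for c in customers
    ]
    masked_orders = [
        {
            "order_id": o["order_id"],
            # Same real ID -> same fake ID, so the foreign key still resolves.
            "customer_id": fake_customer_id(o["customer_id"]),
            "total": o["total"],
        }
        for o in orders
    ]
    return masked_customers, masked_orders


if __name__ == "__main__":
    customers = [
        {"customer_id": "1001", "name": "Jane Doe"},
        {"customer_id": "1002", "name": "John Smith"},
    ]
    orders = [
        {"order_id": "A1", "customer_id": "1001", "total": 42.0},
        {"order_id": "A2", "customer_id": "1001", "total": 9.5},
    ]
    masked_customers, masked_orders = mask_tables(customers, orders)
    print(masked_customers)
    print(masked_orders)
```

Because a given real `customer_id` always maps to the same fictional one, every masked order still points at an existing masked customer. Truncating the digest to 12 hex characters keeps the IDs short but makes collisions possible, if unlikely, so a real tool would need to detect them to guarantee uniqueness of data points.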