5 min read
The challenge
BricoPrivé is an online home improvement retailer with a presence in multiple countries. Like any online shop, it has to store names, email addresses, delivery addresses, contact details and other personal information in order to ship goods to its customers. European law is rightfully strict when it comes to protecting user information, and it requires organizations to tread carefully around this data, especially when it can be exposed to different teams.
Technofy is helping BricoPrivé's team on multiple aspects related to AWS and DevOps, but this use case focuses on the implementation of a data anonymization mechanism to ensure GDPR compliance when employees need to work with customer data.
Business pain & challenges
- Lack of observability
- Pattern analysis in logs and metrics
- Troubleshooting
- Lack of visibility into resource usage
Tech Stack
AWS
- Step Functions: Used to define the anonymization workflow.
- Lambda: Serverless code execution service used to run the various actions of the anonymization process.
- ECS: Used to run the database anonymization script, which can exceed the Lambda execution time limit.
- RDS: Relational Database Service, which hosts the database instances.
The solution
Overview
BricoPrivé already had an anonymization script that scrambles the data in-place. The main goal of our operation was to completely automate the anonymization process, and provide the developers with a fresh scrambled copy of the production database every week.
The production and development environments are separated in two different AWS accounts in the same organization. This practice has numerous advantages in terms of security, isolation of resources and cost allocation. This also means that the production and development databases are two completely separate clusters.
The solution consists of two parts:
- The anonymizer, which runs on the production account.
- The receiver, which runs on the development account.
The workflow of each side of the process is described in the sections below.
Anonymizing the data
The script BricoPrivé has developed scrambles personal information "in place", meaning the original data is replaced by the scrambled version. In our case, thanks to the flexibility of the AWS ecosystem, we can spin up a database from the latest production snapshot, run the script, and create an anonymized snapshot in a few minutes.
Our approach does exactly that, with the whole process orchestrated by AWS Step Functions.
The workflow includes a few safety checks to make sure the actions happen correctly, and in the right order. Once the snapshot has been created and is available, it is shared with the development account.
All the steps use Lambda functions, except the anonymization step, which uses a container running on ECS because the script can exceed the Lambda execution time limit.
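The orchestration described above could be sketched as a Step Functions state machine along the following lines. This is an illustrative Amazon States Language definition expressed as a Python dict, not BricoPrivé's actual workflow; all state names, ARNs, and ECS identifiers are placeholders.

```python
import json

# Sketch of the anonymizer state machine: every step is a Lambda task
# except the anonymization itself, which runs as an ECS task because it
# can outlast Lambda's execution time limit. All ARNs are placeholders.
STATE_MACHINE = {
    "Comment": "Weekly production database anonymization (sketch)",
    "StartAt": "RestoreFromLatestSnapshot",
    "States": {
        "RestoreFromLatestSnapshot": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:restore-staging-db",
            "Next": "WaitForInstance",
        },
        # Safety check: poll until the staging instance is actually available.
        "WaitForInstance": {"Type": "Wait", "Seconds": 300, "Next": "CheckInstanceAvailable"},
        "CheckInstanceAvailable": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:check-db-status",
            "Next": "IsAvailable",
        },
        "IsAvailable": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "available", "Next": "RunAnonymizer"}
            ],
            "Default": "WaitForInstance",
        },
        "RunAnonymizer": {
            "Type": "Task",
            # The one non-Lambda step: a container run on ECS.
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {"Cluster": "anonymizer", "TaskDefinition": "scramble-script"},
            "Next": "CreateAnonymizedSnapshot",
        },
        "CreateAnonymizedSnapshot": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:create-snapshot",
            "Next": "ShareSnapshotWithDevAccount",
        },
        "ShareSnapshotWithDevAccount": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:share-snapshot",
            "End": True,
        },
    },
}

print(json.dumps(STATE_MACHINE, indent=2))
```

The `.sync` suffix on the ECS task resource makes Step Functions wait for the container to finish before moving on, which is what allows the long-running script to fit into the workflow.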
This whole process happens in the production account to ensure that no personal information ever leaves the more restricted environment. Since development environments are generally more lax in terms of access, it was decided that the development account should only ever receive anonymized data.
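The final sharing step could look something like the sketch below, assuming the boto3 RDS API; the account ID and instance identifier are placeholders. Note that sharing a snapshot only grants the other account permission to copy or restore it, so the data itself stays in the production account until the receiver copies the snapshot on its own side.

```python
from datetime import datetime, timezone


def latest_snapshot_id(snapshots):
    """Pick the most recent 'available' snapshot from a describe_db_snapshots result."""
    available = [s for s in snapshots if s["Status"] == "available"]
    newest = max(available, key=lambda s: s["SnapshotCreateTime"])
    return newest["DBSnapshotIdentifier"]


def share_latest_snapshot(dev_account_id, db_instance_id):
    """Share the newest anonymized snapshot with the development account (sketch)."""
    import boto3  # imported lazily so the pure helper above has no AWS dependency

    rds = boto3.client("rds")
    snapshots = rds.describe_db_snapshots(
        DBInstanceIdentifier=db_instance_id, SnapshotType="manual"
    )["DBSnapshots"]
    # Adding the dev account to the 'restore' attribute is what "shares"
    # a manual snapshot across accounts.
    rds.modify_db_snapshot_attribute(
        DBSnapshotIdentifier=latest_snapshot_id(snapshots),
        AttributeName="restore",
        ValuesToAdd=[dev_account_id],
    )
```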
Receiving anonymized data
The other part of the stack resides in the development account. Rather than an event-driven approach, a scheduled CloudWatch Events rule triggers the second Step Function, because we may want to restore the development database from the latest available snapshot several times during the week.
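Wiring up such a scheduled trigger could be sketched as follows, assuming the boto3 CloudWatch Events API; the rule name, day and hour, and ARNs are all illustrative choices, not the actual configuration.

```python
def weekly_refresh_rule(day="MON", hour=6):
    """Build a CloudWatch Events cron schedule expression for the refresh.

    A schedule (rather than reacting to a 'snapshot shared' event) means the
    rule can also be re-run mid-week against the latest snapshot already
    received, simply by adjusting or manually triggering it.
    """
    return f"cron(0 {hour} ? * {day} *)"


def create_trigger(rule_name, state_machine_arn, role_arn):
    """Point the scheduled rule at the receiver Step Functions workflow (sketch)."""
    import boto3  # lazy import: the helper above stays AWS-free

    events = boto3.client("events")
    events.put_rule(Name=rule_name, ScheduleExpression=weekly_refresh_rule())
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "restore-dev-db", "Arn": state_machine_arn, "RoleArn": role_arn}],
    )
```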
The first step of this workflow is to copy the received snapshot into the current account, so that it no longer depends on the anonymizer. We then check whether an anonymized database is already running, to determine whether it needs to be deleted before deploying the new one from the latest snapshot received that day.
All the steps in this workflow use Lambda functions.
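Condensed into a single sketch, the receiver's logic could look like the following, assuming the boto3 RDS API; all snapshot and instance identifiers are placeholders, and in practice each numbered step would live in its own Lambda function within the Step Functions workflow.

```python
def instance_exists(instances, identifier):
    """Check a describe_db_instances result for an already-deployed instance."""
    return any(i["DBInstanceIdentifier"] == identifier for i in instances)


def refresh_dev_database(shared_snapshot_arn, local_snapshot_id, instance_id):
    """Sketch of the receiver workflow (identifiers are placeholders)."""
    import boto3  # lazy import: the helper above stays AWS-free

    rds = boto3.client("rds")
    # 1. Copy the shared snapshot into the development account so the
    #    workflow no longer depends on the anonymizer side.
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=shared_snapshot_arn,
        TargetDBSnapshotIdentifier=local_snapshot_id,
    )
    # 2. Delete the previous anonymized instance if it is still running.
    instances = rds.describe_db_instances()["DBInstances"]
    if instance_exists(instances, instance_id):
        rds.delete_db_instance(DBInstanceIdentifier=instance_id, SkipFinalSnapshot=True)
        rds.get_waiter("db_instance_deleted").wait(DBInstanceIdentifier=instance_id)
    # 3. Restore the new anonymized instance from the copied snapshot.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=instance_id,
        DBSnapshotIdentifier=local_snapshot_id,
    )
```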
Results & Highlights
This approach gives us several advantages:
- The process is cost-efficient thanks to the various serverless services
- It is easy to duplicate for another database
- It can be triggered manually or automatically
- Execution time is short
- Sensitive data never leaves the production account