Real-time monitoring at BricoPrivé

Solution

High-level overview of streams between AWS and DataDog

Overview

The solution provides monitoring at three levels:

At the CDN level, CloudFront real-time logs capture every request coming from the internet.
At the instance level, the DataDog agent reports system metrics.
At the application level, syslog-ng filters the logs and dispatches them to the DataDog syslog ingestion endpoint.

All of these log and metric delivery channels are configured to encrypt data in transit, in line with the security requirements.

Monitoring CloudFront

CloudFront is BricoPrivé's CDN of choice, delivering several hundred terabytes to their users each year. The service can route HTTP requests to different backends based on various request parameters, which makes CloudFront akin to a classic reverse proxy. This proves very valuable for businesses that wish to incrementally split a monolithic application into smaller, more agile microservices.

CloudFront provides two ways of delivering access logs: standard and real-time. Standard logs are delivered periodically to S3, where other systems can process them; their time-to-delivery ranges from a few minutes to as much as 24 hours. Real-time logs, on the other hand, are delivered within a few seconds to Kinesis Data Streams, from which Kinesis Data Firehose can dispatch them to various backends.
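To make the real-time path more concrete, here is a minimal Python sketch that builds the arguments for CloudFront's CreateRealtimeLogConfig API. It is an illustration under stated assumptions, not the configuration deployed for BricoPrivé: the configuration name, the ARNs, the selected fields and the sampling rate are all hypothetical placeholders.

```python
import json


def build_realtime_log_config(name, stream_arn, role_arn, sampling_rate=100):
    """Build keyword arguments for CloudFront's CreateRealtimeLogConfig API."""
    # The sampling rate is the percentage of requests that get logged.
    if not 1 <= sampling_rate <= 100:
        raise ValueError("sampling_rate must be between 1 and 100")
    return {
        "Name": name,
        "SamplingRate": sampling_rate,
        # Illustrative subset of the available real-time log fields.
        "Fields": [
            "timestamp",
            "c-ip",
            "sc-status",
            "cs-method",
            "cs-uri-stem",
            "time-taken",
            "x-edge-location",
        ],
        # Real-time logs are delivered to a Kinesis data stream.
        "EndPoints": [
            {
                "StreamType": "Kinesis",
                "KinesisStreamConfig": {
                    "RoleARN": role_arn,
                    "StreamARN": stream_arn,
                },
            }
        ],
    }


# Hypothetical names and ARNs, for illustration only.
config = build_realtime_log_config(
    name="example-realtime-logs",
    stream_arn="arn:aws:kinesis:us-east-1:123456789012:stream/example-cf-logs",
    role_arn="arn:aws:iam::123456789012:role/example-cf-realtime-logs",
)
print(json.dumps(config, indent=2))

# With boto3 installed and AWS credentials configured, the call would be:
#   boto3.client("cloudfront").create_realtime_log_config(**config)
```

Keeping the payload construction separate from the API call makes it easy to review the chosen fields and sampling rate before anything is created in the account.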

BricoPrivé requires real-time logs because they give a clearer picture of the scale at which the platform operates. These logs provide valuable information about performance and customer experience, which can be analyzed and turned into optimization actions.

Technofy deployed the Kinesis Data Streams and Kinesis Data Firehose pipeline described above to dispatch the logs to DataDog. The procedure is well described in the Datadog documentation (see "Send AWS services logs with the Datadog Kinesis Firehose Destination").
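As a rough illustration of the Firehose side of this pipeline, the sketch below builds the arguments for the Firehose CreateDeliveryStream API with an HTTP endpoint destination. The intake URL is an assumption that depends on the Datadog site in use and should be checked against the Datadog documentation; the stream name, ARNs and API key are hypothetical placeholders.

```python
import json

# Assumption: the HTTP intake URL varies by Datadog site; verify it in the Datadog docs.
DATADOG_INTAKE_URL = "https://aws-kinesis-http-intake.logs.datadoghq.com/v1/input"


def build_firehose_to_datadog(stream_name, source_stream_arn, role_arn,
                              backup_bucket_arn, api_key):
    """Build keyword arguments for the Firehose CreateDeliveryStream API."""
    return {
        "DeliveryStreamName": stream_name,
        # Read records from the Kinesis data stream that CloudFront writes to.
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": source_stream_arn,
            "RoleARN": role_arn,
        },
        "HttpEndpointDestinationConfiguration": {
            "EndpointConfiguration": {
                "Url": DATADOG_INTAKE_URL,
                "Name": "Datadog",
                "AccessKey": api_key,
            },
            "RequestConfiguration": {"ContentEncoding": "GZIP"},
            "RoleARN": role_arn,
            # Records that cannot be delivered are kept in S3 for later inspection.
            "S3BackupMode": "FailedDataOnly",
            "S3Configuration": {
                "RoleARN": role_arn,
                "BucketARN": backup_bucket_arn,
            },
        },
    }


# Hypothetical identifiers, for illustration only.
delivery_stream = build_firehose_to_datadog(
    stream_name="example-cf-logs-to-datadog",
    source_stream_arn="arn:aws:kinesis:us-east-1:123456789012:stream/example-cf-logs",
    role_arn="arn:aws:iam::123456789012:role/example-firehose",
    backup_bucket_arn="arn:aws:s3:::example-firehose-backup",
    api_key="HYPOTHETICAL_API_KEY",
)

# Print the payload with the API key masked.
printable = json.loads(json.dumps(delivery_stream))
printable["HttpEndpointDestinationConfiguration"]["EndpointConfiguration"]["AccessKey"] = "***"
print(json.dumps(printable, indent=2))

# With boto3 installed and AWS credentials configured, the call would be:
#   boto3.client("firehose").create_delivery_stream(**delivery_stream)
```

In a real deployment the API key would come from a secret store rather than from source code.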

Monitoring instances

BricoPrivé runs their application on EC2 instances and follows best practices for auto scaling, load balancing, and availability. As their activity grows, so do their computing costs. Keeping an eye on the resource usage of each instance allows them to fine-tune their auto scaling policies, maximizing resource usage without impacting the end-user experience and keeping costs under control during scale-out.

This monitoring is done by the DataDog agent, which reports system metrics and running processes in real time. The agent can also stream log files, but in this case the application requirements do not call for this feature.

Monitoring applications

The endeavour of providing observability across the different layers wouldn't be complete without peeking into the application layer. A widely known technology such as Syslog allows multiple log producers to use a standard protocol, which reduces operational complexity. In this case, we settled on syslog-ng because its configuration is easier for the engineering teams to read and understand.

Here again, DataDog provides a Syslog ingestion service with two endpoints, one with TLS and one without. Given that application logs could contain sensitive data, the natural choice was to configure syslog-ng to use the encrypted endpoint.
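To show what the encrypted delivery amounts to on the wire, here is a minimal Python sketch. It is not the syslog-ng configuration used in the project. It assumes that the intake expects each line to be prefixed with the API key, and that the TLS endpoint is intake.logs.datadoghq.com on port 10516; both vary by Datadog site and should be checked against the Datadog documentation. The host name, application name and API key are hypothetical.

```python
import socket
import ssl
from datetime import datetime, timezone

# Assumptions: endpoint host and TLS port depend on the Datadog site in use.
DATADOG_HOST = "intake.logs.datadoghq.com"
DATADOG_TLS_PORT = 10516


def format_syslog_line(api_key, app, message, host="web-1",
                       facility=16, severity=6, now=None):
    """Format an RFC 5424-style syslog line prefixed with the API key."""
    # PRI combines facility and severity: local0 (16) * 8 + info (6) = 134.
    pri = facility * 8 + severity
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    # PROCID, MSGID and structured data are left empty ("-").
    return f"{api_key} <{pri}>1 {timestamp} {host} {app} - - - {message}\n"


def send_over_tls(line, host=DATADOG_HOST, port=DATADOG_TLS_PORT):
    """Send one line over a certificate-verified TLS connection."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(line.encode("utf-8"))


# Build a line with a fixed timestamp; nothing is sent over the network here.
line = format_syslog_line(
    "HYPOTHETICAL_API_KEY",
    "shop",
    "order created",
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(line, end="")
```

Calling send_over_tls(line) would transmit the message. In the deployed solution syslog-ng plays this role and additionally handles filtering, buffering and reconnection.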

Deploying the agents

As described in the section "Monitoring instances", the DataDog agent provides a lot of insight into the system it is running on. As covered in "Monitoring applications", syslog-ng also needs to be deployed on the instances.

At Technofy, we commonly use Ansible for configuration management on our customers' systems. Luckily for us, DataDog already provides a role on Ansible Galaxy, which makes the setup even easier. All we have to do is fill in a few configuration details and the API key.

In many cases, Ansible deployments are made remotely via SSH, but BricoPrivé relies heavily on AWS Systems Manager to apply patches and open remote sessions on their fleet of machines. The service also provides a way to run Ansible playbooks through one of its managed SSM documents, namely "AWS-ApplyAnsiblePlaybooks". This document lets us specify variables as well as an S3 bucket (or a GitHub repository) where it will look for the playbook. Once executed, the document automatically installs Ansible on the target machines if it is not already present.

Apply an Ansible Playbook with AWS Systems Manager
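The invocation of the "AWS-ApplyAnsiblePlaybooks" document can be sketched in Python as follows. This is a minimal illustration, not the exact command used for BricoPrivé: the bucket URL, playbook file name, target tag and extra variables are hypothetical placeholders.

```python
import json


def build_apply_playbook_command(source_url, playbook_file, tag_key, tag_value,
                                 extra_vars=None):
    """Build keyword arguments for the Systems Manager SendCommand API."""
    variables = extra_vars or {"SSM": "True"}
    # Extra variables are passed as a space-separated list of key=value pairs.
    extra = " ".join(f"{key}={value}" for key, value in variables.items())
    return {
        "DocumentName": "AWS-ApplyAnsiblePlaybooks",
        # Target instances by tag instead of listing instance IDs.
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        # SSM document parameters are lists of strings.
        "Parameters": {
            "SourceType": ["S3"],
            "SourceInfo": [json.dumps({"path": source_url})],
            # Let the document install Ansible if it is missing.
            "InstallDependencies": ["True"],
            "PlaybookFile": [playbook_file],
            "ExtraVariables": [extra],
            # "False" applies the changes; "True" would only perform a dry run.
            "Check": ["False"],
            "Verbose": ["-v"],
        },
    }


# Hypothetical bucket, playbook and tag, for illustration only.
command = build_apply_playbook_command(
    source_url="https://s3.amazonaws.com/example-playbooks/monitoring.zip",
    playbook_file="monitoring.yml",
    tag_key="Role",
    tag_value="web",
    extra_vars={"SSM": "True", "env": "production"},
)
print(json.dumps(command, indent=2))

# With boto3 installed and AWS credentials configured, the call would be:
#   boto3.client("ssm").send_command(**command)
```

Targeting by tag means instances launched later by auto scaling can be covered by the same command definition, for example through a State Manager association.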


Technofy Ltd.