This is a copy of the article I first published on Meilleurs Agents Engineering here: https://medium.com/Meilleurs%20Agents-engineering/rewrite-of-our-analytics-applications-using-kubernetes-8ddc86068822

During my six-month internship in backend web development at Meilleurs Agents, my project was to rewrite the application responsible for collecting analytics metrics on the website. The goal was to rewrite the application itself and part of the pipeline that stores the data, and to use this opportunity to try new managed hosting solutions.

Introduction

The application, called WebAnalytics, collects analytics events from visitors when they see or interact with realtor content on the website. The events themselves are very simple: an identifier and a key/value list of data.
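To make this concrete, here is a minimal sketch of what such an event could look like and how the API might validate it. The field names are illustrative assumptions, not the actual WebAnalytics schema:

```python
import json

# Hypothetical event shape: an identifier plus a free-form key/value payload.
# Field names are illustrative; the real WebAnalytics schema may differ.
EXAMPLE_EVENT = {
    "event_id": "ad-display",
    "data": {"realtor_id": "123", "page": "/agency/acme", "placement": "sidebar"},
}

def validate_event(raw: str) -> dict:
    """Parse a JSON event and check the minimal structure described above."""
    event = json.loads(raw)
    if not isinstance(event.get("event_id"), str):
        raise ValueError("event must carry a string identifier")
    if not isinstance(event.get("data"), dict):
        raise ValueError("event data must be a key/value mapping")
    return event

event = validate_event(json.dumps(EXAMPLE_EVENT))
print(event["event_id"])  # ad-display
```

Keeping the accepted payload this generic means new event types can be collected without any backend change.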

These events are then stored in our BigQuery data warehouse for use by the BI team. Realtors can also see aggregated view statistics on their professional dashboard, as shown below:

Statistics collected from WebAnalytics as displayed on the realtor dashboard

How it worked

The existing WebAnalytics application had multiple components:

The previous architecture
  • A backend with an API, written in Python with Flask, collecting the events sent by the website frontend.

  • A Postgres database, where the events were saved by the API backend until they were sent to the data warehouse with a daily CRON job.

  • An Amazon Redshift database, used as a data warehouse to store all events.

  • Various scripts (CRON job scripts, database maintenance scripts…) to synchronize the events between Postgres and Redshift, vacuum them…

A lot had changed since the app was first written. For example, a data warehouse shared by all apps was created on BigQuery. As WebAnalytics predated it, it still used Redshift as storage. A script was added to push these events to BigQuery alongside the rest of the data warehouse, but the Redshift database remained in production.

The Meilleurs Agents data team had also started using Airflow to run workflows instead of custom CRON tasks, in order to have a more centralized system.

The aim of the rewrite was to simplify this architecture: decommissioning the now unused Redshift database by pushing to BigQuery directly, using Airflow instead of CRON tasks for the synchronization jobs…

Choosing the technology for the rewrite

Containers — Photo by Guillaume Bolduc (https://unsplash.com/@guibolduc) on Unsplash (https://unsplash.com/s/photos/containers)

As the application is quite simple and independent from other applications, this rewrite was also used as an opportunity to try new managed technologies for hosting applications in production. The selected solution was Kubernetes and containers.

As we are using Google Cloud Platform for hosting, I also tried a few other technologies before choosing Kubernetes, such as Google Cloud Functions. The idea was to use two functions communicating through Cloud Pub/Sub:

  • One exposing a public API to receive the events

  • Another one receiving the events through the Pub/Sub topic and inserting them into the database.
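The two-function pattern can be illustrated with a sketch in which an in-process queue stands in for the Cloud Pub/Sub topic and a plain list stands in for the database. In the real POC these would be a `google-cloud-pubsub` topic and an actual datastore; everything here is a simplified stand-in:

```python
import json
import queue

# Stand-in for the Cloud Pub/Sub topic; the real POC would use a
# google-cloud-pubsub publisher/subscriber pair, not an in-process queue.
topic = queue.Queue()
database = []  # stand-in for the events table

def collect_event(request_body: str) -> int:
    """First function: the public HTTP endpoint. It only parses the
    payload and publishes it to the topic, then returns immediately."""
    event = json.loads(request_body)
    topic.put(event)
    return 204  # "accepted, no content" response to the frontend

def insert_events() -> None:
    """Second function: triggered by Pub/Sub messages, writes to the DB."""
    while not topic.empty():
        database.append(topic.get())

collect_event('{"event_id": "ad-display", "data": {"page": "/agency/acme"}}')
insert_events()
print(len(database))  # 1
```

The appeal of this design is that the public endpoint never waits on a database write; the Pub/Sub topic absorbs traffic spikes.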

Although Cloud Functions provided an easy way to deploy the app and removed a lot of complexity, we didn’t choose this technology for the rewrite, mostly because of the unpredictable response times of Cloud Functions, largely caused by their costly cold starts.

Then, I tried Google Kubernetes Engine, which met all our needs. Moreover, the Kubernetes cluster created for the rewrite could be used later on for other applications.

Most Meilleurs Agents applications are deployed on Google Compute Engine instances using Ansible and Terraform through Jenkins jobs. Kubernetes was already used for some internal tools, but not for public applications. The aim was to see how we could deploy on Kubernetes instead, and how it could fit into the current architecture and deployment workflow.

Rewrite

First, the API itself was rewritten to bring it up to date with our other apps: using Celery to split the API from the database insertion code, cleaning up the database schema…
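The Celery split can be sketched as follows. The broker URL and function names are hypothetical, and the task body is simplified to keep the sketch self-contained (the real task writes to Postgres):

```python
from celery import Celery

# Hypothetical broker URL; the broker actually used is not described here.
app = Celery("webanalytics", broker="redis://localhost:6379/0")

@app.task
def insert_event(event: dict) -> dict:
    # In the real app this would INSERT into Postgres; here we just
    # normalize the payload so the sketch stays self-contained.
    return {"event_id": event["event_id"], "data": dict(event.get("data", {}))}

def collect(event: dict) -> None:
    """API view body (sketch): enqueue the event and return without
    waiting for the database write, keeping API response times low."""
    insert_event.delay(event)  # runs asynchronously on a Celery worker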

As explained above, the transfer of events from the Postgres database to the data warehouse was mostly done through CRON jobs. As Airflow was already used in other parts of the company, I converted the scripts to Airflow DAGs. Airflow provides various operators out of the box to write these kinds of pipelines very easily. Moreover, Airflow provides better logging, a web dashboard and more, which makes it easier to monitor these jobs:

The Airflow Postgres to BigQuery DAG as shown on the web dashboard
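Such a DAG could look like the sketch below, assuming the Google provider transfer operators; the connection IDs, bucket, table names and SQL are placeholders, and the exact operator choice is an assumption about how the pipeline could be written:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.postgres_to_gcs import (
    PostgresToGCSOperator,
)
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

# Connection IDs, bucket and table names below are hypothetical placeholders.
with DAG(
    dag_id="webanalytics_postgres_to_bigquery",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Export one day's events from Postgres to a staging file in GCS.
    export_events = PostgresToGCSOperator(
        task_id="export_events",
        postgres_conn_id="webanalytics_db",
        sql="SELECT * FROM events WHERE created_at::date = '{{ ds }}'",
        bucket="webanalytics-staging",
        filename="events/{{ ds }}.json",
        export_format="json",
    )

    # Load the staging file into the BigQuery data warehouse.
    load_events = GCSToBigQueryOperator(
        task_id="load_events",
        bucket="webanalytics-staging",
        source_objects=["events/{{ ds }}.json"],
        destination_project_dataset_table="warehouse.webanalytics_events",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
    )

    export_events >> load_events
```

Each daily run is idempotent per execution date thanks to the templated `{{ ds }}` filename, which makes reruns from the web dashboard safe.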

The app was then containerized for use in Kubernetes on the staging and production environments. On developer environments, the image is used with Docker Compose. This minimizes the differences between the dev, staging and production environments by using the same base image everywhere.

Deployment pipeline

The Kubernetes provider for Terraform was used to automate the deployment of the application: creating the cluster, applying the Kubernetes Deployment configuration…

The Jenkins job used to deploy the internal tools on Kubernetes was then modified to also deploy this application. The idea was to create the most generic solution possible, so that we could migrate more apps to Kubernetes in the future.

By using Kubernetes, we can delegate most of the common deployment issues (rolling updates, configuration changes, scaling up and down…) to Kubernetes itself, simplifying the configuration and the deployment pipeline.

Validation of the new solution

To minimize issues while deploying the new solution, we wanted to do a gradual rollout. Instead of directly replacing the old pipeline, we started by running the two systems in parallel, as shown below:

The traffic duplication solution used to test the rewrite

This solution allowed us to validate the load on the new API and the number of events collected by the new system while minimizing the impact on the frontend.

Once the new solution was fully validated, we deployed it by removing the old version and changing the infrastructure configuration of the frontend apps to point directly to the new endpoint.

Conclusion

This six-month internship was a great opportunity to work on a full rewrite of an application, with different aspects involved:

  • Learning how the existing system worked.

  • Realizing POCs using different technologies to select one to use.

  • Doing the rewrite itself and deploying it.

It also gave me the opportunity to learn more about various technologies such as Kubernetes and Terraform.

The application is now fully deployed in production, and was used as a first step to validate the use of containers and Kubernetes for other applications 🚀.