I don’t want to miss a Thing 🎶 - Track Database Changes with Apache Kafka

An application with a central database and a series of ETL (Extract, Transform, Load) flows moving data from there to the data warehouse is a familiar pattern in software architectures everywhere. It works well, but ETL flows are usually single-purpose: every additional target gets its own dedicated flow, and over time these flows can put too much load on the source and slow things down. A more performant alternative is to use Kafka Connect to pick up database changes and pass them to Apache Kafka. Once the data is in Kafka, it can be reshaped and pushed to several downstream applications without creating additional load on the source system. This open source data streaming platform integrates with your existing setup and, with a bit of configuration, can replace too-much-of-a-good-thing ETL flows, bringing simplicity and performance to your data pipeline. This session will show how Apache Kafka operates and how existing data platforms, like a PostgreSQL database, can be integrated with it both as a data source and as a target. Several Kafka Connect options will be explored to understand their benefits and limitations. The session is intended for everyone who wants to avoid the classic “Spaghetti architecture” and base their data pipeline on Apache Kafka, the main open source data streaming technology.
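To make the "pick up database changes with Kafka Connect" idea concrete, here is a minimal sketch of registering a change data capture source connector for PostgreSQL. The abstract does not name a specific connector. This sketch assumes Debezium's PostgreSQL connector (one common Kafka Connect option) and a Kafka Connect worker exposing its REST API on the default port 8083. The connector name, host, credentials, database, and table names are hypothetical placeholders.

```python
import json
import urllib.request

# Assumption: a Kafka Connect worker with the Debezium plugin installed,
# reachable on Kafka Connect's default REST port 8083.
CONNECT_URL = "http://localhost:8083/connectors"


def debezium_postgres_source(name, host, port, user, password, dbname, tables, topic_prefix):
    """Build a Debezium PostgreSQL source connector definition.

    All argument values used below are hypothetical placeholders.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            # pgoutput is PostgreSQL's built-in logical decoding plugin
            "plugin.name": "pgoutput",
            "database.hostname": host,
            "database.port": str(port),
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Debezium 2.x key; Debezium 1.x used database.server.name instead
            "topic.prefix": topic_prefix,
            # Only changes to these schema-qualified tables are captured
            "table.include.list": ",".join(tables),
        },
    }


def register(connector, url=CONNECT_URL):
    """POST the connector definition to the Kafka Connect REST API."""
    req = urllib.request.Request(
        url,
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    cfg = debezium_postgres_source(
        name="pg-orders-cdc",          # hypothetical connector name
        host="pg.example.internal",    # hypothetical host
        port=5432,
        user="replicator",             # hypothetical replication user
        password="secret",
        dbname="shop",                 # hypothetical database
        tables=["public.orders"],
        topic_prefix="shop",
    )
    print(json.dumps(cfg, indent=2))
    # register(cfg)  # call only once a Kafka Connect cluster is reachable
```

With this setup, Debezium reads PostgreSQL's write-ahead log (which requires `wal_level=logical`) rather than repeatedly querying the tables, and writes each captured table's changes to its own topic named `<prefix>.<schema>.<table>`, here `shop.public.orders`. Downstream consumers or sink connectors then read from that topic, so adding a new target does not add a new query load on the source database.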

Big Data

Database

NDC { London }

Francesco Tisiot