Final Thesis: Implementing an Open Data ETL Processing Engine with Kafka

Abstract: The JValue project group is developing a modeling ecosystem for Extract Transform Load (ETL) processes. Part of this ecosystem is a description model for such processes. This thesis proposes a process for converting the description model into an Apache Kafka runtime that is described in a cloud-native format such as Docker Compose. The conversion is implemented as a library and uses a multi-phase approach, as known from classical compilers. In the first phase, the description language is converted into a runtime-independent intermediate description; in the second, that intermediate description is converted into a description of a concrete runtime, in this case Kafka. The multi-phase approach minimizes the implementation work for additional runtimes and allows runtime-independent optimization and analysis. The goal for the generated runtime is to use existing Kafka components, which is only partially possible due to the complexity of the description model.
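
As a rough illustration of the multi-phase idea described in the abstract, the following Python sketch lowers a made-up description model into a runtime-independent intermediate form, runs one runtime-independent check on it, and only then emits a Docker Compose-like structure for a Kafka backend. It is not the thesis's library: the dictionary layout of the description, the `Step` and `Pipeline` classes, the function names, the `example/...` image names, and the topic naming scheme are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical intermediate representation (IR); not taken from the thesis.
@dataclass
class Step:
    name: str
    kind: str  # "extract", "transform", or "load"
    inputs: list[str] = field(default_factory=list)


@dataclass
class Pipeline:
    name: str
    steps: list[Step]


def to_ir(description: dict) -> Pipeline:
    """Phase 1: lower an (assumed) description model into the runtime-independent IR."""
    steps = [
        Step(s["name"], s["kind"], list(s.get("inputs", [])))
        for s in description["steps"]
    ]
    return Pipeline(description["name"], steps)


def check_ir(pipeline: Pipeline) -> None:
    """Runtime-independent analysis: every input must name an earlier step."""
    seen: set[str] = set()
    for step in pipeline.steps:
        for inp in step.inputs:
            if inp not in seen:
                raise ValueError(f"step {step.name!r} reads unknown input {inp!r}")
        seen.add(step.name)


def to_kafka_compose(pipeline: Pipeline) -> dict:
    """Phase 2 (Kafka backend): emit a Docker Compose-like dict.

    Image names and environment variables are placeholders, not real images.
    Each step becomes one service that writes to its own topic.
    """
    services: dict[str, dict] = {"kafka": {"image": "example/kafka"}}
    for step in pipeline.steps:
        services[f"{pipeline.name}-{step.name}"] = {
            "image": f"example/{step.kind}-worker",
            "depends_on": ["kafka"],
            "environment": {
                "INPUT_TOPICS": ",".join(f"{pipeline.name}.{i}" for i in step.inputs),
                "OUTPUT_TOPIC": f"{pipeline.name}.{step.name}",
            },
        }
    return {"services": services}


if __name__ == "__main__":
    description = {
        "name": "demo",
        "steps": [
            {"name": "fetch", "kind": "extract"},
            {"name": "clean", "kind": "transform", "inputs": ["fetch"]},
        ],
    }
    ir = to_ir(description)
    check_ir(ir)
    print(to_kafka_compose(ir))
```

In this sketch, supporting another runtime would only require a second function alongside `to_kafka_compose`; `to_ir` and `check_ir` would be reused unchanged, which is the benefit the abstract attributes to the multi-phase approach.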

Keywords: open data, compiler, Apache Kafka

PDF: Master Thesis

Reference: Fabian Arnold. Implementing an Open Data ETL Processing Engine with Kafka. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg, 2022.