The JValue Project

Open data, easy and social

Final Thesis: ETL Data Pipelines Configurations in Spark

Abstract: The JValue Open Data Service (ODS) is an ETL data pipeline: it extracts data from different source systems (Extract), transforms the extracted data (Transform), and loads the result into a target database (Load). Various stream processing engines exist to cope with data of high volume, variety, and velocity. Existing ETLs cannot be applied across different streaming services, and the use of various frameworks and programming languages adds complexity. Among the different streaming services, Apache Spark offers accelerated, reusable, and scalable ETLs. This thesis aims to propose an approach to compile and configure a data pipeline so that it is runnable on Apache Spark.
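
The three ETL stages named in the abstract can be illustrated with a minimal sketch. It is not the thesis's approach and does not use Apache Spark or the ODS code base; it uses only the Python standard library, and the configuration keys, table name, and sample data are hypothetical and chosen purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical pipeline configuration; keys and sample data are illustrative only.
CONFIG = {
    "source_csv": "station,temp_c\nA,21.5\nB,\nC,19.0\n",
    "target_table": "readings",
}


def extract(config):
    """Extract: read raw rows from the configured source."""
    return list(csv.DictReader(io.StringIO(config["source_csv"])))


def transform(rows):
    """Transform: drop rows without a reading, convert Celsius to Fahrenheit."""
    return [
        (row["station"], float(row["temp_c"]) * 9 / 5 + 32)
        for row in rows
        if row["temp_c"]
    ]


def load(records, config, connection):
    """Load: write the transformed records to the target database."""
    table = config["target_table"]
    connection.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (station TEXT, temp_f REAL)"
    )
    connection.executemany(f"INSERT INTO {table} VALUES (?, ?)", records)
    connection.commit()


def run_pipeline(config, connection):
    """Chain the three stages in order, driven by a single configuration."""
    load(transform(extract(config)), config, connection)


# An in-memory SQLite database stands in for the target database.
connection = sqlite3.connect(":memory:")
run_pipeline(CONFIG, connection)
```

Keeping each stage as a separate function driven by one configuration object is one possible way to make a pipeline reconfigurable; a Spark-based pipeline would replace the stage bodies with Spark reads, transformations, and writes.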

Keywords: ETL pipeline, stream processing

PDF: Bachelor Thesis

Reference: Gizem Batmaci. ETL Data Pipelines Configurations in Spark. Bachelor Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2022.

Posted 2022-10-18 by Agnes Low

Tags: Engineering Thesis, JValue ODS, JValue Open Data Service
