The EU publishes large amounts of open data on its own open data portal, https://data.europa.eu. It also reuses this data to give citizens insights into the 2024 EU elections (see, for example, these visualizations). Citizens can run their own analyses on the published open data. Sadly, depending on the data set, the data is not always easy to use.
In Germany, the Bundeswahlleiterin (the Federal Returning Officer) publishes result data for the EU elections on its website: https://www.bundeswahlleiterin.de/europawahlen/2024/ergebnisse/opendata.html as © Die Bundeswahlleiterin, Wiesbaden 2024 under the license Datenlizenz Deutschland – Namensnennung – Version 2.0 (https://www.govdata.de/dl-de/by-2-0). The data set is a CSV file that contains absolute vote counts by party, split by area in Germany.
With our domain-specific language Jayvee, we can easily model a data pipeline that extracts, cleans, and transforms the data. As a final step, the pipeline saves the result in a SQLite database, a format that makes follow-up analysis easy. The example below uses Jayvee version 0.5.0. You can find installation instructions here. All of the source code described in this post is available in the example repository on GitHub: https://github.com/jvalue/jayvee-example-eu-election-2024.
Which Regions in Bavaria Saw the Largest Changes in Results by Party?
Because JValue is developed by the Professorship for Open-Source Software at the Friedrich-Alexander University Erlangen-Nürnberg, we are mainly interested in how voting behavior has changed by region in Bavaria.
First, we want to know how the results changed between 2019 and 2024, so we define a transform in Jayvee that calculates the absolute vote difference:
publish transform VoterDiff {
  from party2024 oftype integer;
  from party2019 oftype integer;
  to partyDiff oftype integer;

  partyDiff: party2024 - party2019;
}
We can use this transform to add a new column to the data set that holds the difference for one party. Here is an example for the Greens (Grüne):
block GrueneDiff oftype TableTransformer {
  inputColumns: [
    "Grüne 2024",
    "Grüne 2019"
  ];
  outputColumn: "Grüne Diff";
  uses: VoterDiff;
}
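To make the effect of the transform and the TableTransformer block concrete, here is a minimal Python sketch of the same row-wise computation. It is only an illustration, not how Jayvee works internally. The function names, the sample rows, and the output column "Grüne Diff Relative" are assumptions, as is the definition of the relative difference (the change as a fraction of the 2019 count).

```python
from typing import Optional


def voter_diff(party_2024: int, party_2019: int) -> int:
    """Absolute change in votes, mirroring the VoterDiff transform."""
    return party_2024 - party_2019


def relative_voter_diff(party_2024: int, party_2019: int) -> Optional[float]:
    """Change as a fraction of the 2019 count (assumed definition).

    Returns None when there were no votes in 2019, because the ratio
    is undefined in that case.
    """
    if party_2019 == 0:
        return None
    return (party_2024 - party_2019) / party_2019


def add_diff_columns(rows, party):
    """Return copies of the rows with two additional difference columns."""
    result = []
    for row in rows:
        votes_2024 = row[f"{party} 2024"]
        votes_2019 = row[f"{party} 2019"]
        new_row = dict(row)
        new_row[f"{party} Diff"] = voter_diff(votes_2024, votes_2019)
        # Hypothetical column name for the relative variant.
        new_row[f"{party} Diff Relative"] = relative_voter_diff(votes_2024, votes_2019)
        result.append(new_row)
    return result


# Illustrative numbers only, not real election results.
sample_rows = [
    {"Id": "09162", "Grüne 2024": 900, "Grüne 2019": 1200},
    {"Id": "09564", "Grüne 2024": 500, "Grüne 2019": 400},
]
enriched = add_diff_columns(sample_rows, "Grüne")
```

Returning None for a zero 2019 count is one possible choice; a pipeline could just as well drop such rows.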
Similarly, we can define transforms for the relative differences compared to 2019 and add that data to the data set as well.
To keep only Bavarian areas in the data set, we define a custom value type for Bavarian community identification numbers (CINs). CINs identify areas in Germany and always start with 09 for areas in Bavaria. The constraint below accepts exactly five digits with that prefix:
constraint BavarianCIN on text: value matches /^09[0-9]{3}$/;

publish valuetype BavarianCommunityIdentificationNumber oftype text {
  constraints: [
    BavarianCIN
  ];
}
We can then use the newly defined value type to mark only Bavarian CINs as valid data. When the CSV data is parsed as a table, rows from areas outside Bavaria are invalid and are therefore filtered out:
block ParseAsTable oftype TableInterpreter {
  header: true;
  columns: [
    "Id" oftype BavarianCommunityIdentificationNumber,
    // other columns
  ];
}
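The filtering effect of the value type can be sketched in Python with the same regular expression. This is only an illustration of the rule, not of how Jayvee validates values. The helper names and the sample identifiers are assumptions.

```python
import re

# Same pattern as the BavarianCIN constraint; fullmatch anchors both ends.
BAVARIAN_CIN = re.compile(r"09[0-9]{3}")


def is_bavarian_cin(value: str) -> bool:
    """True for five-digit identifiers that start with 09."""
    return BAVARIAN_CIN.fullmatch(value) is not None


def keep_bavarian_rows(rows):
    """Drop every row whose "Id" is not a valid Bavarian CIN."""
    return [row for row in rows if is_bavarian_cin(row["Id"])]


# Sample identifiers for illustration only.
sample_rows = [
    {"Id": "09162"},  # valid: 09 followed by three digits
    {"Id": "11000"},  # wrong prefix
    {"Id": "0916"},   # too short
    {"Id": "091620"}, # too long
]
kept = keep_bavarian_rows(sample_rows)
```

Only the first sample row survives the filter, which matches the behavior described above for rows with an invalid "Id".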
After executing the Jayvee model using the jayvee-interpreter, the console lets us know that we have successfully downloaded and transformed the data.
Instead of a CSV file with the absolute voting results for every region in Germany, we’ve created a focused data set in a structured format (a SQLite database) with additional information in the form of absolute and relative vote differences.
With the data cleaned, filtered, and transformed for easy analysis, we can now use a simple Python script to visualize the areas with the largest and smallest differences in voting behavior for each party in the EU elections.
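As a rough sketch of the query side of such a script, the following Python code uses the standard sqlite3 module to find the areas with the largest and smallest differences for one party. It is not the script from the example repository. The table name "results", the helper names, and the sample values are assumptions, and an in-memory database stands in for the file written by the pipeline. Plotting is left out.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Quote a table or column name for SQLite (needed for names with spaces)."""
    return '"' + name.replace('"', '""') + '"'


def extreme_diffs(conn, table, diff_column, n=3):
    """Return (largest, smallest) lists of (Id, diff) tuples for one column."""
    tbl = quote_identifier(table)
    col = quote_identifier(diff_column)
    # Identifiers cannot be bound as parameters, so only LIMIT uses "?".
    largest = conn.execute(
        f'SELECT "Id", {col} FROM {tbl} ORDER BY {col} DESC LIMIT ?', (n,)
    ).fetchall()
    smallest = conn.execute(
        f'SELECT "Id", {col} FROM {tbl} ORDER BY {col} ASC LIMIT ?', (n,)
    ).fetchall()
    return largest, smallest


# In-memory stand-in for the SQLite file; values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "results" ("Id" TEXT, "Grüne Diff" INTEGER)')
conn.executemany(
    'INSERT INTO "results" VALUES (?, ?)',
    [("09162", -300), ("09564", 100), ("09761", -50), ("09362", 20)],
)
largest, smallest = extreme_diffs(conn, "results", "Grüne Diff", n=2)
```

To run this against the real output, replace ":memory:" with the path of the generated database file and adjust the table and column names to those defined in the Jayvee model.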