Abstract: The internet today provides a large amount of open data for public use. This data comes in various formats and covers many subjects. The main problem is the absence of a standard: every provider can decide independently how its data is structured.
The JValue project addresses this problem and aims to be the central point where open data is gathered and optimized. Currently, the JValue Open-Data-Service (ODS) provides the extraction, transformation and retrieval of open data and supports numerous protocols and data formats.
However, the ODS so far offers only a very generic interface for retrieving this data, since the system ignores any data structure. In addition, providers are not bound by any restrictions and can change the structure of their data even after the ODS has been adjusted to it. This can severely limit or even break the data gathering process.
To counteract this, a process is introduced that allows the ODS to give structure to this open data. Furthermore, a schema recommendation is generated for the data, which then serves as the foundation for the rest of the data gathering process.
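As an illustration only, the following Python sketch shows one simple way a schema recommendation could be derived from sample records: it infers a JSON-Schema-like description and marks a field as required only if it occurs in every sampled object. The functions `recommend_schema` and `_type_name` and the sample data are hypothetical; this is not the ODS implementation described in the thesis.

```python
from typing import Any


def _type_name(value: Any) -> str:
    # Map a value as produced by json.loads to a JSON Schema type name.
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def recommend_schema(samples: list[Any]) -> dict:
    """Infer a JSON-Schema-like description that fits all sample values (hypothetical helper)."""
    types = sorted({_type_name(s) for s in samples})
    if "integer" in types and "number" in types:
        types.remove("integer")  # widen mixed integer/float fields to "number"
    schema: dict = {"type": types[0] if len(types) == 1 else types}

    objects = [s for s in samples if isinstance(s, dict)]
    if objects:
        keys = sorted({k for o in objects for k in o})
        schema["properties"] = {
            k: recommend_schema([o[k] for o in objects if k in o]) for k in keys
        }
        # A field is only required if every sampled object contains it.
        schema["required"] = [k for k in keys if all(k in o for o in objects)]

    items = [item for s in samples if isinstance(s, list) for item in s]
    if items:
        schema["items"] = recommend_schema(items)
    return schema


# Hypothetical sample records from an open data source.
samples = [
    {"station": "A1", "temp": 21.5, "tags": ["air"]},
    {"station": "B2", "temp": 19, "tags": []},
    {"station": "C3", "temp": 20.0},
]
schema = recommend_schema(samples)
print(schema)
```

Here `temp` is recommended as `number` because it mixes integers and floats, and `tags` is optional because one sample lacks it. A real recommender would likely need more samples and further heuristics (formats, enums, nullability).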
The introduced data schema also makes it possible to derive matching database tables. These tables are created and filled dynamically and provide the user with a complete and easily accessible interface. With the data now persisted in structured form, the previously mentioned problem of frequently changing data structures can also be addressed: the schema can be used to validate the imported and transformed data. By additionally displaying a corresponding visual status for each data configuration, the user is able to react to changed data structures.
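A hedged sketch of the two uses of the schema described above: deriving a table definition from it and validating incoming records before they are stored. It uses Python's built-in `sqlite3` as a stand-in database; the type mapping, the helper names `create_table_sql`, `validate` and `insert`, and the example schema are assumptions for illustration, not the ODS code. Table and column names are assumed to be trusted.

```python
import json
import sqlite3

# Hypothetical mapping from JSON Schema types to SQLite column types;
# nested arrays/objects are stored as JSON text.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "number": "REAL",
             "boolean": "INTEGER", "array": "TEXT", "object": "TEXT"}
PY_TYPES = {"string": str, "integer": int, "number": (int, float),
            "array": list, "object": dict}


def create_table_sql(table: str, schema: dict) -> str:
    """Derive a CREATE TABLE statement from a flat object schema."""
    cols = []
    for name, prop in schema["properties"].items():
        col = f'"{name}" {SQL_TYPES[prop["type"]]}'
        if name in schema.get("required", []):
            col += " NOT NULL"
        cols.append(col)
    return f'CREATE TABLE "{table}" ({", ".join(cols)})'


def _matches(value, json_type: str) -> bool:
    if json_type == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):  # bool is a subclass of int; reject it for numbers
        return False
    return isinstance(value, PY_TYPES[json_type])


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the record fits the schema."""
    errors = [f"missing field: {k}" for k in schema.get("required", []) if k not in record]
    for key, value in record.items():
        prop = schema["properties"].get(key)
        if prop is None:
            errors.append(f"unexpected field: {key}")
        elif not _matches(value, prop["type"]):
            errors.append(f"wrong type for {key}: expected {prop['type']}")
    return errors


def insert(conn: sqlite3.Connection, table: str, schema: dict, record: dict) -> None:
    cols = list(schema["properties"])
    values = [json.dumps(record.get(c)) if isinstance(record.get(c), (list, dict))
              else record.get(c) for c in cols]
    col_list = ", ".join(f'"{c}"' for c in cols)
    placeholders = ", ".join("?" for _ in cols)
    conn.execute(f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})', values)


# Hypothetical recommended schema and two incoming records.
schema = {"type": "object",
          "properties": {"station": {"type": "string"}, "temp": {"type": "number"}},
          "required": ["station", "temp"]}
conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql("measurements", schema))

good = {"station": "A1", "temp": 21.5}
bad = {"station": "B2", "temp": "warm"}  # the provider changed the field's type
for rec in (good, bad):
    problems = validate(rec, schema)
    if problems:
        print("rejected:", problems)  # a UI could show this as a warning status
    else:
        insert(conn, "measurements", schema, rec)

rows = conn.execute('SELECT station, temp FROM "measurements"').fetchall()
print(rows)
```

Rejecting non-conforming records instead of silently storing them is what lets a changed data structure surface as a visible status rather than as corrupted tables.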
Keywords: data engineering, schema recommendation, open data
Reference: Alexander Mahler. Giving Structure to Open Data in the JValue ODS. Master Thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg: 2021.