Short description:

In this project, data from several web services is to be made available to the customer through a REST interface. PTA is developing a prototype for this scenario using the open-source framework Apache Airflow.


The customer wants to integrate data from various web services into its in-house merchandise management system, which runs on IBM iSeries (AS/400). Apache Airflow can connect to a range of data sources (SQL and NoSQL databases, GraphQL APIs etc.) and process data as Extract-Transform-Load (ETL) pipelines. PTA is evaluating Apache Airflow as a tool for implementing the workflows, or ETL pipelines, that transfer data from the web services to the target system. Workflows in Apache Airflow are defined entirely in Python. Using a prototype, PTA is investigating how well the tool meets the customer's requirements. The focus is not only on the design of the workflows and ETL processes, but also on user administration, scheduling and monitoring.
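The extract, transform and load steps of such a pipeline can be sketched in plain Python. This is a minimal illustration only, not the prototype's code and not Airflow's API: the sample payload, the field names (`sku`, `price`), the `articles` table and all function names are assumptions, and an in-memory SQLite database stands in for the iSeries target system.

```python
import json
import sqlite3

# Hypothetical payload standing in for a web service response (assumption).
SAMPLE_RESPONSE = '[{"sku": " A-100", "price": "19.90"}, {"sku": "B-200", "price": "5.00"}]'


def extract(raw):
    """Extract: parse the JSON body returned by the (simulated) web service."""
    return json.loads(raw)


def transform(records):
    """Transform: trim each SKU and convert the price string to integer cents."""
    return [(r["sku"].strip(), round(float(r["price"]) * 100)) for r in records]


def load(rows, conn):
    """Load: write the rows to the target table and return the number written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (sku TEXT PRIMARY KEY, price_cents INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO articles VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def run_etl(raw, conn):
    """Run the three steps in order, each consuming the previous step's output."""
    return load(transform(extract(raw)), conn)
```

Calling `run_etl(SAMPLE_RESPONSE, sqlite3.connect(":memory:"))` runs the whole chain once. In a workflow tool, each of the three functions would typically become its own task so that it can be scheduled, retried and monitored separately.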

Technical description:

Apache Airflow was originally developed at Airbnb and is used in production by many large IT companies (Facebook, Yahoo, Intel etc.) as a workflow management system or ETL tool. In practice it is often used for data integration in business intelligence (BI) and data science (machine learning). Workflows can be created, scheduled and monitored through a web-based user interface. Apache Airflow represents workflows as directed acyclic graphs (DAGs): each node corresponds to a task, and the edges between nodes represent the dependencies between tasks. Apache Airflow benefits from a large community that extends the framework with custom plug-ins.
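The graph model described above can be illustrated with Python's standard `graphlib` module. The sketch deliberately does not use Airflow itself; the task names and the helper functions `execution_order` and `is_valid_dag` are hypothetical. It only shows how tasks as nodes and dependencies as edges determine a valid run order, and why the graph must not contain a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task (node); its set lists the tasks it depends on (edges).
# Task names are hypothetical examples.
dependencies = {
    "extract_prices": set(),
    "extract_stock": set(),
    "transform": {"extract_prices", "extract_stock"},
    "load": {"transform"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_valid_dag(deps):
    """Return False if the dependencies contain a cycle and so cannot be ordered."""
    try:
        execution_order(deps)
    except CycleError:
        return False
    return True
```

`execution_order(dependencies)` places both extract tasks before `transform` and `transform` before `load`; the relative order of the two extract tasks is not fixed, because neither depends on the other. A scheduler may run such independent tasks in parallel.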