This is one of 4546 IT projects that we have successfully completed with our customers.


Extraction of key-value pairs from invoices

Project duration: 8 months

Brief description

Invoices contain business-relevant information for every company. Extracting this data from the documents by hand is time-consuming and should be automated. AI models are used for this purpose.


The Hugging Face Transformers library is used, a Python library for natural language processing with pre-trained models. The pre-trained models are run in Google Colab; specifically, the LayoutXLM model is used. The task also includes setting up a pipeline that reads the data from the documents, preprocesses it, passes it to the ML model, and postprocesses the output. In addition, the model must be fine-tuned for the specific task on suitable datasets.
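The four pipeline stages can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the OCR output is assumed to be already available as words with pixel bounding boxes, `dummy_model` is a hypothetical stand-in for the fine-tuned LayoutXLM model, and the label names `INVOICE_NUMBER` and `TOTAL` are invented for the example. The one detail taken from the LayoutLM model family is that bounding boxes are scaled to a 0-1000 range before they are passed to the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


@dataclass
class Word:
    text: str
    box: Box


def normalize_box(box: Box, width: int, height: int) -> Box:
    """Scale pixel coordinates to the 0-1000 range used by LayoutLM-style models."""
    x0, y0, x1, y1 = box
    return (
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    )


def preprocess(words: List[Word], width: int, height: int) -> List[Word]:
    """Drop empty OCR tokens and normalize the remaining boxes."""
    return [
        Word(w.text.strip(), normalize_box(w.box, width, height))
        for w in words
        if w.text.strip()
    ]


def dummy_model(words: List[Word]) -> List[str]:
    """Hypothetical stand-in for the fine-tuned model: returns one label per word."""
    labels = []
    for w in words:
        if w.text.startswith("INV-"):
            labels.append("INVOICE_NUMBER")
        elif w.text.replace(".", "", 1).isdigit():
            labels.append("TOTAL")
        else:
            labels.append("O")  # "O" marks words that belong to no field
    return labels


def postprocess(words: List[Word], labels: List[str]) -> Dict[str, str]:
    """Join all words that share a label into one field value."""
    fields: Dict[str, str] = {}
    for w, label in zip(words, labels):
        if label != "O":
            fields[label] = (fields.get(label, "") + " " + w.text).strip()
    return fields


def run_pipeline(
    words: List[Word],
    width: int,
    height: int,
    model: Callable[[List[Word]], List[str]] = dummy_model,
) -> Dict[str, str]:
    prepared = preprocess(words, width, height)
    labels = model(prepared)
    return postprocess(prepared, labels)


# Assumed OCR output for a page of 800 x 1000 pixels.
ocr_words = [
    Word("Invoice", (50, 40, 150, 60)),
    Word("INV-2023-001", (160, 40, 300, 60)),
    Word("  ", (0, 0, 1, 1)),
    Word("Total", (50, 700, 100, 720)),
    Word("119.00", (400, 700, 470, 720)),
]
result = run_pipeline(ocr_words, width=800, height=1000)
print(result)
```

Passing the model in as a callable keeps the reading, preprocessing, and postprocessing steps testable without loading the real model; in a real setup the stub would be replaced by a call into the fine-tuned network.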

Subject description

The model is to extract data from documents such as invoices. Because documents from different customers have heterogeneous layouts, an AI model is needed for pattern recognition. So-called transformers are used for this. These are machine learning models that, roughly speaking, consist of two blocks: an encoder, whose task is to understand the input text, and a decoder, which generates new text based on the input. The data extraction task here requires only the encoder.
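An encoder alone is sufficient because extraction can be framed as labelling each token of the input rather than generating new text. The sketch below shows how such per-token labels could be turned into key-value pairs. It rests on assumptions that the project description does not confirm: a BIO tagging scheme with `QUESTION` (key) and `ANSWER` (value) labels, as used in form-understanding datasets, and a simplified rule that pairs each key with the next value in reading order.

```python
from typing import Dict, List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO tags into (entity_type, text) spans."""
    spans: List[Tuple[str, List[str]]] = []
    current: Optional[str] = None  # type of the span currently being extended
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            spans.append((current, [token]))
        elif tag.startswith("I-") and current == tag[2:]:
            spans[-1][1].append(token)
        else:
            # "O" or an I- tag without a matching B- tag closes the open span.
            current = None
    return [(kind, " ".join(words)) for kind, words in spans]


def pair_keys_values(spans: List[Tuple[str, str]]) -> Dict[str, str]:
    """Assumed pairing rule: each QUESTION is matched with the next ANSWER."""
    pairs: Dict[str, str] = {}
    pending_key: Optional[str] = None
    for kind, text in spans:
        if kind == "QUESTION":
            pending_key = text
        elif kind == "ANSWER" and pending_key is not None:
            pairs[pending_key] = text
            pending_key = None
    return pairs


# Illustrative tokens and the labels an encoder might assign to them.
tokens = ["Invoice", "No.", "INV-2023-001", "Date", "23.06.2023", "Thank", "you"]
tags = ["B-QUESTION", "I-QUESTION", "B-ANSWER", "B-QUESTION", "B-ANSWER", "O", "O"]

spans = decode_bio(tokens, tags)
pairs = pair_keys_values(spans)
print(pairs)
```

Pairing by reading order is the simplest possible rule; layouts in which a value sits below or far from its key would need a layout-aware linking step instead.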


Project period: 23.06.2023 - 08.02.2024

Have we sparked your interest?

Ole Knudsen

Key Account Manager
