End-to-End Table Extraction from Annual Reports using DL and NLP

Enhancing the retrieval of tabular data from PDFs using deep learning techniques and a natural language interface, with a particular focus on annual reports.

Mushkolaj, Rijon, 2024

Type of thesis: Master Thesis
Supervisor: Hanne, Thomas
Annual reports contain a great deal of important information, some of which is presented in tables. Extracting this tabular data poses several challenges, including the unstructured nature of PDF documents and the wide variability in how tables are represented. The aim of this master's thesis is to explore an innovative end-to-end solution that lets a user query tabular data in annual reports in PDF format through natural language input. The thesis addresses two main challenges: the automated extraction of table data from unstructured PDF documents, and access to this data through natural language questions. For example, a user can ask a question about the table content of an annual report such as "What was the profit in 2023?". The goal is to make information retrieval easier and more efficient.
After evaluating several alternatives, the thesis proposes an end-to-end process that incorporates new technologies based on Deep Learning (DL), Machine Learning (ML), and Natural Language Processing (NLP).
The research findings indicate that while the defined process shows significant potential, it requires further refinement and fine-tuning to achieve optimal performance.
Degree programme: Business Information Systems (Master)
Confidentiality: public
Publication year
2024
Language of thesis
English
Programme location
Olten