Table Extraction and Financial Analysis of Financial Statements of SMI Companies in PDF Format Using AI
This Master’s thesis investigates how financial statements of Swiss Market Index companies can be analysed more reliably using Large Language Models, with a particular focus on extracting and interpreting tabular data from PDF annual reports.
Luginbühl, Nicola, 2025
Type of Thesis Master Thesis
Client
Supervisor Hanne, Thomas
A hybrid prototype was developed that combines automatic table extraction in Python with LLM-based financial analysis. Balance sheet and income statement tables from the annual reports of 20 SMI companies were extracted and serialised as CSV, XML and PKL files. Custom LLM configurations based on GPT-4o, GPT-5 and the open-source model Qwen3-VL-235B-A22B were evaluated on eight standardised questions about key financial metrics, using a three-level grading scheme.
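As a rough illustration of the serialisation step, the following minimal sketch writes one small extracted table to the three formats named above. It is not the thesis prototype: the table contents, the XML element and attribute names (`table`, `row`, `value`, `period`) and the helper functions are hypothetical, and only the Python standard library is used.

```python
import csv
import io
import pickle
import xml.etree.ElementTree as ET

# Hypothetical extracted balance-sheet rows; labels and figures are
# illustrative placeholders, not values from any annual report.
HEADER = ["item", "2024", "2023"]
ROWS = [
    ["Total assets", "1000", "900"],
    ["Total liabilities", "600", "550"],
]


def to_csv(header, rows):
    """Serialise the table as CSV text, one line per table row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(header, rows):
    """Serialise the table as XML, tagging each value with its period."""
    root = ET.Element("table")
    for row in rows:
        row_el = ET.SubElement(root, "row", {"item": row[0]})
        for period, value in zip(header[1:], row[1:]):
            cell = ET.SubElement(row_el, "value", {"period": period})
            cell.text = value
    return ET.tostring(root, encoding="unicode")


def to_pkl(header, rows):
    """Serialise the table as pickle bytes (binary, Python-specific)."""
    return pickle.dumps({"header": header, "rows": rows})


csv_text = to_csv(HEADER, ROWS)
xml_text = to_xml(HEADER, ROWS)
pkl_bytes = to_pkl(HEADER, ROWS)
```

In this sketch the CSV and XML outputs are plain text, with the XML variant attaching the row label and reporting period to every value, while the pickle output is a binary byte string that has to be loaded back into Python before it can be read.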
The results show that modern LLMs already achieve very high accuracy when operating on copy-pasted table text. GPT-5 reached 98.8% overall accuracy, clearly outperforming GPT-4o (85.4%). Structured formats did not automatically outperform this baseline: with GPT-5, XML and CSV inputs typically achieved accuracies between 91% and 94%, while PKL performed significantly worse. Only one configuration, GPT-5 combined with XML, a financial ontology and a detailed meta prompt, reached 100% accuracy across all questions and thus outperformed the copy-pasted table data. Human-in-the-loop prompting on its own did not consistently reduce errors or increase transparency.
Overall, the findings indicate that using ordered tabular data can improve extraction precision only under specific conditions, particularly when semantically rich formats such as XML are aligned with an appropriate ontology and carefully designed prompts. In most other cases, a well-configured LLM applied directly to copy-pasted financial tables already provides highly accurate metric extraction, suggesting that prompt design and model choice are at least as critical as the choice of intermediate file format.
Study program: Business Information Systems (Master)
Keywords
Confidentiality: public