Document Classification in Banking - A Retrieval-Augmented Generation Approach

Every day, organisations handle a flood of documents - some harmless, others containing their most valuable secrets. How can companies classify them effectively and efficiently to protect the information that is most sensitive?

Benzenhöfer, Katja, 2025

Type of thesis: Bachelor Thesis

Client: A well-known financial institution

Supervisor: Renold, Manuel

Despite rapid advances in machine learning, deep learning and artificial intelligence, many companies still rely on manual assignment of sensitivity labels. This process is time-consuming, slows down workflows and increases operational costs. Moreover, inconsistency, bias and a lack of knowledge can lead to human error, which is a major concern. Automating classification with these technologies enables faster and more accurate labelling and can uncover hidden patterns and relations between pieces of information.

This paper aims to provide an understanding of document classification and its vital role in today’s business environments, focusing on retrieval-augmented generation (RAG). It is based on desk research and expert interviews, as well as technical documentation and programming handbooks. The work includes a theoretical overview of data classification and RAG. A proof-of-concept demonstrates a RAG system’s capability to categorise documents based on the sensitivity of their content.

The results comprise a proof-of-concept for document classification that uses retrieval-augmented generation to automate the assignment of sensitivity labels to documents. In addition, an implementation plan is provided to support the rollout of a classification tool based on a RAG system.
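To make the idea of RAG-based sensitivity labelling more concrete, the following is a minimal Python sketch and not the implementation from the thesis. Everything in it is an assumption made for illustration: the label names, the reference snippets, the word-overlap retrieval (a real system would more likely use embeddings and a vector store) and the `generate` callable, which stands in for a language model. The stub passed in here simply returns the most frequent label among the retrieved examples.

```python
import math
import re
from collections import Counter

# Hypothetical labelled reference snippets; labels and texts are invented.
REFERENCE = [
    ("public", "press release announcing new branch opening hours"),
    ("internal", "team meeting notes about office seating plan"),
    ("confidential", "client account number balance and transaction history"),
    ("confidential", "client passport copy and account opening form"),
]


def tokens(text):
    """Lower-case bag of words, used as a stand-in for an embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(document, k=2):
    """Retrieval step: return the k reference snippets most similar to the document."""
    query = tokens(document)
    ranked = sorted(
        REFERENCE, key=lambda item: cosine(query, tokens(item[1])), reverse=True
    )
    return ranked[:k]


def build_prompt(document, examples):
    """Augmentation step: place the retrieved examples in front of the document."""
    lines = ["Assign one sensitivity label to the document.", "Labelled examples:"]
    lines += [f"- [{label}] {text}" for label, text in examples]
    lines += ["Document:", document, "Label:"]
    return "\n".join(lines)


def classify(document, generate, k=2):
    """Generation step: hand the augmented prompt to a model-like callable."""
    examples = retrieve(document, k)
    return generate(build_prompt(document, examples)).strip().lower()


def stub_generate(prompt):
    """Stand-in for a language model: picks the most frequent retrieved label."""
    labels = re.findall(r"^- \[(\w+)\]", prompt, flags=re.M)
    return Counter(labels).most_common(1)[0][0]


label = classify(
    "Statement with client account balance and transaction list", stub_generate
)
print(label)
```

In a real deployment the stub would be replaced by a call to an actual language model, and the reference snippets by the organisation's own classification policy and labelled documents.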

Degree programme: Business Information Technology (Bachelor)

Keywords: Document Classification, Classification, RAG, Banking, Retrieval-Augmented Generation

Confidentiality: confidential

Client

A well-known financial institution, Basel

Authors

Benzenhöfer, Katja

Publication year

2025

Language of the thesis

English

Programme location

Basel
