Document Classification in Banking - A Retrieval-Augmented Generation Approach

Every day, organisations handle a flood of documents - some harmless, others containing their most valuable secrets. How can companies classify them effectively and efficiently to protect the information that is most sensitive?

Katja Benzenhöfer, 2025

Type of thesis: Bachelor Thesis
Client: A well-known financial institution
Supervisor: Renold, Manuel
Despite rapid advances in machine learning, deep learning and artificial intelligence, many companies still assign sensitivity labels manually. This process is time-consuming, slows down workflows and increases operational costs. Moreover, inconsistency, bias and a lack of knowledge can lead to human error, which is a major concern. Applying these technologies to classification can make it faster and more accurate, and can uncover hidden patterns and relations in the information.
This paper aims to provide an understanding of document classification and its vital role in today’s business environment, focusing on retrieval-augmented generation (RAG). It is based on desk research and expert interviews, as well as technical documentation and programming handbooks. The work includes a theoretical overview of data classification and RAG. A proof of concept demonstrates the capability of a RAG system to categorise documents based on the sensitivity of their context.
The results comprise a proof of concept that uses retrieval-augmented generation to automate the assignment of sensitivity labels to documents, and an implementation plan for introducing a classification tool based on a RAG system.
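The thesis itself is confidential, so the following is only a minimal sketch of the general retrieve-then-classify pattern behind a RAG-based labelling tool, not the author's implementation. Everything in it is an assumption made for illustration: the label set, the example snippets in `KNOWLEDGE_BASE`, and all function names are hypothetical. Bag-of-words cosine similarity stands in for an embedding-based retriever, and `stub_generate` stands in for the language-model call that a real system would make with the assembled prompt.

```python
import math
import re
from collections import Counter

# Hypothetical labelled reference snippets, invented for this sketch.
KNOWLEDGE_BASE = [
    ("Press release announcing new branch opening hours", "public"),
    ("Marketing brochure describing savings account features", "public"),
    ("Internal memo on office seating plan and meeting rooms", "internal"),
    ("Internal guideline for travel expense reimbursement", "internal"),
    ("Client account statement with IBAN and transaction history", "confidential"),
    ("Credit assessment with client income and loan details", "confidential"),
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(document, k=3):
    """Retrieval step: return the k most similar labelled snippets."""
    query = Counter(tokenize(document))
    scored = [
        (cosine(query, Counter(tokenize(text))), text, label)
        for text, label in KNOWLEDGE_BASE
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(document, neighbours):
    """Augmentation step: put the retrieved examples in front of the question."""
    lines = ["Labelled reference examples:"]
    for _, text, label in neighbours:
        lines.append(f"- [{label}] {text}")
    lines.append("")
    lines.append(f"Document to classify: {document}")
    lines.append("Answer with one label: public, internal or confidential.")
    return "\n".join(lines)


def stub_generate(neighbours):
    """Placeholder for the generation step (an assumption, not a real LLM):
    a similarity-weighted vote over the retrieved labels."""
    votes = Counter()
    for score, _, label in neighbours:
        votes[label] += score
    return votes.most_common(1)[0][0]


def classify(document, k=3):
    """Run retrieve -> augment -> generate and return (label, prompt)."""
    neighbours = retrieve(document, k)
    prompt = build_prompt(document, neighbours)
    return stub_generate(neighbours), prompt


if __name__ == "__main__":
    label, prompt = classify(
        "Statement of client account with IBAN and recent transaction list"
    )
    print(prompt)
    print("Suggested label:", label)
```

In a real deployment the prompt returned by `classify` would be sent to a language model, and the suggested label would typically be reviewed by a person before it is applied.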
Degree programme: Business Information Technology (Bachelor)
Keywords: Document Classification, Classification, RAG, Banking, Retrieval-Augmented Generation
Confidentiality: confidential
Client location: Basel
Language of thesis: English
Location of degree programme: Basel