Document Classification in Banking - A Retrieval-Augmented Generation Approach
Every day, organisations handle a flood of documents - some harmless, others containing their most valuable secrets. How can companies classify them effectively and efficiently to protect the information that is most sensitive?
Katja Benzenhöfer, 2025
Type of work: Bachelor Thesis
Client: A well-known financial institution
Supervising lecturer: Renold, Manuel
Despite rapid advances in machine learning, deep learning and artificial intelligence, many companies still rely on the manual assignment of sensitivity labels. This process is time-consuming, slows down workflows and increases operational costs. Moreover, inconsistency, bias and a lack of knowledge can lead to human error and thus to mislabelled documents. Leveraging these technologies enables faster and more accurate classification while uncovering hidden patterns and relationships in the information.
This paper aims to provide an understanding of document classification and its vital role in today’s business environments, focusing on retrieval-augmented generation (RAG). It is based on desk research and expert interviews, as well as technical documentation and programming handbooks. The work includes a theoretical overview of data classification and RAG. A proof-of-concept demonstrates the capability of a RAG system to categorise documents based on the sensitivity of their context.
The result is a proof-of-concept for document classification that uses retrieval-augmented generation to automate the assignment of sensitivity labels to documents. In addition, an implementation plan supports the rollout of a classification tool based on a RAG system.
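As a rough illustration of the general idea, and not the implementation used in the thesis, the retrieval and prompt-building steps of a RAG-based classifier might be sketched as follows. Everything in this sketch is an assumption: the reference snippets, the three label names and the function names are hypothetical, a simple bag-of-words cosine similarity stands in for an embedding model and vector store, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter

# Hypothetical labelled reference snippets (illustrative, not from the thesis).
KNOWLEDGE_BASE = [
    ("public", "press release announcing new branch opening hours"),
    ("internal", "meeting minutes about team planning and office logistics"),
    ("confidential", "client account number balance and transaction history"),
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(document, k=2):
    """Return the k reference snippets most similar to the document."""
    query = Counter(tokenize(document))
    scored = [
        (cosine(query, Counter(tokenize(text))), label, text)
        for label, text in KNOWLEDGE_BASE
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(document, k=2):
    """Combine retrieved examples and the document into a classification prompt."""
    context = "\n".join(
        f"- [{label}] {text}" for _, label, text in retrieve(document, k)
    )
    return (
        "Assign one sensitivity label (public, internal, confidential).\n"
        f"Reference examples:\n{context}\n"
        f"Document:\n{document}\n"
        "Label:"
    )


if __name__ == "__main__":
    print(build_prompt("Statement with client account balance and transaction list"))
```

In a full system, the prompt returned by `build_prompt` would be sent to a language model, whose answer becomes the proposed sensitivity label.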
Degree programme: Business Information Technology (Bachelor)
Keywords: Document Classification, Classification, RAG, Banking, Retrieval-Augmented Generation
Confidentiality: confidential