Prompt and Data Optimisation for an HR System

Part of the Innosuisse-funded "Talent Track Pro" project, this thesis optimizes AI-powered education credential processing for recruitment, improving how candidate qualifications are identified from CVs. Key finding: simpler AI prompts outperform complex ones.

Emre Yelögrü, 2026

Type of thesis: Bachelor Thesis
Client: FHNW University of Applied Sciences and Arts Northwestern Switzerland
Supervising lecturer: Hanne, Thomas
Scrambl., a Swiss AI recruitment startup, faces challenges processing education credentials from candidate CVs. Its system must extract education entries from unstructured text and map them to a database of more than 21,000 standardized programs. Initial analysis revealed errors including wrong degree levels, completely wrong matches, and issues stemming from poor extraction quality. The existing extraction system also used an outdated 13-category classification incompatible with Swiss credential types (EFZ, EBA, CAS, etc.).
The research proceeded in two phases. Phase 1 analyzed the mapping algorithm using 327 CV entries, identifying seven error categories and their frequencies. This revealed that extraction quality constrained mapping performance. Phase 2 pivoted to extraction optimization: a benchmark dataset of 60 CVs (742 education items) was created, the credential taxonomy was expanded from 25 to 29 Swiss-specific types, and multiple extraction approaches were compared, including one-step versus two-step extraction with various prompt variations.
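To make the benchmark comparison concrete, the following is a minimal sketch of how extracted education items could be scored against labeled benchmark items. The exact-match rule on normalized strings, the function names, and the sample entries are assumptions for illustration only; the thesis abstract does not state how matches were counted.

```python
from collections import Counter


def normalize(item: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(item.lower().split())


def score_extraction(predicted: list[str], gold: list[str]) -> dict[str, float]:
    """Item-level precision, recall and F1 using multiset exact matching (assumed rule)."""
    pred_counts = Counter(normalize(p) for p in predicted)
    gold_counts = Counter(normalize(g) for g in gold)
    # Multiset intersection: each gold item can be matched at most once.
    true_positives = sum((pred_counts & gold_counts).values())
    precision = true_positives / sum(pred_counts.values()) if pred_counts else 0.0
    recall = true_positives / sum(gold_counts.values()) if gold_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical CV: three labeled items; the extractor finds two plus one spurious entry.
gold_items = ["EFZ Kaufmann", "CAS Data Science", "BSc Wirtschaftsinformatik"]
predicted_items = ["efz  kaufmann", "CAS Data Science", "Python workshop"]
scores = score_extraction(predicted_items, gold_items)
print(scores)
```

A real evaluation would likely need a more tolerant matching rule (for example fuzzy matching on institution and degree title), since CV text rarely matches a label character for character.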
The optimized one-step extraction improved the F1 score from 70.0% to 79.6% and recall from 82.3% to 94.4%. The higher recall is particularly valuable for Scrambl., because missing education entries directly affect candidate matching quality. A key counterintuitive finding emerged: simpler prompts outperformed detailed ones. The best-performing prompt simply listed what to extract and what to ignore, rather than encoding complex classification rules.
Deliverables for Scrambl. include: a production-ready extraction prompt with demonstrated performance improvements; a benchmark dataset of 742 labeled education items for ongoing quality assurance; an expanded 29-type credential taxonomy aligned with Swiss education standards; a systematic error analysis documenting mapping failure modes; and concrete solution approaches for future mapping optimization. These results enable Scrambl. to improve its CV processing pipeline immediately while providing a foundation for continued development.
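The abstract reports F1 and recall but not precision. As a rough cross-check, precision can be back-calculated from the two reported figures, assuming F1 is the standard harmonic mean of precision and recall computed over the same pooled set of items. If the thesis averaged scores per CV instead, the identity holds only approximately, so the values below are derived estimates and not results stated in the thesis.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and recall R."""
    return f1 * recall / (2 * recall - f1)


# Figures reported in the abstract (as fractions).
baseline = implied_precision(f1=0.700, recall=0.823)
optimized = implied_precision(f1=0.796, recall=0.944)

print(f"baseline precision:  {baseline:.1%}")
print(f"optimized precision: {optimized:.1%}")
```

Under these assumptions, implied precision rises from roughly 61% to roughly 69%, which would mean the recall gain did not come at the cost of more spurious extractions.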
Degree programme: Wirtschaftsinformatik (Bachelor)
Keywords: Large Language Models, Prompt Engineering, Information Extraction, CV Processing, Education Credential Mapping, Benchmark Dataset, Entity Resolution
Confidentiality: confidential
Client
FHNW University of Applied Sciences and Arts Northwestern Switzerland, Olten
Year of publication
2026
Language of thesis
English
Programme location
Basel