Prompt and Data Optimisation for an HR System
Part of the Innosuisse-funded "Talent Track Pro" project, this thesis optimizes AI-powered education credential processing for recruitment, improving how candidate qualifications are identified from CVs. Key finding: simpler AI prompts outperform complex ones.
Emre Yelögrü, 2026
Type of work: Bachelor Thesis
Client: FHNW University of Applied Sciences and Arts Northwestern Switzerland
Supervising lecturer: Hanne, Thomas
Scrambl., a Swiss AI recruitment startup, faces challenges in processing education credentials from candidate CVs. Its system must extract education entries from unstructured text and map them to a database of 21'000+ standardized programs. An initial analysis revealed several kinds of mapping errors, including wrong degree levels, entirely incorrect matches, and errors caused by poor extraction quality. In addition, the existing extraction system used an outdated 13-category classification that could not represent Swiss credential types such as EFZ, EBA and CAS.
The research proceeded in two phases. Phase 1 analyzed the mapping algorithm on 327 CV entries, identifying seven error categories and their frequencies. This analysis showed that extraction quality was limiting mapping performance. Phase 2 therefore shifted to extraction optimization: a benchmark dataset of 60 CVs (742 education items) was created, the credential taxonomy was expanded from 25 to 29 Swiss-specific types, and several extraction approaches were compared, including one-step versus two-step extraction and a range of prompt variants.
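Benchmark scores such as precision, recall and F1 rest on comparing extracted items against the labeled ones. The summary does not state how predicted and gold items were matched, so this sketch assumes a hypothetical schema of `(cv_id, title, credential_type)` tuples and counts a prediction as correct only when all three fields agree after case and whitespace normalization. The `evaluate` helper is illustrative, not the thesis's evaluation code; a real benchmark would likely need fuzzier title matching.

```python
from collections import Counter

# Hypothetical item schema: (cv_id, title, credential_type).
Item = tuple[str, str, str]


def normalize_item(item: Item) -> Item:
    """Ignore case and extra whitespace so formatting noise is not counted as an error."""
    cv_id, title, credential_type = item
    return (cv_id, " ".join(title.lower().split()), credential_type.upper())


def evaluate(predicted: list[Item], gold: list[Item]) -> dict[str, float]:
    """Micro-averaged precision, recall and F1 pooled over all CVs.

    Items only match within the same CV (cv_id is part of the key). Counter
    intersection (&) keeps the minimum count, so a duplicated prediction is
    matched at most as often as it occurs in the gold labels.
    """
    pred_counts = Counter(normalize_item(i) for i in predicted)
    gold_counts = Counter(normalize_item(i) for i in gold)
    true_positives = sum((pred_counts & gold_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = true_positives / n_pred if n_pred else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Recall is the share of labeled items the extractor found, so it is the metric that directly tracks missed education entries.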
The optimized one-step extraction achieved substantial improvements: the F1 score increased from 70.0% to 79.6%, and recall rose from 82.3% to 94.4%. The higher recall is particularly valuable for Scrambl., because every missed education entry directly lowers candidate matching quality.
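The summary reports F1 and recall but not precision. Because F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), precision can be recovered as P = F1 · R / (2R − F1). The sketch below applies this to the reported figures. It assumes F1 was computed from pooled (micro-averaged) precision and recall, and since the inputs are rounded, the derived values are approximations, not results stated in the thesis.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and R as fractions (not percent)."""
    if not 0 < f1 <= 1 or not 0 < recall <= 1:
        raise ValueError("f1 and recall must lie in (0, 1]")
    if f1 >= 2 * recall:
        # The denominator 2R - F1 would be zero or negative: no positive P fits.
        raise ValueError("no positive precision is consistent with these values")
    return f1 * recall / (2 * recall - f1)


for label, f1, recall in [("baseline", 0.700, 0.823), ("optimized one-step", 0.796, 0.944)]:
    print(f"{label}: implied precision = {implied_precision(f1, recall):.1%}")
```

This prints an implied precision of about 60.9% for the baseline and 68.8% for the optimized prompt, which suggests, under the stated assumptions, that the F1 gain did not come from trading precision for recall.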
A key counterintuitive finding emerged: simpler prompts outperformed detailed ones. The best-performing prompt simply listed what to extract and what to ignore, rather than encoding complex classification rules.
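To illustrate the style, here is a minimal sketch of a prompt built from a plain "extract" list and an "ignore" list, with no classification rules spelled out. The `build_prompt` function and both lists are hypothetical placeholders written for this example; the summary does not reproduce the production prompt, and its actual items may differ.

```python
# Hypothetical placeholder lists; the thesis's actual prompt is not reproduced here.
EXTRACT = [
    "every education entry: degrees, vocational qualifications (e.g. EFZ, EBA) "
    "and continuing-education certificates (e.g. CAS)",
    "for each entry: title, institution and credential type",
]
IGNORE = [
    "work experience and job titles",
    "skills, hobbies and personal details",
]


def build_prompt(cv_text: str) -> str:
    """Assemble a one-step extraction prompt from simple extract/ignore lists."""
    extract = "\n".join(f"- {item}" for item in EXTRACT)
    ignore = "\n".join(f"- {item}" for item in IGNORE)
    return (
        "Extract the education history from the CV below.\n\n"
        f"Extract:\n{extract}\n\n"
        f"Ignore:\n{ignore}\n\n"
        f"CV:\n{cv_text}"
    )
```

Keeping the lists as data makes prompt variants easy to compare on a benchmark, since each variant differs only in the items listed.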
Deliverables for Scrambl. include:
Production-ready extraction prompt with demonstrated performance improvements
Benchmark dataset of 742 labeled education items for ongoing quality assurance
Expanded 29-type credential taxonomy aligned with Swiss education standards
Systematic error analysis documenting mapping failure modes
Concrete solution approaches for future mapping optimization
These results enable Scrambl. to improve their CV processing pipeline immediately while providing a foundation for continued development.
Degree programme: Business Information Technology (Bachelor)
Keywords: Large Language Models, Prompt Engineering, Information Extraction, CV Processing, Education Credential Mapping, Benchmark Dataset, Entity Resolution
Confidentiality: confidential