Prompt and Data Optimisation for an HR System

This thesis, part of the Innosuisse-funded "Talent Track Pro" project, optimizes AI-powered processing of education credentials for recruitment so that candidates' qualifications are identified more reliably from their CVs. Key finding: simpler prompts outperformed more complex ones.

Emre Yelögrü, 2026

Type of work: Bachelor Thesis

Client: FHNW University of Applied Sciences and Arts Northwestern Switzerland

Supervisor: Hanne, Thomas

Scrambl., a Swiss AI recruitment startup, struggles to process education credentials from candidate CVs. The company's system must extract education entries from unstructured text and map each one to a database of more than 21,000 standardized programs. An initial analysis revealed incorrect degree levels, entirely wrong matches, and errors caused by poor extraction quality. The existing extraction system also used an outdated 13-category classification that could not represent Swiss credential types such as the EFZ (Federal VET Diploma), the EBA (Federal VET Certificate), and the CAS (Certificate of Advanced Studies).
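
To make the "wrong degree level" failure mode concrete, the following sketch maps an extracted entry to a catalogue purely by name similarity, and then again after restricting candidates to the credential type assigned during extraction. It is a minimal illustration under stated assumptions, not Scrambl.'s mapping algorithm: the four-entry catalogue, the type labels, and the functions `map_naive` and `map_type_aware` are hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for whatever similarity measure the production system uses.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Hypothetical (program name, credential type) pairs. The real database holds
# more than 21,000 programs; these names and type labels are illustrative only.
Program = Tuple[str, str]
CATALOGUE = [
    ("MSc Business Information Technology", "master"),
    ("Bachelor of Science in Business Information Technology", "bachelor"),
    ("Kaufmann EFZ", "efz"),
    ("CAS Data Science", "cas"),
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_naive(entry: str) -> Program:
    """Pick the catalogue program whose name is most similar to the entry."""
    return max(CATALOGUE, key=lambda prog: similarity(entry, prog[0]))


def map_type_aware(entry: str, credential_type: str) -> Optional[Program]:
    """Keep only programs of the extracted credential type, then match by name."""
    candidates = [prog for prog in CATALOGUE if prog[1] == credential_type]
    if not candidates:
        return None
    return max(candidates, key=lambda prog: similarity(entry, prog[0]))


entry = "BSc Business Information Technology"
print(map_naive(entry))                   # ('MSc Business Information Technology', 'master')
print(map_type_aware(entry, "bachelor"))  # ('Bachelor of Science in Business Information Technology', 'bachelor')
```

Because "BSc" and "MSc" differ by a single character, the name-only matcher prefers the master's program, while the type-aware variant can only return a bachelor's program. In this toy setting, the quality of the extracted credential type bounds what the mapping step can achieve.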

The research proceeded in two phases. Phase 1 analyzed the mapping algorithm on 327 CV entries, identifying seven error categories and their frequencies. This analysis showed that extraction quality limited mapping performance. Phase 2 therefore shifted the focus to extraction: a benchmark dataset of 60 CVs (742 education items) was created, the credential taxonomy was expanded from 25 to 29 Swiss-specific types, and several extraction approaches were compared, including one-step versus two-step extraction with different prompt variants.
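
A benchmark such as the 742-item dataset is typically scored by comparing extracted items with gold labels CV by CV. The sketch below computes micro-averaged precision, recall, and F1 under two assumptions not stated in the source: each item is represented as a hypothetical (credential type, title) pair, and a prediction counts as correct only if it matches a gold item exactly after lower-casing and whitespace normalization. The thesis may use a different matching criterion.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical item format: (credential_type, title).
Item = Tuple[str, str]


def normalize(item: Item) -> Item:
    """Lower-case both fields and collapse runs of whitespace."""
    credential_type, title = item
    return credential_type.strip().lower(), " ".join(title.lower().split())


def count_matches(gold: Iterable[Item], predicted: Iterable[Item]) -> Tuple[int, int, int]:
    """Return (true positives, number predicted, number gold) for one CV."""
    gold_counts = Counter(normalize(item) for item in gold)
    pred_counts = Counter(normalize(item) for item in predicted)
    # Multiset intersection: each gold item can be matched at most once.
    true_positives = sum((gold_counts & pred_counts).values())
    return true_positives, sum(pred_counts.values()), sum(gold_counts.values())


def micro_prf1(cvs: List[Tuple[List[Item], List[Item]]]) -> Tuple[float, float, float]:
    """Pool counts over all CVs, then compute precision, recall, and F1."""
    tp = n_pred = n_gold = 0
    for gold, predicted in cvs:
        t, p, g = count_matches(gold, predicted)
        tp, n_pred, n_gold = tp + t, n_pred + p, n_gold + g
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [("efz", "Kaufmann EFZ"), ("bachelor", "BSc Business Information Technology")]
predicted = [
    ("EFZ", "Kaufmann  EFZ"),                           # matches after normalization
    ("master", "BSc Business Information Technology"),  # wrong credential type
    ("cas", "CAS Data Science"),                        # not in the gold labels
]
p, r, f = micro_prf1([(gold, predicted)])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.333 0.5 0.4
```

Micro-averaging pools items across CVs, so CVs with many education entries carry more weight; a macro average over CVs would weight each CV equally. Which scheme the thesis uses is not stated in this summary.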

The optimized one-step extraction achieved substantial improvements: the F1 score rose from 70.0% to 79.6%, and recall from 82.3% to 94.4%. This higher recall is particularly valuable for Scrambl., because every missed education entry directly degrades candidate matching quality. A key and counterintuitive finding emerged: simpler prompts outperformed detailed ones. The best-performing prompt simply listed what to extract and what to ignore, rather than encoding complex classification rules.

Deliverables for Scrambl. include:

- A production-ready extraction prompt with demonstrated performance improvements
- A benchmark dataset of 742 labeled education items for ongoing quality assurance
- An expanded 29-type credential taxonomy aligned with Swiss education standards
- A systematic error analysis documenting mapping failure modes
- Concrete solution approaches for future mapping optimization

These results enable Scrambl. to improve its CV processing pipeline immediately, while providing a foundation for continued development.
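
The summary reports F1 and recall but not precision. Assuming the F1 score is the harmonic mean of micro-averaged precision P and recall R, that is F1 = 2PR / (P + R), precision can be recovered by solving for P, which gives P = F1 · R / (2R − F1). The sketch below applies this to the reported figures; because the inputs are rounded to one decimal place and the averaging scheme is an assumption, the results are approximations, not values taken from the thesis.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, with F1 and R given as fractions."""
    denominator = 2 * recall - f1
    # Guard against inconsistent inputs (a valid F1 is always below 2R).
    if denominator <= 0:
        raise ValueError("no valid precision exists for these inputs")
    return f1 * recall / denominator


# Reported figures from the summary: (label, F1, recall).
for label, f1, recall in [("baseline", 0.700, 0.823), ("optimized one-step", 0.796, 0.944)]:
    print(f"{label}: implied precision ≈ {implied_precision(f1, recall):.1%}")
# baseline: implied precision ≈ 60.9%
# optimized one-step: implied precision ≈ 68.8%
```

Under these assumptions, precision also rose (from roughly 61% to 69%), so the recall gain did not come at the expense of precision.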

Degree program: Wirtschaftsinformatik (Business Information Technology), Bachelor

Keywords: Large Language Models, Prompt Engineering, Information Extraction, CV Processing, Education Credential Mapping, Benchmark Dataset, Entity Resolution

Confidentiality: confidential

Client

FHNW University of Applied Sciences and Arts Northwestern Switzerland, Olten

Author

Emre Yelögrü

Publication year

2026

Language of the thesis

English

Program location

Basel