Detecting Hidden Backdoors in Large Language Models

This thesis evaluates potential hidden backdoors in locally deployed Large Language Models (LLMs), focusing on DeepSeek-R1. Using behavioural, network and anomaly analysis, it examines the integrity, privacy and security of the model, offering insights for safer AI deployment.

Jibin Mathew Peechatt, 2025

The adoption of LLMs in critical systems raises concerns regarding privacy, security and trust, particularly with regard to the risk of hidden backdoors — malicious triggers that cause covert data transfer or altered behaviour. Such threats are difficult to detect, especially when models are black-box systems with concealed triggers. This thesis addresses this issue by empirically evaluating the DeepSeek-R1 model family for potential backdoors during local execution. The aim is to identify anomalies and recommend safer deployment practices.
The DeepSeek-R1 distilled models were tested locally using multiple techniques, including behavioural analysis, TCP inspection via PowerShell, Wireshark packet analysis, containerised isolation experiments, weight and activation tracking, and output anomaly detection using known trigger phrases. The models were accessed via Ollama and Hugging Face. Tests included politically sensitive prompts to probe potential censorship or instability. The study also examined the '--network none' container isolation option as a safer deployment method.
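To illustrate the kind of network check involved (the thesis used PowerShell TCP inspection and Wireshark captures; the sketch below is a rough Python equivalent using psutil, where the process name "ollama" and the polling interval are illustrative assumptions rather than details from the thesis):

```python
# Minimal sketch: watch for outbound TCP connections opened by a local LLM runtime.
# Assumes the runtime runs as a process whose name contains "ollama".
import time
import psutil

TARGET_NAME = "ollama"  # assumed process name of the local inference runtime

def outbound_connections(target_name: str):
    """Return (remote_ip, remote_port, status) for TCP connections of the target process."""
    pids = {p.pid for p in psutil.process_iter(["name"])
            if p.info["name"] and target_name in p.info["name"].lower()}
    results = []
    for conn in psutil.net_connections(kind="tcp"):
        if conn.pid in pids and conn.raddr:  # keep only connections with a remote endpoint
            results.append((conn.raddr.ip, conn.raddr.port, conn.status))
    return results

if __name__ == "__main__":
    # Poll while prompts are being executed; any non-loopback remote address
    # would warrant closer inspection, e.g. with a packet capture.
    for _ in range(12):
        for ip, port, status in outbound_connections(TARGET_NAME):
            if not ip.startswith("127.") and ip != "::1":
                print(f"outbound connection: {ip}:{port} ({status})")
        time.sleep(5)
```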
During local use, no evidence of unauthorised external communication or backdoor activation was observed in the distilled DeepSeek-R1 models. Network traffic remained inactive beyond baseline processes, and the models' weights, activation flows, perplexity metrics and embedding visualisations exhibited stable and expected behaviour. Some prompt anomalies were detected, such as unexpected language switching in politically sensitive queries, but these did not suggest malicious intent. The limited scope of the study, including a short observation period, a small prompt set and a small number of model variants, means that the results are not conclusive; nevertheless, the findings indicate that the models functioned securely under the tested conditions. The thesis recommends running LLMs in network-disabled Docker containers (--network none) to mitigate exfiltration risks, as well as developing legal, ethical and technical frameworks for their secure use. The evaluation procedure also gives clients a repeatable testing framework with which to assess models prior to deployment, thereby reducing risk and increasing trust in AI systems.
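A minimal sketch of the recommended network-disabled deployment (the image name "ollama/ollama", the volume name and the exec-based interaction are assumptions for illustration; the thesis itself only specifies the '--network none' option):

```python
# Launch the model runtime in a Docker container with networking disabled.
# With --network none the container has no interface besides loopback,
# so no ports can be published and nothing can leave over the network.
import subprocess

subprocess.run(
    [
        "docker", "run", "-d", "--name", "llm-isolated",
        "--network", "none",                   # disable container networking
        "-v", "ollama_models:/root/.ollama",   # assumed volume with pre-pulled weights
        "ollama/ollama",                       # assumed runtime image
    ],
    check=True,
)

# Because the container is unreachable from outside, prompts are issued from
# within it, e.g. via `docker exec -it llm-isolated ollama run <model>`.
```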
Type of Thesis
Bachelor Thesis
Client
FHNW Institute for Information Systems (IWI), Basel
Authors
Jibin Mathew Peechatt
Supervisor
Christen, Patrik
Publication Year
2025
Thesis Language
English
Confidentiality
Public
Studyprogram
Business Information Technology (Bachelor)
Location
Brugg-Windisch
Keywords
Large Language Models (LLMs), Backdoor Detection, Empirical Evaluation, Network Traffic Analysis, DeepSeek-R1