LLM Penetration Testing from a Technical Perspective

Attack Opportunities, System Behavior, and Real-World Vulnerabilities in Productive AI Integrations

The productive use of Large Language Models (LLMs) in companies is increasing significantly. The spectrum ranges from publicly accessible chatbots on company websites to internal assistance systems that access knowledge databases, document repositories or proprietary data sources. From a technical point of view, this results in hybrid systems that combine classic software architectures with probabilistic language models.

The security analysis of such systems is fundamentally different from traditional penetration testing. The main difference lies in the character of the model itself: an LLM is a black box. The path from input to output is not deterministically traceable. The model calculates token probabilities based on its training and current context. Which parts of a prompt are weighted more heavily, how internal security instructions are prioritized, or how competing instructions are resolved is not transparently visible - especially with proprietary cloud models.

This lack of transparency has direct security consequences. While classic software relies on clearly defined control flows, conditions and reproducible logic, an LLM reacts context-sensitively and statistically. Two nearly identical inputs can produce different outputs. Security mechanisms that are based purely on textual instructions in the prompt therefore do not constitute robust technical isolation.

From Prompt to Response: Technical Reality

In production environments, user input is rarely passed to the model in isolation. Typically, a composite prompt is created that consists of system instructions, developer logic, user requests, and, if necessary, externally retrieved contextual information. For the model, this is ultimately a contiguous block of text. There is no real, technically enforced separation between "security policy" and "user input".

The model processes everything sequentially as text. This is exactly where the core of many attacks lies.

Prompt injection as a structural problem

Prompt injection is not a classic injection attack in the sense of SQL or command injection, but a manipulation of the textual decision-making basis of the model. Since security rules are often also formulated as text ("Do not answer confidential questions", "Do not disclose internal information"), an attacker can try to relativize or overwrite these instructions with new instructions.

The model has no inherent ability to clearly distinguish between legitimate policy and malicious instruction. It evaluates token probabilities. If a manipulated user instruction is semantically strong or cleverly contextualized, it can overlay the originally set security instructions.

Since the internal decision-making process is not transparent, the security check here is inevitably adversarial: one systematically tests the effect of competing instructions and whether protection mechanisms are actually robust or only appear to exist.

Internal LLMs and the Risk of Document Analysis

These questions are particularly relevant for internal LLM systems that work with retrieval-augmented generation (RAG). In such architectures, corporate documents -- such as PDFs, Office files, wiki content or CRM data -- are automatically analyzed, converted into text, vectorized and indexed in a database.

If a user request is made later, the system searches for semantically matching documents and adds their contents to the prompt as context. The model generates its response based on this.

Technically, this means that the content of PDF files is completely extracted and made available to the model as text. In the process, visual separations, formatting or semantic structure are often lost. Hidden text areas or metadata can also be indexed, as long as they are not explicitly filtered.

This is where a particularly critical attack scenario arises. If an attacker is able to inject manipulated content into an indexed document -- such as an internal wiki, a shared PDF, or a shared document repository -- they can place malicious instructions that are later processed by the LLM.

Since the model does not distinguish between "user prompt" and "document context", an instruction placed in the PDF can become part of the basis for decision-making. The actual attack surface is no longer the chat interface, but the company's document base.

A practical scenario arises in the recruiting process: Applications are often submitted via publicly accessible web forms. Curriculum vitae (CV) and cover letter are uploaded as PDFs, automatically stored and stored internally. If the HR department later accesses these documents via an internal LLM system -- for example, to summarize profiles or compare candidates -- the contents of these PDFs are read and processed by the system.

If an attacker now deliberately places malicious instructions in the CV or cover letter, these can enter the internal LLM via the document index. The attack path thus runs from a publicly accessible upload function to document management and an internal AI system. Technically, it is an indirect injection that does not take place via the interface of the LLM, but via an upstream business process.

Many organizations do not take this risk into account because they assume that documents are merely a source of information. In fact, however, they become an integral part of the prompt and thus a potential attack vector.

Access control and data exfiltration

Another structural problem of internal LLM systems lies in access control. The model itself has no permissions. It merely processes the context that is passed to it by the backend.

If the retrieval system retrieves documents without enforcing a clean document level access control, it can happen that content is passed to the model that the requesting user should not actually see. The model may then use this content in its response or at least indirectly reference it.

Attacks are often iterative. Through clever questioning, rephrasing or shifting context, information can be reconstructed step by step. Even if complete documents are not output, fragments or summaries can reveal sensitive content.
The core problem is architectural: security logic must not be left to the model, but must be technically enforced before context generation.

Tool integration and active system interventions

With the introduction of function calling or tool integrations, the risk profile is shifting further. The model no longer generates just text, but structured calls to backend functions. These can perform database operations, create tickets, or access external APIs. If such function calls are not strictly validated and authorized on the server side, an attacker can indirectly trigger actions via manipulated prompts. It becomes particularly critical when backend service accounts have far-reaching rights and the model acts as an intermediary.

In this context, it becomes clear that an LLM is not a security-conscious actor. It follows statistical patterns. Every safety-relevant decision must be technically secured outside the model.

Context and session isolation

Another technical aspect concerns the management of conversational contexts. Many systems store chat histories or use caching mechanisms to generate responses more efficiently.
If contextual data is not properly isolated, it can lead to blending between users or sessions. In multi-tenant environments. this represents a significant risk. Since the model itself does not have client separation, this must be strictly enforced in the backend.

Basic technical insight

LLM systems are not classic software components with clear decision logic, but probabilistic word processing systems. They have no intrinsic understanding of security and no reliable separation between instruction and data.
As soon as internal documents are automatically analyzed, indexed and integrated into prompts, the attack surface expands considerably. Manipulated PDF content, indirect prompt injection via knowledge bases, and inadequate access controls are among the most realistic risks in productive enterprise environments.

The security assessment of such systems therefore requires an architectural understanding of the entire processing chain – from document parsing to retrieval mechanisms and tool execution.

The critical point is not only in the model, but in the way it is connected to data, context and system functions.