High Severity | OWASP A05:2021 | CWE-611

XXE: How AI Code Lets Attackers Read Your Server Files

Quick Answer: XXE (XML External Entity) attacks exploit XML parsers to read server files, access internal services, or crash your application. AI tools often generate XML parsing code that leaves dangerous parser defaults enabled. The fix is simple: disable DTD processing and external entities in your XML parser configuration.

XXE Attack Capabilities

File Disclosure: Read /etc/passwd, config files, credentials
SSRF: Access internal services, AWS metadata
DoS: Billion Laughs attack exhausts memory
RCE (rare): Code execution via PHP expect://

What is XXE?

XXE (XML External Entity) is a vulnerability in XML parsers that process external entity references. When you parse XML, the parser can be instructed to fetch and include content from external files or URLs. Attackers exploit this to read sensitive files like /etc/passwd, AWS credentials, database configs - anything the server can access.

According to OWASP, XXE was significant enough to have its own category in the 2017 Top 10 (A4:2017). In 2021, it was merged into Security Misconfiguration (A05:2021) because the root cause is parser misconfiguration.

For vibe coders, the risk is real: AI tools often generate XML parsing code using default configurations that allow external entities. Unless you explicitly disable these features, your code is vulnerable.

Related Attack: XXE can enable SSRF attacks by fetching internal URLs. If your parser allows http:// entities, attackers can access http://169.254.169.254/ (AWS metadata) or internal services.

How XXE Attacks Work

XML allows defining "entities" in the Document Type Definition (DTD). An entity is like a variable that gets replaced when the XML is processed. External entities reference files or URLs:

Basic File Disclosure Attack

XML - Malicious Payload

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user>&xxe;</user>

When the parser processes &xxe;, it reads /etc/passwd and substitutes its content. If the application returns the parsed XML or includes it in a response, the attacker sees your server's user list.
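The substitution step can be observed safely with an internal entity. The sketch below is a minimal illustration using Python's standard library; the `greeting` entity and the `parse_user` and `try_parse` helpers are invented for this example. The standard-library parser expands the internal entity but refuses the external one, whereas a parser configured to resolve external entities would substitute the file content in exactly the same way.

```python
import xml.etree.ElementTree as ET

# Harmless internal entity: behaves like a variable defined in the DTD.
INTERNAL = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY greeting "hello">
]>
<user>&greeting;</user>"""

# The file-disclosure payload from above.
EXTERNAL = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user>&xxe;</user>"""


def parse_user(xml_text: str) -> str:
    """Return the text of the root element after entity substitution."""
    return ET.fromstring(xml_text).text or ""


def try_parse(xml_text: str) -> str:
    """Parse and report a rejection instead of raising."""
    try:
        return parse_user(xml_text)
    except ET.ParseError as exc:
        return f"rejected: {exc}"


print(try_parse(INTERNAL))  # the entity reference is replaced by its value
print(try_parse(EXTERNAL))  # ElementTree does not resolve external entities
```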

SSRF via XXE

XML - SSRF Attack

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&xxe;</data>

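This request lists the IAM role attached to the instance; a second request to that role's path returns its temporary credentials. It is the same attack vector as direct SSRF, and it works wherever the metadata service answers plain GET requests (IMDSv1).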
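One way to see what a document is asking the parser to fetch is to list its external entity declarations without resolving them. The sketch below uses Python's `xml.parsers.expat` for that; the `external_entities` and `classify` helpers and their category labels are assumptions made for illustration, not part of any library.

```python
import ipaddress
import xml.parsers.expat
from urllib.parse import urlparse


def external_entities(xml_bytes: bytes) -> list[tuple[str, str]]:
    """Return (entity name, system identifier) for each external entity declared."""
    found = []

    def on_entity_decl(name, is_param, value, base, system_id, public_id, notation):
        # Internal entities carry a value; external ones carry a system identifier.
        if system_id is not None:
            found.append((name, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)  # nothing is fetched: no external handler is set
    return found


def classify(system_id: str) -> str:
    """Rough, illustrative label for what resolving this identifier would do."""
    url = urlparse(system_id)
    if url.scheme == "file":
        return "local file read"
    if url.scheme in ("http", "https", "ftp"):
        try:
            ip = ipaddress.ip_address(url.hostname or "")
        except ValueError:
            return "outbound request (hostname)"
        if ip.is_link_local or ip.is_private or ip.is_loopback:
            return "SSRF to internal address"
        return "outbound request"
    return "other scheme"


payload = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM '
    b'"http://169.254.169.254/latest/meta-data/iam/security-credentials/">]>'
    b"<data>&xxe;</data>"
)
for name, system_id in external_entities(payload):
    print(name, "->", classify(system_id))
```

A check like this is useful for logging or triage; it is not a substitute for disabling external entities in the parser itself.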

Billion Laughs (DoS)

XML - DoS Attack

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!-- Continues to lol9... -->
]>
<lolz>&lol9;</lolz>

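The nested entity expansion grows exponentially: each level multiplies the size by ten, so a payload of a few hundred bytes expands to hundreds of megabytes, and one or two more levels reach gigabytes. That is enough to exhaust server memory. This doesn't require external access, just entity expansion.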
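The growth is easy to work out by hand. The sketch below assumes the payload as written above: a 3-byte base entity and eight levels (lol2 through lol9) that each repeat the previous level ten times. The function name is illustrative.

```python
def expanded_bytes(base_len: int, fanout: int, levels: int) -> int:
    """Size of the fully expanded text: base length times fanout per level."""
    return base_len * fanout ** levels


# Payload above: "lol" is 3 bytes, lol2..lol9 are 8 levels of 10x each.
as_written = expanded_bytes(base_len=3, fanout=10, levels=8)
print(as_written)  # 300000000 bytes, roughly 300 MB

# The classic variant adds one more level (lol1..lol9).
classic = expanded_bytes(base_len=3, fanout=10, levels=9)
print(classic)  # 3000000000 bytes, roughly 3 GB
```

These figures count only the expanded text; a parser's in-memory representation can be larger.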

Why AI Generates Vulnerable Code

AI tools like Cursor and Claude Code generate XML parsing code using default configurations. These defaults prioritize functionality over security, leaving external entities enabled.

Java - Vulnerable by Default

Java - VULNERABLE (AI Default)

// What AI typically generates - XXE enabled by default!
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(userInput);

Python - Partially Vulnerable

Python - VULNERABLE to Billion Laughs

# AI's default - does not resolve external entities, but entity expansion is still on
import xml.etree.ElementTree as ET
tree = ET.parse(user_file)  # Billion Laughs can crash this when Expat is older than 2.4.1

Node.js - Configuration-Dependent

JavaScript - POTENTIALLY VULNERABLE

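// Whether external entities resolve depends on the parser library
const xml2js = require('xml2js')
const parser = new xml2js.Parser()
parser.parseString(userInput, (err, result) => {
  // xml2js is built on sax-js (pure JavaScript) rather than libxml2;
  // libxml2 bindings such as libxmljs2 expand entities when noent is enabled
})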

Why This Happens

AI learns from older tutorials without security hardening
Default configurations prioritize compatibility
XXE protection requires explicit disabling of features
Training data includes vulnerable Stack Overflow answers

Language-Specific Fixes

The fix is always the same concept: disable DTD processing and external entities. The syntax varies by language and parser.

Java

Java needs the most care because DocumentBuilderFactory processes DTDs and external entities by default.

Java - SECURE Configuration

// Disable DTD and external entities completely
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

// Disable DTDs entirely
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

// Disable external entities
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

// Disable XInclude
dbf.setXIncludeAware(false);

// Don't expand entity references
dbf.setExpandEntityReferences(false);

DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(userInput);

Python

Use defusedxml as a drop-in replacement for the standard library.

Python - SECURE with defusedxml

# pip install defusedxml
from defusedxml import ElementTree as ET

# Safe from XXE and Billion Laughs
tree = ET.parse(user_file)
root = tree.getroot()

# Also available:
from defusedxml import minidom
from defusedxml import sax
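If adding a dependency is not an option, the same idea (refuse any document that carries a DTD) can be approximated with the standard library alone. This is a minimal sketch, not a replacement for defusedxml: `DoctypeForbidden`, `reject_doctype`, and `parse_untrusted` are hypothetical names, and the input is parsed twice, once to check and once to build the tree.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def reject_doctype(xml_bytes: bytes) -> None:
    """Scan the document with expat and fail on the first DOCTYPE."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)  # the handler's exception propagates to the caller


def parse_untrusted(xml_bytes: bytes) -> ET.Element:
    """Parse only documents without a DTD, so no entities can be declared."""
    reject_doctype(xml_bytes)
    return ET.fromstring(xml_bytes)


print(parse_untrusted(b"<user>alice</user>").text)

try:
    parse_untrusted(
        b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
        b"<user>&xxe;</user>"
    )
except DoctypeForbidden as exc:
    print("blocked:", exc)
```

Rejecting the DOCTYPE outright covers both external entities and Billion Laughs, since neither can be declared without a DTD.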

Node.js

JavaScript - SECURE Configuration

// Using libxmljs2 with safe options
const libxmljs = require('libxmljs2')

const doc = libxmljs.parseXml(xmlString, {
  noent: false,     // Don't expand entities
  dtdload: false,   // Don't load external DTD
  dtdvalid: false,  // Don't validate against DTD
  nonet: true       // No network access
})

// Or better: just use JSON if possible

PHP (8.0+)

PHP - SAFE by Default (8.0+)

// PHP 8.0+ disables external entity loading by default,
// as long as you do not pass LIBXML_NOENT
$xml = simplexml_load_string($userInput);

// For older PHP versions:
libxml_set_external_entity_loader(function () {
    return null;  // Prevent external entity loading
});

.NET

C# - SECURE Configuration

// .NET 4.5.2+ XmlReader is safe by default
// For older versions or XmlDocument:
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;

using (XmlReader reader = XmlReader.Create(stream, settings))
{
    // Safe XML processing
}

Key Message: The configuration varies, but the goal is always the same - disable DTD processing and external entity resolution. When in doubt, consult the OWASP XXE Prevention Cheat Sheet for your specific parser.

When to Use JSON Instead

The simplest fix for XXE? Don't use XML. JSON has no entity mechanism, making XXE attacks impossible.
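As a small illustration in Python, the same attack strings are inert when they arrive as JSON. The request body here is made up for the example.

```python
import json

# Entity and DOCTYPE syntax has no meaning in JSON; it is just string data.
body = '{"user": "&xxe;", "note": "<!DOCTYPE foo>"}'
data = json.loads(body)

print(data["user"])  # the literal characters &xxe; with nothing resolved
```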

When JSON is Better

REST APIs: Modern APIs almost universally use JSON
Configuration files: JSON, YAML, or TOML are safer alternatives
Data interchange: JSON is simpler and widely supported
Frontend-backend communication: JavaScript native JSON support

When XML is Still Required

SOAP web services: Legacy enterprise systems
SVG files: Vector graphics are XML-based
Office documents: DOCX and XLSX are ZIP archives of XML files
RSS/Atom feeds: Syndication formats
Legacy integrations: Systems that only speak XML

If you're building a new vibe-coded API, use JSON. Only reach for XML when you have a specific requirement that demands it - and when you do, disable DTD processing immediately.

AI Fix Prompt: Audit for XXE

Copy this prompt into Cursor, Claude Code, or any AI assistant:

AI Security Audit Prompt

Review my codebase for XML External Entity (XXE) vulnerabilities (CWE-611):

## Check 1: XML Parser Usage
Search for XML parsing patterns:
- Java: DocumentBuilderFactory, SAXParserFactory, XMLReader, XMLInputFactory
- Python: xml.etree, xml.sax, lxml, xml.dom
- Node.js: xml2js, libxmljs, fast-xml-parser, xml-parser
- PHP: simplexml_load_string, simplexml_load_file, DOMDocument
- .NET: XmlDocument, XmlReader, XDocument

## Check 2: DTD/External Entity Configuration
For each parser found, verify:
- Is DTD processing disabled?
- Are external general entities disabled?
- Are external parameter entities disabled?
- Is entity expansion limited (Billion Laughs protection)?

Flag: Parsers created without explicit security configuration

## Check 3: User Input to XML
Trace data flow:
- Does user input reach XML parsing?
- Are uploaded files (XML, SVG, DOCX, XLSX) parsed?
- Are SOAP/XML-RPC endpoints present?
- Does any external data get parsed as XML?

## Secure Configurations Required

Java - Add these features:
- setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
- setFeature("http://xml.org/sax/features/external-general-entities", false)
- setFeature("http://xml.org/sax/features/external-parameter-entities", false)
- setXIncludeAware(false), setExpandEntityReferences(false)

Python - Replace with:
- from defusedxml import ElementTree (instead of xml.etree.ElementTree)

Node.js (libxmljs2) - Set parser options:
- { noent: false, dtdload: false, dtdvalid: false, nonet: true }

PHP 7.x - Add before parsing:
- libxml_set_external_entity_loader(function() { return null; });

## Output Format
For each vulnerability:
1. File path and line number
2. The vulnerable parser instantiation
3. Attack scenario (what an attacker could access)
4. Language-specific secure configuration

Prioritize by: User input → XML parser = Critical

Frequently Asked Questions

What is an XXE vulnerability?

XXE (XML External Entity) is a vulnerability where XML parsers process references to external files or URLs embedded in XML documents. Attackers can exploit this to read sensitive files like /etc/passwd, access internal services (SSRF), or crash servers. According to OWASP, XXE was significant enough to have its own Top 10 category (A4:2017) before being merged into Security Misconfiguration.

How do XXE attacks work?

XML allows defining "entities" in the Document Type Definition (DTD). External entities reference files (file:///etc/passwd) or URLs (http://internal-server/). When the parser processes the XML, it fetches and includes the entity content. Attackers craft XML with malicious entity definitions, submit it to your application, and the parser reveals secrets. The "Billion Laughs" variant uses nested entity expansion to exhaust server memory.

Is JSON safer than XML for APIs?

Yes, significantly. JSON has no entity mechanism, so XXE attacks are impossible. This is one reason modern APIs prefer JSON over XML. If you don't need XML-specific features (namespaces, DTD validation, XSLT), use JSON instead. XML is still necessary for SOAP services, SVG files, Office documents (DOCX/XLSX are XML-based), and legacy systems. When XML is required, always disable DTD processing.

How do I prevent XXE in Java?

Java's DocumentBuilderFactory is vulnerable by default. Add these settings: dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true), dbf.setFeature("http://xml.org/sax/features/external-general-entities", false), and dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false). Also set setXIncludeAware(false) and setExpandEntityReferences(false). The OWASP XXE Prevention Cheat Sheet has complete configurations for each Java parser.

Are modern XML parsers safe by default?

Some are, many aren't. PHP 8.0+ and libxml2 2.9+ (used by many parsers) disable external entities by default. .NET 4.5.2+ XmlReader is safe by default. But Java's DocumentBuilder, Python's standard xml modules (exposed to Billion Laughs when built against Expat older than 2.4.1), and older libraries remain vulnerable unless explicitly configured. Always check your specific parser's documentation and test with a malicious XML payload.
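Testing with a malicious payload does not need a real secret. One possible approach, sketched below in Python, writes a canary string to a temporary file, points an external entity at it, and checks whether the canary shows up in the parser's output. `leaks_local_file`, `stdlib_parse`, and the canary value are hypothetical names for this example; `parse_to_text` stands for whatever function in your code turns XML bytes into text.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

CANARY = "xxe-canary-7f3a"  # arbitrary marker, safe to leak


def leaks_local_file(parse_to_text) -> bool:
    """Return True if parse_to_text (bytes -> str) exposes a local file via XXE."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.txt"
        secret.write_text(CANARY)
        payload = (
            '<?xml version="1.0"?>\n'
            f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>\n'
            "<user>&xxe;</user>"
        ).encode()
        try:
            output = parse_to_text(payload)
        except Exception:
            return False  # the parser rejected the document outright
        return CANARY in (output or "")


def stdlib_parse(xml_bytes: bytes) -> str:
    """Example target: the standard-library parser."""
    return ET.fromstring(xml_bytes).text or ""


print(leaks_local_file(stdlib_parse))
```

This only covers file disclosure where the entity value is echoed back; blind XXE and entity-expansion DoS need separate checks.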

Scan Your Vibe Coded Project

vibeship scanner detects XXE vulnerabilities by identifying XML parsers without security configuration - before attackers find them.
