Working title 'tgame' is provisional. Top-level samples/ and docs/samples/ are gitignored; visual/art pipeline lives outside this repo.
8.6 KiB
Smoke Test: "Caewin's Forge"
Visual pipeline validation exercise. Bounded scope (1–2 days). Defined 2026-05-13.
What we're validating
This is a vibe test, not a pipeline test. The pipeline is already proven (see samples/README.md — C64 multicolor tiger demonstrates extreme-constraint capability; skeleton sample demonstrates target-aesthetic capability). Mechanical correctness is solved.
The open question: can the human-in-the-loop (prompting + curation + composition discipline) produce work with vibe — atmospheric storytelling, faces with inner lives, items with identity beyond function, tier differences that evoke wanting? Or does competent-but-lifeless output cap the project's ceiling?
If vibe lands, "every meaningful progression beat has a visual expression" is structurally credible and the bones-first roadmap is greenlit. If it doesn't, we know now — before committing 4–6 months to data layer, sim, and game code that all assume a visual layer with soul.
The character
Caewin the Steady (already canonized in 12-onboarding.md as the starter minion).
- Heritage: Human, late 20s / early 30s
- Class: Apprentice Engineer, Forge specialty
- Traits: Steady, Lucky
- Bearing: Competent, grounded, soot-stained working clothes. The kind of person who rarely raises his voice but everyone in the workshop listens when he does. Not heroic, not chibi, not idealized.
- Visual cues: Dark hair, slight beard, leather apron over plain linen tunic, sleeves rolled. Calluses you can't see in a bust shot but can feel in the silhouette.
Asset 1 — Backdrop: The Forge (Tier 1)
- Canvas: 640×360 native, palette-locked + Floyd-Steinberg dither
- Perspective: First-person POV, looking across the workshop floor from the player's vantage as guildmaster
- Composition: Anvil center-foreground, forge to the right with embers glowing, bellows visible, tool rack on the left wall, one window with warm afternoon light slanting in, smoke staining the ceiling
- Tier-1 humble cues: Exposed timber beams, cracked plaster, half-finished projects on the bench, dust motes in the light. No banners, no polished surfaces — apprentice's workshop, not guildmaster's forge
- Anchor point: Clear space where Caewin would stand to work (right of anvil, between anvil and forge). Should read as a place a figure could stand without obscuring the forge mouth
- Atmosphere: Warm hearth-light dominant. Muted earthen palette — browns, sepias, rust-reds, restrained ember oranges
- References: Diablo II Lut Gholein blacksmith, Baldur's Gate Friendly Arm Inn workshop, Machinarium's atmospheric interiors
Asset 2 — Portrait: Caewin the Steady
- Canvas: ~256×384 (or your standard portrait canvas), palette-locked + dither
- Framing: Bust shot, frontal, slight off-axis
- Expression: Focused, calm, mouth closed, eyes attentive. Not smiling, not glowering
- Wardrobe: Leather apron over linen tunic, soot stains, no ornamentation, no gear, no jewelry
- Background: Simple atmospheric — soft blur of workshop tones, or flat muted neutral. Low contrast so the face reads
- Tone target: Competent and grounded. If it slips toward heroic-fantasy-protagonist or mobile-cute, it's missing
Asset 3 — Item Card: Crude Pickaxe (Common quality)
- Canvas: 256×384 card canvas, palette-locked + dither
- Frame: Common quality border (silver-toned, no flourish, no gold inlay)
- Silhouette scale: Standard tool at ~85% of canvas, slight diagonal
- Composition layers (per 07-item-cards.md):
- Base: pickaxe silhouette
- Material tint: iron head (cool gray, slightly rusted), hardwood haft (warm brown)
- Component overlays: simple iron pommel cap at the haft base, basic leather grip wrap
- Effect aura: none (Common tier)
- Maker's mark: small stamp (e.g., "C." or whatever your mark style allows)
- Tone target: Honest, functional, almost-but-not-quite drab. The tool a competent apprentice produces — workmanlike, no embarrassment, no glory
Stretch goal — Same recipe, Fine quality
If time permits, generate the same pickaxe at Fine quality (Tier 3 reveal) for direct side-by-side comparison:
- Frame: silver border with subtle gold inlay
- Material: refined moonsilver pickaxe head (pale blue sheen), lacquered ironwood haft (deep brown with grain)
- Component overlays: ornate silver pommel cap, dyed leather grip wrap with small copper studs
- Effect aura: subtle outer glow (per 13-reveal-choreography.md — Fine tier visual treatment)
- Same maker's mark, same silhouette, same scale
The test: Common and Fine placed side by side. The visible difference should be immediately readable at a glance, without reading the quality label. If it isn't, the north star is in trouble.
Vibe evaluation — the actual eval
This is the load-bearing question. Per asset:
Forge backdrop — atmospheric storytelling. Does the workshop feel lived in? Details that suggest someone just was here — half-finished hilt on the bench, rag draped over tongs, a boot print in soot. Lighting that tells you the time of day and mood. The eye knows where to settle.
- Failure mode: forge + anvil + bellows + fire + tools all rendered correctly, reads as a list of objects in a room. No mood.
Caewin portrait — a face with inner life. Do you look at it and think "yeah, that person would be quiet, notice things, never panic"? Personality readable. Traits (Steady, Lucky) subtly legible without being telegraphed.
- Failure mode: competent human male in leather apron, all features correct, no spark. Could be any tutorial NPC.
Item card — object with identity beyond function. Does the pickaxe feel like a specific tool a specific person made and used? Wear pattern on the haft, character to the iron. Not generic stock-photo pickaxe.
- Failure mode: functionally-correct pickaxe centered in a frame that neither fights nor helps it.
Common vs. Fine (stretch) — the Fine makes you want it. Not because of the border, but because materials are obviously richer, construction obviously more careful, pride visibly taken.
- Failure mode: Fine looks like Common with a gold border and glow shader.
The unifying test: could a stranger looking at these assets, with no context, guess what kind of game this is and want to know more? Pipeline-correctness with vibe = the game ships. Pipeline-correctness without vibe = competent screensaver.
Mechanical checks (these matter, but they're table stakes)
These are below the vibe eval, not in place of it — they need to pass but won't determine the verdict on their own:
- Cross-asset coherence. All three feel like they belong in the same game? Same palette density, same dither character?
- Composition cleanliness. Item layers compose without floating bits or misaligned anchors? Backdrop has a real "place to stand"?
- Tier readability (stretch). Common → Fine difference visible at a glance without reading labels?
- Production tractability. How long did each asset take? If each is 8 hours of hand-cleanup, the envelope doesn't close even with great vibe. If it's 30 minutes of generate-and-curate, the production plan stands.
Verdict format
One of three:
- "This will ship." — Pipeline holds, output supports the claims, no major blockers. Specific praise + minor refinements.
- "Close, but..." — Promising direction with a specific gap (dither too dense / not dense enough, tone slips, anchors don't read). Concrete pivots.
- "Stick with digital twins." — Honest evaluation that the pipeline can't sustain the ambition. Specific reasons why, and whether a scope/style pivot would unlock it.
Evaluation will be honest, not generous. That's what makes the smoke test useful.
Why this exercise gates the roadmap
The bones-first roadmap (sim + data + game code first, art last) assumes the visual pipeline can produce the assets the design needs, in the volume and quality required, at a production cost that closes within solo + AI augmentation. That's a load-bearing assumption.
The visual-progression north star can't be validated by simulation. It's a felt design claim. The only way to know whether the pipeline supports it is to render a tier-1 and a tier-3 asset and see if the difference produces the "I want that for my character" pull.
If the smoke test ships green, the rest of the roadmap proceeds. If it ships red, we pivot — possibly to a different art direction, possibly to a different scope, possibly to a different style discipline. Better to find out now than at month 4.