tgame/docs/06-asset-pipeline.md
Parley Hatch 2abfe4abd1 Initial commit: design docs
Working title 'tgame' is provisional. Top-level samples/ and
docs/samples/ are gitignored; visual/art pipeline lives outside
this repo.
2026-05-17 11:16:07 -06:00


Asset Pipeline

AI is used during production, not at runtime. The shipped game is fully self-contained and fully offline: no workstation dependency, no API calls, and no surprise model regressions in production.

What gets pre-rendered

The asset library is atomic components, not finished items. Items get composed at runtime from layered sprites + shader passes — see 07-item-cards.md. Many effects (glow, foil, holo, corruption, heat haze, frost, holy rays, first-reveal sweep) are pure shaders and need no PNG at all.

| Asset | Approach |
| --- | --- |
| Minion base sprites (face/hair/skin) | Layered sprite library |
| Class costume layers | Layered sprites by class + tier |
| Gear overlay sprites (per slot) | Layered sprite library, hundreds of variants |
| Item base silhouettes | ~30-60 per item category (sword, dagger, vial, ring, etc.) |
| Material tints | ~20 per applicable material; applied via shader or pre-baked overlay |
| Component overlays | Pommel gems, hilt wraps, stoppers, labels, engravings; authored once, recombined |
| Effect auras | Per enchantment / effect type |
| Quality frames + flourishes | 5 quality bands + Legendary specials |
| Maker stamps / faction sigils | Per engineer + faction |
| Bespoke Legendary art | Hand-finished art for truly unique named items only |
| Ingredient & raw material sprites | One per ingredient |
| Patron portraits | One per patron + faction set dressing |
| Zone backdrops | One per zone, possibly day/night/weather variants |
| UI, icons, frames, particles | Standard authored assets |
| Audio (music + SFX) | Pre-bundled |
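
To make the "atomic components, composed at runtime" idea concrete, here is a minimal sketch of layering sprites and applying a material tint. It is an illustration only: the names `over`, `tint`, and `compose` are hypothetical, sprites are plain nested lists of RGBA tuples, and the actual runtime would presumably do this on the GPU rather than in Python.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]  # straight-alpha RGBA, 0-255 per channel
Sprite = List[List[Pixel]]         # rows of pixels; all layers share one size


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Composite one pixel over another (Porter-Duff 'over' operator)."""
    a_top = top[3] / 255.0
    a_bot = bottom[3] / 255.0
    a_out = a_top + a_bot * (1.0 - a_top)
    if a_out == 0.0:
        return (0, 0, 0, 0)  # both pixels fully transparent
    rgb = tuple(
        round((top[c] * a_top + bottom[c] * a_bot * (1.0 - a_top)) / a_out)
        for c in range(3)
    )
    return (rgb[0], rgb[1], rgb[2], round(a_out * 255))


def tint(pixel: Pixel, color: Tuple[int, int, int]) -> Pixel:
    """Multiply-tint a pixel; one simple stand-in for a material tint pass."""
    r, g, b, a = pixel
    return (round(r * color[0] / 255), round(g * color[1] / 255),
            round(b * color[2] / 255), a)


def compose(layers: List[Sprite]) -> Sprite:
    """Stack layers bottom-to-top (base silhouette first, overlays after)."""
    result = [row[:] for row in layers[0]]
    for layer in layers[1:]:
        for y, row in enumerate(layer):
            for x, px in enumerate(row):
                result[y][x] = over(px, result[y][x])
    return result
```

In this sketch an item would be `compose([tinted_base, pommel_gem, engraving])`, so adding a new component overlay multiplies the number of possible items without adding a finished PNG per item.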

How AI fits in

The workstation (RTX 6000 Ada) runs local diffusion + audio models during production. Two distinct concerns:

Conversion pipeline (solved, scripted)

This is what produces consistent palette / resolution / "honest pixel art" output regardless of the source generator's style drift.

For static assets (items, portraits, backdrops, UI elements):

  1. Generate high-res candidate (Flux, z-image, or comparable diffusion model)
  2. Downsample to target pixel resolution
  3. Apply Floyd-Steinberg dither against the master 256-color palette
  4. Optional hand-clean of ambiguous pixels
  5. Commit
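
Steps 2 and 3 above can be sketched in pure Python as follows. This is a sketch under stated assumptions, not the project's actual script: the function names are hypothetical, images are nested lists of RGB tuples, the downsample is a plain box filter, and the two-color palette in the usage note stands in for the master 256-color palette. A production script would more likely use an imaging library.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]


def downsample(img: Image, factor: int) -> Image:
    """Box-filter downsample: average each factor x factor block of pixels."""
    h, w = len(img) // factor, len(img[0]) // factor
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [img[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(tuple(sum(p[c] for p in block) // len(block)
                             for c in range(3)))
        out.append(row)
    return out


def nearest(color: Sequence[float], palette: Sequence[RGB]) -> RGB:
    """Return the palette entry with the smallest squared RGB distance."""
    return min(palette,
               key=lambda p: sum((color[c] - p[c]) ** 2 for c in range(3)))


def floyd_steinberg(img: Image, palette: Sequence[RGB]) -> Image:
    """Quantize to the palette, diffusing each pixel's error to neighbours."""
    h, w = len(img), len(img[0])
    work = [[[float(v) for v in px] for px in row] for row in img]
    out = [[(0, 0, 0)] * w for _ in range(h)]
    # Classic Floyd-Steinberg weights: right, below-left, below, below-right.
    taps = ((1, 0, 7 / 16), (-1, 1, 3 / 16), (0, 1, 5 / 16), (1, 1, 1 / 16))
    for y in range(h):
        for x in range(w):
            old = work[y][x]
            new = nearest(old, palette)
            out[y][x] = new
            err = [old[c] - new[c] for c in range(3)]
            for dx, dy, weight in taps:
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    for c in range(3):
                        work[ny][nx][c] += err[c] * weight
    return out
```

For example, `floyd_steinberg(downsample(candidate, 8), palette)` would reduce a high-res candidate by a factor of 8 and then map it onto the palette; the factor of 8 is an arbitrary illustration, not a project setting.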

For animated sprites (creatures, NPCs, characters in motion):

  1. Generate the still character in Flux (locks the design)
  2. Animate the still into a video clip with Wan 2.2 (or comparable)
  3. Extract keyframes at 4 fps
  4. Crop and convert each frame through the palette-quantize step
  5. Commit as sprite sheet
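
The frame selection in step 3 and the sheet layout in step 5 come down to a little index arithmetic, sketched below. The helper names, the source frame rate, and the frame dimensions are all hypothetical; only the 4 fps target comes from the steps above.

```python
from typing import List, Tuple


def keyframe_indices(total_frames: int, source_fps: float,
                     target_fps: float = 4.0) -> List[int]:
    """Pick the source frames nearest to evenly spaced target timestamps."""
    if total_frames <= 0 or source_fps <= 0 or target_fps <= 0:
        return []
    # Number of keyframes that fit in the clip's duration (epsilon guards
    # against float results such as 7.999999 for an exact 8).
    count = int(total_frames * target_fps / source_fps + 1e-9)
    step = source_fps / target_fps  # source frames per extracted keyframe
    return [min(total_frames - 1, round(i * step)) for i in range(count)]


def frame_rects(count: int, frame_w: int, frame_h: int,
                columns: int) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) for each frame in a row-major sprite sheet."""
    return [((i % columns) * frame_w, (i // columns) * frame_h,
             frame_w, frame_h) for i in range(count)]
```

With an assumed 24 fps clip of 48 frames, `keyframe_indices(48, 24)` selects every sixth frame, and `frame_rects` tells the packer where each converted frame lands on the sheet.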

This is fully programmatic — see samples/ for a real output of this pipeline (the skeleton attack frame).

Prompting discipline (ongoing, not a script)

Style consistency, character consistency, mood, composition, perspective, framing — these are prompting concerns, not pipeline concerns. The conversion pipeline cannot rescue a wrong-feeling prompt; it can only normalize color and resolution.

This means:

  • Lock prompt templates per asset type (a "workshop interior backdrop" prompt template; an "enemy creature attack pose" template; a "minion bust portrait" template)
  • Use LoRAs / IP-Adapter / character reference for character consistency across animations and gear updates
  • Curate aggressively at generation time — bad source compositions stay bad after dithering
  • Maintain a prompt library / prompt cookbook alongside the asset library
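
One way to "lock" templates per asset type is to keep them in a single table and refuse to build a prompt when a required field is missing. The sketch below uses Python's standard `string.Template`; the template wording, the field names, and `build_prompt` itself are invented for illustration and are not the project's actual prompts.

```python
from string import Template

# Hypothetical locked templates, one per asset type. Only the $fields vary
# between generations; the surrounding style wording stays fixed.
PROMPT_TEMPLATES = {
    "workshop_backdrop": Template(
        "$subject workshop interior, $mood lighting, wide shot, no characters"
    ),
    "creature_attack": Template(
        "$subject, attack pose, full body, side view, $mood lighting"
    ),
    "minion_bust": Template(
        "$subject, bust portrait, facing viewer, $mood lighting"
    ),
}


def build_prompt(asset_type: str, **fields: str) -> str:
    """Fill a locked template; raises KeyError on unknown type or field."""
    return PROMPT_TEMPLATES[asset_type].substitute(**fields)
```

Because `substitute` raises `KeyError` for a missing placeholder, a half-filled prompt fails loudly instead of reaching the generator, which fits the "curate at generation time" rule above.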

The prompting discipline is the ongoing art direction work. The pipeline just guarantees the floor.

Audio

Local audio gen (musicgen or similar) for ambient beds and SFX, following the audio direction in 10-tone.md. Same conceptual split: prompting/curation matters, downstream encoding/normalization is automatable.

No part of any pipeline runs on the player's device or talks to a server during play.

Bundle strategy

Composition collapses asset volume substantially compared to authoring finished items.

  • Rough order of magnitude (post-composition shift): ~60 item bases × ~10 KB + ~20 material tints + ~150 component overlays × ~5 KB + ~40 effect auras + ~300 minion portrait elements + 50 zones + UI/audio. That puts the total on the order of tens to low hundreds of MB, rather than several hundred.
  • Likely approach: base bundle + on-demand asset packs per region/discipline so first launch is fast and the player downloads more as they unlock.
  • Texture atlasing for the sprite-heavy parts (component overlays, gear overlays, item bases).
  • Bespoke Legendary art lives in its own pack, downloaded as Legendaries are first encountered.
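
The rough estimate in the first bullet can be kept as a small, checkable table. The sketch below records only the counts and per-asset sizes stated there and leaves the size as `None` where none is given. The structure and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (asset count, approximate KB per asset); None where no size is stated.
ESTIMATE: Dict[str, Tuple[int, Optional[int]]] = {
    "item bases": (60, 10),
    "material tints": (20, None),
    "component overlays": (150, 5),
    "effect auras": (40, None),
    "minion portrait elements": (300, None),
    "zones": (50, None),
}


def known_subtotal_kb(estimate: Dict[str, Tuple[int, Optional[int]]]) -> int:
    """Sum count x size over the entries that have a stated size."""
    return sum(count * kb for count, kb in estimate.values()
               if kb is not None)


def unsized(estimate: Dict[str, Tuple[int, Optional[int]]]) -> List[str]:
    """List the categories that still need a per-asset size."""
    return [name for name, (_, kb) in estimate.items() if kb is None]
```

The two sized terms come to 1,350 KB, so the overall figure depends on the categories that have no per-asset size yet, plus UI and audio.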

Minimum viable content for v1

Don't try to ship the full catalog at launch. A playable v1 is more like:

  • 1-2 regions with ~5 zones total
  • 3 stations (Forge, Alchemy Table, Loom — covers most fantasy crafting cliches)
  • ~150 catalog items
  • ~20 gear overlays per slot
  • ~40 minion portrait permutations
  • 1-2 patron arcs

Then expand by content patch.

Open questions

  • How much hand-finishing per AI-generated asset before it ships? (Quality bar matters more than volume.)
  • Style consistency across thousands of assets — do we lock to one diffusion model + one LoRA stack and never change, or accept some drift?
  • Audio: original score or licensed loops?