tgame/docs/09-art-style.md
Parley Hatch 2abfe4abd1 Initial commit: design docs
Working title 'tgame' is provisional. Top-level samples/ and
docs/samples/ are gitignored; visual/art pipeline lives outside
this repo.
2026-05-17 11:16:07 -06:00

7.3 KiB
Raw Permalink Blame History

Art Style

Late-90s pixel art era — Diablo II / Infinity Engine / Fallout 1-2 territory. Detailed, atmospheric, painterly. Not the iconic-but-coarse 8/16-bit look — the atmospheric detail generation that came after.

Achieved via diffusion-model generation, downsampled to 640×360 native, then Floyd-Steinberg dithered against a 256-color palette derived by kmeans from a library of z-image generated art. The palette is calibrated and locked. The dither pass is what erases the "AI look" — the smooth-gradient, plasticky, hyper-blended tell of diffusion model output gets quantized away into honest pixel art.

Why this style

  • Erases the "AI look". Diffusion model output has a distinctive smooth-gradient / over-blended tell. Palette quantization + dither destroys it cleanly. What ships looks like deliberate hand-pixelled art, not generated art with a filter.
  • Style consistency on AI-generated assets is solved. Palette + dither normalizes everything regardless of model/seed/prompt drift.
  • Asset budgets shrink dramatically. ~1030KB per item sprite (PNG-8 indexed against the 256-palette), whole game's art likely fits in tens of MB.
  • Mobile-optimal. Tiny textures, integer scaling, low GPU cost, perfect on any screen size.
  • Detail without the iconic-pixel-art trap. 256 colors at 640×360 is much more expressive than NES/SNES-era pixel art. We can do beautiful, not just iconic.
  • Animation gets cheap. 4-frame flicker on a torch is trivial. Subtle constant-motion details (bubbling cauldrons, drifting embers, blinking portraits) become affordable.

Asset production pipeline

  1. Generate high-res candidate on workstation (z-image or comparable diffusion model).
  2. Downsample to target pixel resolution.
  3. Apply Floyd-Steinberg dither against the master 256-color palette.
  4. Optional hand-clean of ambiguous pixels.
  5. Commit to content repo.

Workflow is solved. It's a script.

Native resolution

  • Pixel-native canvas: 640×360. Late-90s era target — too small loses the "beautiful" we want, too big loses the pixel identity.
  • Cards integer-scaled from a smaller native (likely 256×384), backdrops at 640×360 or fractional multiples for atmospheric detail.

Palette

256-color palette derived via kmeans clustering from a library of z-image generated art. Calibrated against what the source diffusion model actually produces, so quantization is lossy but never wrong. Locked.

Recoloring within the palette is essentially free — Common iron pulls from the gray range; Masterwork from the silver range; Cursed variants pull from purple shadow ranges. Effect-driven recolor (poison gives a green tint pass, fire gives an orange tint pass) is shader work, not new art.

The pixel/HD coherence rules

Mixed-resolution UI needs a deliberate vocabulary.

Layer Resolution Examples
World Pixel art, native res, integer-scaled, dithered, palette-locked Backdrops, items, portraits, components, gear, particles
Chrome HD vector / native UI frames, buttons, dialogs, tooltips, system text
Effect shaders Choice per effect Holo/foil pixel-aligned; bloom and glow can be smooth at low intensity
Text HD or BMFont pixel font Pixel font for flavor (item names, log entries); HD for menus / system / accessibility

Rule: shader effects on pixel art should respect the pixel grid when they're large or visible. Pixelated holo shimmer reads as intentional. Smooth holo on pixel art looks wrong. Subtle glow/bloom can break the rule because the eye reads them as atmosphere, not art.

References

Primary era reference (late-90s pixel art):

  • Diablo II — the canonical fantasy item/loot/crafting UI. Composed item sprites in inventory grids, atmospheric dark palette, painterly pixel art. Closest single reference to what we're building.
  • Baldur's Gate / Planescape Torment / Icewind Dale (Infinity Engine) — pre-rendered painted backdrops with pixel-art character sprites overlaid on anchor points. Exactly the Guild Hall + minion-overlay model.
  • Fallout 1 / 2 — isometric pixel art, item-driven inventory, restricted palette mood.

Secondary modern references (style mixing rules, UI / shader treatment):

  • Loop Hero — pixel art card-driven progression, very close in mood
  • Octopath Traveler / Sea of Stars (HD-2D) — pixel sprites with HD lighting/depth
  • Dead Cells — pixel art world with crisp HD UI chrome
  • Stardew Valley — pixel art crafting / inventory at scale
  • Eastward / Hyper Light Drifter — palette-as-identity discipline

Perspective discipline

Different surfaces follow different conventions, intentionally.

  • Backdrops: first-person POV. You-the-guildmaster are the camera. At the Forge you see the anvil from your POV with the smith across from you. At the Alchemy Table you're looking down into your own bubbling cauldron. At the Patron Court you're standing at the bulletin board. Late-90s/early-2000s home-base interior feel (Skyrim, BG3 camp, Slay the Spire / Inscryption dealer table, Myst painted scenes).
  • Item cards: free per item. Each card composes the item however it displays best — a ring top-down, a sword side-on, a polearm diagonal, a vial three-quarter. No unified vantage. The discipline is size consistency, not perspective.
  • Portraits: bust shots. Standardized for layered composition (face / hair / costume layers).
  • World map: top-down parchment. The one exception, because the map is a literal artifact you read.

Reference: Slay the Spire / Inscryption / Hand of Fate / Reigns for first-person card-table feel; Myst / Riven for painted first-person standing-in-a-place backdrops; Diablo II town hubs for atmospheric standing-in-a-place fantasy.

Size consistency

The real visual-coherence problem (rather than perspective). Three levels of rule:

Card canvas — fixed. Every card renders to a uniform pixel canvas (likely 256×384). Frame is uniform. Item silhouette inside scales to fit.

Per-category silhouette scale — tuned constants:

  • Tiny items (rings, amulets, gems, stoppers): ~50% of canvas, padded
  • Small items (daggers, potions, scrolls): ~70%
  • Standard items (swords, hammers, staves): ~90%
  • Large items (polearms, two-handers, longbows): rotated diagonally to fit, capped at ~95%

Anchor-point depth scale on backdrops. Every first-person backdrop defines named anchor points where minion or item sprites can place. Each anchor has a depth-scale value (close-to-camera = 1.0, deep-in-scene = 0.50.7). Sprites render at the anchor's depth scale automatically — no accidental giant-figure-at-the-back-of-the-room moments.

Animation

4 fps for sprite animation — confirmed via working production pipeline (Flux still → Wan 2.2 video → 4 fps keyframe → palette conversion). Era-correct for late-90s reference (Diablo II walk cycles ran 4-8 fps depending on direction). See 06-asset-pipeline.md for the animated-sprite pipeline and samples/ for an example.

Open questions

  • Pixel font choice — which font, what size, when do we fall back to HD font.
  • Do we ever break the pixel rule for special moments (cinematic Legendary discoveries with HD splash)?
  • Inventory grid model: uniform 1-cell cards (mobile-friendly, simpler) vs. Diablo-style multi-cell items (era-true, more tactile, Tetris-puzzle feel).