Saber Chess Coach – A Drift Case Study in AI-Augmented Engineering

How a training tool for chess became a lab for understanding LLM state drift.

Overview

Saber began as a simple idea:

"What if a chess coach could explain like a human and calculate like a machine?"

The project combined Stockfish for tactical strength with a GPT-based coach for natural-language explanation. The goal was not just to find the best moves, but to teach — to describe plans, critical moments, and alternative lines in a way a human learner could absorb.
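
That pairing is compact enough to sketch. The snippet below is a minimal illustration rather than Saber's actual code; it assumes the python-chess library, a Stockfish binary on the PATH, and a hypothetical ask_coach() wrapper around the GPT API:

    import chess
    import chess.engine

    def explain_position(board: chess.Board,
                         engine: chess.engine.SimpleEngine,
                         ask_coach) -> str:
        # Let Stockfish find the strongest continuation.
        info = engine.analyse(board, chess.engine.Limit(depth=18))
        best = info["pv"][0]
        # Hand the position plus the engine's choice to the language model.
        prompt = (
            f"Position (FEN): {board.fen()}\n"
            f"Stockfish suggests {board.san(best)} (eval: {info['score'].white()}).\n"
            "Explain the plan behind this move for a club player."
        )
        return ask_coach(prompt)  # hypothetical wrapper around the GPT call

    # Usage: engine = chess.engine.SimpleEngine.popen_uci("stockfish")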

Very quickly, Saber turned into something more: a live experiment in how large language models handle state, context, and drift over time.

The Problem: Positional Drift

Early versions of Saber had a subtle but serious problem:

  • The board position in the UI was correct.
  • Stockfish's internal position was correct.
  • But after several moves and explanations, the coach's understanding of the position drifted out of sync with reality.

Symptoms (see the sketch after this list):

  • Illegal move recommendations
  • Descriptions that didn't match the current board
  • References to pieces that had already been traded
  • "Phantom variations" based on earlier positions
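
Of those, the illegal-move symptom is the easiest to catch mechanically. A minimal guard, assuming python-chess (check_recommendation is an illustrative name, not Saber's actual function):

    import chess

    def check_recommendation(fen: str, suggested_san: str) -> bool:
        """Return True only if the coach's move is legal in the real position."""
        board = chess.Board(fen)
        try:
            board.parse_san(suggested_san)  # raises ValueError if illegal or ambiguous
            return True
        except ValueError:
            return False

A False here flags drift the moment it happens, instead of letting a phantom line reach the user.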

The closer we looked, the clearer it became:
The bug wasn't in the chess engine or the board UI. It lived in the reasoning layer.

Ladder Logic Debugging: Treating Saber Like a Production Line

To find the root cause, I leaned on my background in manufacturing troubleshooting.

I asked Orin (ChatGPT) to map the whole interaction flow in ladder-logic / flow-chart form, as if Saber were a factory line:

  1. Input station: User move / SAN string
  2. State station: Board update → FEN
  3. Engine station: Stockfish evaluation
  4. Reasoning station: GPT explains the position and suggests a move
  5. Output station: Response + next SAN move

Then I walked the “line” the way I was trained (see the harness sketched after this list):

  • Check each station with known-good input
  • Compare input vs. output at every step
  • Watch for the exact point where reality and behavior diverge
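
Translated into code, that walk is roughly the harness below. This is a hypothetical sketch: it assumes each station has been wrapped to echo back the FEN it actually operated on, mirroring the input-vs-output comparison described above.

    import chess

    def walk_the_line(stations, moves_san):
        """Feed a known-good game through every station, comparing the FEN
        going in against the FEN each station reports acting on."""
        board = chess.Board()
        for san in moves_san:
            board.push_san(san)            # ground truth from the input station
            truth = board.fen()
            for name, station in stations:
                reported = station(truth)  # each wrapper echoes its working FEN
                if reported != truth:
                    return f"divergence at {name!r} after {san}"
        return "all stations clean"

    # stations = [("ui", ui_probe), ("engine", engine_probe), ("coach", coach_probe)]
    # walk_the_line(stations, ["e4", "e5", "Nf3", "Nc6"])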

Results:

  • UI was clean.
  • SAN parsing and FEN tracking were correct.
  • Stockfish's view of the position stayed accurate.
  • The drift consistently appeared after the multi-step natural-language reasoning stage.

That led to the key insight:

The LLM wasn't always staying anchored to the latest FEN.
Once it started "thinking aloud," its own text could drag it away from the true board state.

The Checkpoint System: Re-Anchoring the Board

The fix was conceptual first, then technical:

Re-ground the model in the true position before every reasoning step.

In practice, this became a Checkpoint System (see the sketch after this list):

  • Before the coach explains anything, Saber injects the current FEN and a concise position summary.
  • Explanations are kept short and scoped to that FEN.
  • New moves are only generated after restating the current board state.
  • Multi-turn conversations re-anchor to FEN again instead of assuming prior text is still accurate.
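
As a prompt-construction pattern, the checkpoint is a preamble rebuilt from the live board before every call. A minimal sketch, assuming python-chess; checkpoint_prompt() is an illustrative name rather than Saber's actual helper:

    import chess

    def checkpoint_prompt(board: chess.Board, question: str) -> str:
        """Re-anchor the coach to the true position before any reasoning."""
        side = "White" if board.turn == chess.WHITE else "Black"
        summary = f"{side} to move, move {board.fullmove_number}."
        return (
            f"Current position (FEN): {board.fen()}\n"
            f"Summary: {summary}\n"
            "Answer only about this exact position. Ignore any earlier "
            "analysis in this conversation.\n\n"
            f"{question}"
        )

Every coach call starts from this preamble instead of trusting that the transcript still matches the board.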

This doesn't just patch a chess bug — it addresses a general problem in LLM-driven systems:
state drift between reasoning steps.

External Validation

The drift analysis and checkpoint approach were reviewed by others, including Ted Wong, who was impressed with how the issue was:

  • Identified through systems thinking,
  • Modeled using ladder logic, and
  • Solved with a clear, repeatable design principle.

Teammates also recognized Saber as "real engineering work," not just a hobby script. The project became a proof that a non-traditional developer can still do deep architectural debugging when paired with AI.

What Saber Taught Me

Saber is currently paused while other projects ship, but it already delivered the most important thing it could: understanding.

Key lessons:

  • LLMs are powerful, but state integrity is fragile.
  • The most dangerous bugs are not syntax errors, but subtle divergence between reality and the model's internal story.
  • Classic tools like ladder logic and production-line thinking are incredibly effective when applied to AI systems.
  • AI-augmented development works best when the human acts as architect and diagnostician, not just a passive user.

Next Steps (Saber V2)

When Saber returns for V2, the plan (sketched after this list) is to:

  • Use a stricter JSON schema for move generation
  • Keep a dedicated state-keeper module responsible for the FEN
  • Expand the coaching model with "mini-games" (evaluation of plans, not just moves)
  • Integrate the drift lessons as part of a general "state anchoring" pattern for future tools
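
Two of those items are concrete enough to sketch now. The shapes below are illustrative assumptions about V2, not committed designs: a StateKeeper that is the single owner of the position, and a draft JSON schema that forces every generated move to name the FEN it was computed from.

    import chess

    class StateKeeper:
        """Single source of truth for the position; no other module caches FEN."""
        def __init__(self) -> None:
            self._board = chess.Board()

        def apply(self, san: str) -> str:
            self._board.push_san(san)  # raises on illegal input, by design
            return self._board.fen()

        def fen(self) -> str:
            return self._board.fen()

    # Draft schema: a reply that names a stale fen_before is rejected on arrival.
    MOVE_SCHEMA = {
        "type": "object",
        "properties": {
            "fen_before": {"type": "string"},
            "move_san": {"type": "string"},
            "rationale": {"type": "string"},
        },
        "required": ["fen_before", "move_san", "rationale"],
    }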

Saber began as a chess trainer. It evolved into a case study in how to think about thinking systems — and that lesson now powers everything else I build.