Documentation

How it works.

Three layers, file-based contracts, a state machine with quality gates, and iterative refinement.

Three layers

agent-slides has three layers. Each solves a different problem.

Skills are Markdown files loaded into the agent's context. They encode workflow knowledge: what order to do things, what to check, what mistakes to avoid. Seven skills cover the full lifecycle from template extraction to final polish.

The CLI is the execution layer. It accepts JSON payloads, validates them against typed schemas, runs operations transactionally, and returns structured responses. Skills call the CLI; they never touch python-pptx directly.

The Python library wraps python-pptx with a typed API. The CLI calls the library. You can also use the library directly if you're building your own tooling.

Skills → CLI → Library → python-pptx

Extraction: reading the template

Every deck starts from a PowerPoint template. The extraction step reads the template and produces machine-readable contracts that downstream skills use when building slides.

slides extract template.pptx --output-dir project/ \
  --base-template-out project/base_template.pptx --compact

This produces several artifacts:

File | What it contains
---- | ----------------
resolved_manifest.json | The primary contract. Merges layout families, archetypes, color zones, placeholder geometry, and theme palette into one file. This is what the build skill reads.
base_template.pptx | A clean copy of the template with all content slides removed. Used as the starting point for rendering.
template_layout.json | Physical layout families with placeholder positions and sizes.
content_layout.json | Which archetypes are compatible with which layouts.
archetypes.json | Available archetypes (title_slide, kpi_trio, chart, timeline, etc.) with their constraints.
icons/ | Vector icons extracted from template slides, usable via add_icon.

After extraction, the agent (or the user) writes a design-profile.json that pins font sizes, allowed colors, contrast rules, and the paths to the template and catalog files. This profile flows through every downstream step.
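A minimal profile might look like the following; the exact field names and values are assumptions for illustration, not the tool's actual schema:

```json
{
  "template_path": "project/base_template.pptx",
  "manifest_path": "project/resolved_manifest.json",
  "fonts": {"title_pt": [28, 40], "body_pt": [14, 20]},
  "allowed_colors": ["#1A1A2E", "#0F3460", "#E94560", "#FFFFFF"],
  "contrast": {"min_ratio": 4.5}
}
```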

Planning: brief to slide structure

The agent reads the user's brief and the resolved manifest, then generates a slides.json file with two sections:

The plan (DeckPlan) describes the narrative arc: deck title, audience, objective, and a list of slides. Each slide has a story_role (opening, data, recommendation, etc.), an archetype_id (how it should look), and an action_title (a complete sentence stating the slide's point).

The operations (OperationBatch) are the rendering instructions: add slides, place text, insert charts, set backgrounds. Every operation uses precise geometry (inches) and references layouts from the manifest. The agent doesn't guess positions; it reads them from the extraction contracts.

// slides.json (simplified)
{
  "plan": {
    "deck_title": "Q3 Growth Strategy",
    "brief": "10-slide strategy deck...",
    "slides": [
      {
        "slide_number": 1,
        "story_role": "opening",
        "archetype_id": "title_slide",
        "action_title": "Q3 growth depends on three bets"
      }
    ]
  },
  "ops": {
    "operations": [
      {"op": "add_slide", "layout_name": "Title Slide"},
      {"op": "set_semantic_text", "slide_index": 0,
       "role": "title", "text": "Q3 growth depends on three bets"}
    ]
  }
}

Rendering: ops to .pptx

The CLI takes the slides.json and produces a PowerPoint file:

slides render --slides-json @slides.json \
  --profile design-profile.json \
  --output output.pptx --compact

Operations execute sequentially. By default, execution is transactional: if operation #7 fails, operations #1-6 are rolled back and the deck is left unchanged. The agent can also do a dry run first to catch errors without writing to disk.
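The rollback behavior can be pictured as applying every operation to a working copy and only swapping it in when all of them succeed. A minimal sketch in Python (the function and the op-application callback are illustrative, not agent-slides internals):

```python
import copy

def run_batch(deck, operations, apply_op, dry_run=False):
    """Apply operations transactionally: either all succeed or none are kept."""
    working = copy.deepcopy(deck)          # stage changes on a copy
    for i, op in enumerate(operations):
        try:
            apply_op(working, op)
        except Exception as exc:
            # Any failure discards the copy; the original deck is untouched.
            raise RuntimeError(f"operation #{i + 1} failed: {exc}") from exc
    if dry_run:
        return deck                        # validate only; keep the original
    return working                         # commit: caller swaps in the copy
```

A dry run is the same loop with the commit skipped, which is why it catches errors without writing anything to disk.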

The output is deterministic. Same inputs produce the same bytes every time (timestamps zeroed, GUIDs derived from content hashes, ZIP members sorted).
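The same tricks work with the standard library alone. A sketch of deterministic ZIP output (a .pptx is a ZIP archive), assuming nothing about agent-slides internals:

```python
import io
import zipfile

def write_deterministic_zip(members: dict[str, bytes]) -> bytes:
    """Write ZIP members in sorted order with a fixed timestamp so the
    same inputs always produce the same bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(members):
            # 1980-01-01 is the earliest valid DOS timestamp in ZIP.
            info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, members[name])
    return buf.getvalue()
```

Because the byte stream is stable, two renders of the same inputs can be compared with a plain hash or diff.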

Quality gates

After rendering, two quality gates check the output:

The content gate (critique) checks storytelling. Are the action titles assertive sentences? Is the structure logically organized? Does the visual archetype match the content type? It runs via plan-inspect.

The visual gate (audit + lint) checks technical quality. Font sizes within the profile's bounds, no overlapping shapes, sufficient color contrast, all content within slide margins. It runs via lint and qa.

# content check
slides plan-inspect --slides-json @slides.json \
  --out plan_content.json --compact

# visual check
slides lint output.pptx --profile design-profile.json \
  --out lint.json --compact
slides qa output.pptx --profile design-profile.json \
  --out qa.json --compact

Both gates produce structured JSON reports. The agent reads these reports, generates fix operations, and applies them.
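The shape of that loop can be sketched as follows; the report fields and the op name are hypothetical stand-ins for whatever lint.json actually contains:

```python
def fixes_from_report(report: dict, min_pt: float = 14.0) -> list[dict]:
    """Translate lint findings into an ops patch the CLI could apply."""
    ops = []
    for finding in report.get("findings", []):
        if finding["rule"] == "font_too_small":
            ops.append({
                "op": "set_font_size",           # hypothetical op name
                "slide_index": finding["slide_index"],
                "shape_id": finding["shape_id"],
                "size_pt": max(min_pt, finding["size_pt"]),
            })
    return ops
```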

The state machine

The /slides-full orchestrator doesn't run a linear pipeline. It runs a state machine with retry loops:

Extract → Build → Content gate → Visual gate → Done

If either gate fails, the orchestrator routes to a fix step, applies small reversible patches, then rechecks. It retries up to 3 times for content issues and 2 times per slide for visual issues. If a fix doesn't improve things, it stops and reports what's left.

Every fix is an ops patch applied via slides apply. Patches are small (one concern at a time), reversible (the deck can be re-rendered from the original plan), and inspectable (the JSON file is on disk).
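A patch file follows the same OperationBatch shape as the ops section of slides.json; for example (the op name mirrors the earlier example, the content is illustrative):

```json
{
  "operations": [
    {"op": "set_semantic_text", "slide_index": 2,
     "role": "title", "text": "Q3 growth depends on three bets"}
  ]
}
```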

File-based contracts

Skills don't share memory. Each invocation is a fresh CLI call. They communicate through JSON files on disk:

Phase | Reads | Writes
----- | ----- | ------
Extract | template.pptx | resolved_manifest.json, base_template.pptx, icons/
Build | resolved_manifest.json, design-profile.json | slides.json, output.pptx
Audit | output.pptx, design-profile.json | lint.json, qa.json, audit-fixes.json
Critique | output.pptx, slides.json | critique-fixes.json
Polish | output.pptx, design-profile.json | polish-fixes.json, final output.pptx

This means every intermediate artifact is visible on disk. When something goes wrong, you can read the files to figure out where. If a session is interrupted, the next session picks up from whatever files already exist.
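Resuming is then just a matter of checking which artifacts exist. A minimal sketch (the phase names follow the phases described above; the helper itself is not part of the tool):

```python
from pathlib import Path

# Each phase is considered done once its output files exist on disk.
PHASE_OUTPUTS = [
    ("extract", ["resolved_manifest.json", "base_template.pptx"]),
    ("build", ["slides.json", "output.pptx"]),
    ("audit", ["lint.json", "qa.json"]),
]

def next_phase(workdir: str) -> str:
    root = Path(workdir)
    for phase, outputs in PHASE_OUTPUTS:
        if not all((root / f).exists() for f in outputs):
            return phase        # first phase with missing outputs
    return "done"
```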

Designed for agents

Every layer of agent-slides is shaped by the assumption that an AI agent is the primary caller. This influences how information flows through the system.

Progressive disclosure. The agent doesn't receive everything at once. A skill file is the map: 100-200 lines of workflow knowledge that point to deeper sources. Reference files load conditionally based on what the plan contains. slides docs provides runtime schema introspection. Extraction contracts hold the raw geometry and constraints. Each layer is accessible, but none enters context until the agent pulls it. See the design note on progressive disclosure for why this matters.

JSON payloads, not flags. The CLI accepts full operation batches as JSON. The agent doesn't translate intent into flags; it writes a structured payload that maps directly to the internal schema. slides render --slides-json @slides.json is the entire interface. No flag translation, no argument parsing ambiguity.

Input hardening. The agent is not a trusted operator. Every JSON payload is validated against Pydantic schemas before execution. Unknown fields are rejected. Out-of-bounds coordinates are caught. SLIDES_ENFORCE_CWD sandboxes output paths to the working directory. Structured error responses include a suggested_fix field so the agent can self-correct without human intervention.
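The validation pattern, sketched with the standard library rather than the tool's actual Pydantic models (the field names and slide bounds are assumptions):

```python
SLIDE_W_IN = 13.333                        # assumed 16:9 slide width in inches
ALLOWED_FIELDS = {"op", "slide_index", "left_in", "top_in", "text"}

def validate_op(op: dict) -> dict:
    """Return {'ok': True} or a structured error with a suggested fix."""
    unknown = set(op) - ALLOWED_FIELDS
    if unknown:
        return {"ok": False, "error": f"unknown fields: {sorted(unknown)}",
                "suggested_fix": "remove the unknown fields and resubmit"}
    if not 0 <= op.get("left_in", 0) <= SLIDE_W_IN:
        return {"ok": False, "error": "left_in out of bounds",
                "suggested_fix": f"use 0 <= left_in <= {SLIDE_W_IN}"}
    return {"ok": True}
```

The suggested_fix field is what lets the agent retry without a human reading the traceback.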

Context-window discipline. The --compact flag strips verbose output. Deterministic rendering (zeroed timestamps, content-hash GUIDs, sorted ZIP members) means identical inputs produce identical bytes, so the agent can diff without noise. Extraction artifacts are split across files rather than bundled into one blob.

Mechanical enforcement. Taste is encoded as data in design-profile.json (font ranges, color palettes, contrast thresholds) and enforced by lint and qa commands. The agent doesn't need to remember style rules; the quality gates catch violations and report them as structured JSON with fix suggestions.
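In code, such a gate is a plain comparison against the profile. A sketch, with an invented profile shape and finding format:

```python
def check_font_sizes(shapes, profile):
    """Flag text runs whose size falls outside the profile's bounds.

    shapes: [(slide_index, shape_name, size_pt), ...]
    profile: {"body_pt": [min_pt, max_pt]}  (invented shape)
    """
    lo, hi = profile["body_pt"]
    return [
        {"rule": "font_out_of_bounds", "slide_index": s, "shape": name,
         "size_pt": pt, "suggested_fix": f"set size between {lo} and {hi} pt"}
        for s, name, pt in shapes if not lo <= pt <= hi
    ]
```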

Refinement and iteration

After the initial build, you can refine the deck in several ways:

Natural language edits. Tell the agent what to change ("move the chart to the left", "change slide 3 to a split-panel layout"). The /slides-edit skill reads the current deck, generates the right operations, and applies them.

Ops patches. Write a JSON file with operations and apply it directly:

slides apply output.pptx --ops-json @fixes.json --output output.pptx

Text find-and-replace. For quick text changes:

slides edit output.pptx --query "old text" \
  --replacement "new text" --output output.pptx

Re-running quality checks. After any edit, re-run audit and critique to catch regressions:

slides qa output.pptx --profile design-profile.json \
  --out qa.json --compact

The cycle is: edit, check, fix, check again. The state machine in /slides-full automates this loop, but you can run each step manually when you want finer control.