# How it works

Three layers, file-based contracts, a state machine with quality gates, and iterative refinement.

## Three layers
agent-slides has three layers. Each solves a different problem.
Skills are Markdown files loaded into the agent's context. They encode workflow knowledge: what order to do things, what to check, what mistakes to avoid. Seven skills cover the full lifecycle from template extraction to final polish.
The CLI is the execution layer. It accepts JSON payloads, validates them against typed schemas, runs operations transactionally, and returns structured responses. Skills call the CLI; they never touch python-pptx directly.
The Python library wraps python-pptx with a typed API. The CLI calls the library. You can also use the library directly if you're building your own tooling.
## Extraction: reading the template
Every deck starts from a PowerPoint template. The extraction step reads the template and produces machine-readable contracts that downstream skills use when building slides.
```shell
slides extract template.pptx --output-dir project/ \
  --base-template-out project/base_template.pptx --compact
```
This produces several artifacts:
| File | What it contains |
|---|---|
| resolved_manifest.json | The primary contract. Merges layout families, archetypes, color zones, placeholder geometry, and theme palette into one file. This is what the build skill reads. |
| base_template.pptx | A clean copy of the template with all content slides removed. Used as the starting point for rendering. |
| template_layout.json | Physical layout families with placeholder positions and sizes. |
| content_layout.json | Which archetypes are compatible with which layouts. |
| archetypes.json | Available archetypes (title_slide, kpi_trio, chart, timeline, etc.) with their constraints. |
| icons/ | Vector icons extracted from template slides, usable via add_icon. |
After extraction, the agent (or the user) writes a design-profile.json that pins font sizes, allowed colors, contrast rules, and the paths to the template and catalog files. This profile flows through every downstream step.
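The profile schema is defined by the tool; purely as an illustration of the kind of thing it pins down (these field names are assumptions, not the actual schema), a minimal profile might look like:

```json
{
  "template": "project/base_template.pptx",
  "manifest": "project/resolved_manifest.json",
  "fonts": {"body_min_pt": 14, "title_max_pt": 40},
  "colors": {"allowed": ["#1A1A2E", "#0F7B6C", "#FFFFFF"]},
  "contrast": {"min_ratio": 4.5}
}
```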
## Planning: brief to slide structure
The agent reads the user's brief and the resolved manifest, then generates a slides.json file with two sections:
The plan (DeckPlan) describes the narrative arc: deck title, audience, objective, and a list of slides. Each slide has a story_role (opening, data, recommendation, etc.), an archetype_id (how it should look), and an action_title (a complete sentence stating the slide's point).
The operations (OperationBatch) are the rendering instructions: add slides, place text, insert charts, set backgrounds. Every operation uses precise geometry (inches) and references layouts from the manifest. The agent doesn't guess positions; it reads them from the extraction contracts.
```jsonc
// slides.json (simplified)
{
  "plan": {
    "deck_title": "Q3 Growth Strategy",
    "brief": "10-slide strategy deck...",
    "slides": [
      {
        "slide_number": 1,
        "story_role": "opening",
        "archetype_id": "title_slide",
        "action_title": "Q3 growth depends on three bets"
      }
    ]
  },
  "ops": {
    "operations": [
      {"op": "add_slide", "layout_name": "Title Slide"},
      {"op": "set_semantic_text", "slide_index": 0,
       "role": "title", "text": "Q3 growth depends on three bets"}
    ]
  }
}
```
## Rendering: ops to .pptx
The CLI takes the slides.json and produces a PowerPoint file:
```shell
slides render --slides-json @slides.json \
  --profile design-profile.json \
  --output output.pptx --compact
```
Operations execute sequentially. By default, execution is transactional: if operation #7 fails, operations #1-6 are rolled back and the deck is left unchanged. The agent can also do a dry run first to catch errors without writing to disk.
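Transactional execution can be pictured as execute-on-a-copy, commit-on-success. A minimal Python sketch of the pattern (not the library's actual implementation; apply_batch and the toy deck dict are hypothetical):

```python
import copy

def apply_batch(deck, operations):
    """Apply all operations or none: work on a copy, swap in only on success."""
    working = copy.deepcopy(deck)          # scratch copy; the original is untouched
    for i, op in enumerate(operations, 1):
        try:
            op(working)                    # each op mutates the scratch copy
        except Exception as err:
            # operation i failed: discard the copy; the original deck is unchanged
            raise RuntimeError(f"operation #{i} failed: {err}; deck unchanged") from err
    return working                         # commit: caller replaces deck with this

deck = {"slides": []}
ops = [
    lambda d: d["slides"].append({"title": "Q3 growth"}),
    lambda d: d["slides"].append({"title": "Three bets"}),
]
deck = apply_batch(deck, ops)
```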
The output is deterministic. Same inputs produce the same bytes every time (timestamps zeroed, GUIDs derived from content hashes, ZIP members sorted).
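Deterministic archive writing is a standard trick, sketched here with the Python standard library (this is the general idea, not the tool's actual code): sort the members and pin every timestamp to the ZIP epoch, so identical inputs yield identical bytes.

```python
import io
import zipfile

def write_deterministic_zip(members: dict) -> bytes:
    """Same members -> same bytes: stable member order, fixed timestamps."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(members):  # stable member order regardless of insertion order
            # ZipInfo with a pinned date_time replaces the "now" timestamp
            info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, members[name])
    return buf.getvalue()

a = write_deterministic_zip({"b.xml": b"<b/>", "a.xml": b"<a/>"})
b = write_deterministic_zip({"a.xml": b"<a/>", "b.xml": b"<b/>"})
```

With this shape, `a == b` holds even though the members were supplied in different orders, which is what makes byte-level diffing useful.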
## Quality gates
After rendering, two quality gates check the output:
The content gate (critique) checks storytelling. Are the action titles assertive sentences? Is the structure logically organized? Does the visual archetype match the content type? It runs via plan-inspect.
The visual gate (audit + lint) checks technical quality. Font sizes within the profile's bounds, no overlapping shapes, sufficient color contrast, all content within slide margins. It runs via lint and qa.
```shell
# content check
slides plan-inspect --slides-json @slides.json \
  --out plan_content.json --compact

# visual check
slides lint output.pptx --profile design-profile.json \
  --out lint.json --compact
slides qa output.pptx --profile design-profile.json \
  --out qa.json --compact
```
Both gates produce structured JSON reports. The agent reads these reports, generates fix operations, and applies them.
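The exact report schema isn't shown here, but the shape is machine-readable findings with enough context to generate a fix. A purely hypothetical lint finding (field names invented for illustration) might look like:

```json
{
  "findings": [
    {
      "rule": "min_font_size",
      "slide_index": 3,
      "shape": "body",
      "observed_pt": 9,
      "allowed_min_pt": 14,
      "suggested_fix": "Raise the body text to at least 14pt or shorten the copy."
    }
  ]
}
```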
## The state machine
The /slides-full orchestrator doesn't run a linear pipeline. It runs a state machine with retry loops:
If either gate fails, the orchestrator routes to a fix step, applies small reversible patches, then rechecks. It retries up to 3 times for content issues and 2 times per slide for visual issues. If a fix doesn't improve things, it stops and reports what's left.
Every fix is an ops patch applied via slides apply. Patches are small (one concern at a time), reversible (the deck can be re-rendered from the original plan), and inspectable (the JSON file is on disk).
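The check-fix-recheck cycle is simple enough to sketch. A toy Python version of the loop (the real orchestrator also stops early when a fix makes no progress, which this sketch omits):

```python
def run_gate(check, fix, max_retries):
    """Run a quality gate; on failure, apply a fix and recheck, up to a retry cap."""
    issues = check()
    attempts = 0
    while issues and attempts < max_retries:
        fix(issues)             # apply a small, reversible patch
        issues = check()        # recheck after the patch
        attempts += 1
    return issues               # whatever is left is reported, not hidden

# toy gate: the "deck" has issues until two fixes land
state = {"issues": 2}
leftover = run_gate(
    check=lambda: ["overlap"] * state["issues"],
    fix=lambda _: state.update(issues=state["issues"] - 1),
    max_retries=3,
)
```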
## File-based contracts
Skills don't share memory. Each invocation is a fresh CLI call. They communicate through JSON files on disk:
| Phase | Reads | Writes |
|---|---|---|
| Extract | template.pptx | resolved_manifest.json, base_template.pptx, icons/ |
| Build | resolved_manifest.json, design-profile.json | slides.json, output.pptx |
| Audit | output.pptx, design-profile.json | lint.json, qa.json, audit-fixes.json |
| Critique | output.pptx, slides.json | critique-fixes.json |
| Polish | output.pptx, design-profile.json | polish-fixes.json, final output.pptx |
This means every intermediate artifact is visible on disk. When something goes wrong, you can read the files to figure out where. If a session is interrupted, the next session picks up from whatever files already exist.
## Designed for agents
Every layer of agent-slides is shaped by the assumption that an AI agent is the primary caller. This influences how information flows through the system.
**Progressive disclosure.** The agent doesn't receive everything at once. A skill file is the map: 100-200 lines of workflow knowledge that point to deeper sources. Reference files load conditionally based on what the plan contains. slides docs provides runtime schema introspection. Extraction contracts hold the raw geometry and constraints. Each layer is accessible, but none enters context until the agent pulls it. See the design note on progressive disclosure for why this matters.
**JSON payloads, not flags.** The CLI accepts full operation batches as JSON. The agent doesn't translate intent into flags; it writes a structured payload that maps directly to the internal schema. slides render --slides-json @slides.json is the entire interface. No flag translation, no argument parsing ambiguity.
**Input hardening.** The agent is not a trusted operator. Every JSON payload is validated against Pydantic schemas before execution. Unknown fields are rejected. Out-of-bounds coordinates are caught. SLIDES_ENFORCE_CWD sandboxes output paths to the working directory. Structured error responses include a suggested_fix field so the agent can self-correct without human intervention.
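Reduced to plain Python, the hardening idea looks like the sketch below (the real CLI uses Pydantic models; the field names and the slide-width bound here are assumptions for illustration only):

```python
def validate_op(payload: dict) -> dict:
    """Reject unknown fields and out-of-bounds geometry before executing anything."""
    allowed = {"op", "slide_index", "x_in", "y_in"}   # hypothetical field set
    unknown = set(payload) - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}; "
                         f"suggested_fix: remove them or check the op schema")
    x = payload.get("x_in", 0)
    if not 0 <= x <= 13.333:                          # 16:9 slide width in inches
        raise ValueError(f"x_in={x} is off-slide; suggested_fix: use 0-13.333")
    return payload

ok = validate_op({"op": "add_slide", "x_in": 1.0})
```

Pydantic gives the same behavior declaratively (for example, rejecting unknown fields via a strict model config) plus typed coercion and structured error output.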
**Context-window discipline.** The --compact flag strips verbose output. Deterministic rendering (zeroed timestamps, content-hash GUIDs, sorted ZIP members) means identical inputs produce identical bytes, so the agent can diff without noise. Extraction artifacts are split across files rather than bundled into one blob.
**Mechanical enforcement.** Taste is encoded as data in design-profile.json (font ranges, color palettes, contrast thresholds) and enforced by lint and qa commands. The agent doesn't need to remember style rules; the quality gates catch violations and report them as structured JSON with fix suggestions.
## Refinement and iteration
After the initial build, you can refine the deck in several ways:
**Natural language edits.** Tell the agent what to change ("move the chart to the left", "change slide 3 to a split-panel layout"). The /slides-edit skill reads the current deck, generates the right operations, and applies them.
**Ops patches.** Write a JSON file with operations and apply it directly:
```shell
slides apply output.pptx --ops-json @fixes.json --output output.pptx
```
**Text find-and-replace.** For quick text changes:
```shell
slides edit output.pptx --query "old text" \
  --replacement "new text" --output output.pptx
```
**Re-running quality checks.** After any edit, re-run audit and critique to catch regressions:
```shell
slides qa output.pptx --profile design-profile.json \
  --out qa.json --compact
```
The cycle is: edit, check, fix, check again. The state machine in /slides-full automates this loop, but you can run each step manually when you want finer control.