Using three AI models in one product (and why)
When we started building FastRead.io — a platform that generates full books from a topic — we had a naive assumption: pick the best AI model and use it for everything. That lasted about a week. The reality is that no single model is best at everything, and if you want a product that genuinely works well, you need to play to each model's strengths.
Why one model wasn't enough
FastRead generates books with three types of content: long-form text (chapters, narratives, explanations), illustrations (cover art, chapter images, diagrams), and audio narration (full audiobook-style TTS).
We tested every model on every task. Claude wrote the best long-form content by a wide margin — coherent narrative across 10+ chapters that actually reads like a book, not a series of disconnected blog posts. But Claude doesn't generate images. Gemini produced the best images for our use case and handled visual prompts well. OpenAI's text-to-speech was the most natural-sounding for long narration.
So we ended up with three models, each doing what it does best.
The orchestration layer
The tricky part isn't calling three APIs. It's making the outputs feel like one coherent product. A book generated by three different systems shouldn't feel like three different systems made it.
We built an orchestration layer in Python that manages the entire generation pipeline:
1. Topic analysis and outline generation (Claude): creates the book structure, chapter titles, and a narrative arc
2. Chapter generation (Claude): writes each chapter with context from previous chapters to maintain continuity
3. Image generation (Gemini): creates illustrations based on chapter content, with style consistency prompts
4. Audio generation (OpenAI TTS): narrates each chapter with consistent voice and pacing
5. Assembly: combines everything into the final book format
The critical piece is context passing. When Claude writes chapter 5, it has a summary of chapters 1-4. When Gemini generates an image for chapter 5, it gets the chapter content plus a style guide derived from the cover art. Everything is connected.
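To make the context-passing idea concrete, here is a minimal sketch of how summaries and a style guide might be threaded through the stages. It is not our production code: `BookState`, `build_chapter_prompt`, `build_image_prompt`, and the `write_chapter` / `summarize` callables are hypothetical names standing in for the real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BookState:
    """Accumulated context shared across pipeline stages (hypothetical shape)."""
    outline: List[str]
    style_guide: str = ""
    summaries: List[str] = field(default_factory=list)
    chapters: List[str] = field(default_factory=list)
    image_prompts: List[str] = field(default_factory=list)


def build_chapter_prompt(state: BookState, index: int) -> str:
    """Prompt for chapter `index` (0-based), carrying summaries of earlier chapters."""
    previous = "\n".join(
        f"Chapter {i + 1}: {s}" for i, s in enumerate(state.summaries[:index])
    )
    return (
        f"Previously:\n{previous or '(nothing yet)'}\n\n"
        f"Write chapter {index + 1}: {state.outline[index]}"
    )


def build_image_prompt(state: BookState, index: int) -> str:
    """Image prompt = shared style guide + the chapter text it illustrates."""
    return f"Style guide: {state.style_guide}\n\nIllustrate:\n{state.chapters[index]}"


def generate_book(
    state: BookState,
    write_chapter: Callable[[str], str],  # stand-in for the text model call
    summarize: Callable[[str], str],      # stand-in for a summarization call
) -> BookState:
    for i in range(len(state.outline)):
        text = write_chapter(build_chapter_prompt(state, i))
        state.chapters.append(text)
        # Store a summary, not the full chapter, to keep later prompts short.
        state.summaries.append(summarize(text))
        state.image_prompts.append(build_image_prompt(state, i))
    return state
```

The point of the sketch is that every stage reads from one shared state object, so no stage needs to know which model produced the context it consumes.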
Failure handling is the real product
In demos, multi-model orchestration looks elegant. In production, things break constantly. API rate limits hit at 2 AM. A model returns something unusable. Generation takes 3x longer than expected for a particularly complex chapter.
Our failure handling strategy:
- Retry with exponential backoff for transient failures
- Fallback prompts for each model: if the primary prompt produces garbage, we have a simpler version that's more reliable
- Checkpoint system: if generation fails at chapter 7, we don't restart from scratch. We pick up from the last good checkpoint
- Quality gates: automated checks between stages that catch issues before they cascade. If a chapter is too short, too repetitive, or doesn't match the outline, it gets regenerated before moving to images
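As a rough illustration of how retries, fallback prompts, and a quality gate can fit together, here is a minimal sketch. Everything in it is an assumption made for the example: `TransientError` is a hypothetical marker for retryable failures, the thresholds are arbitrary, and the gate only checks length and repetition (matching against the outline is left out).

```python
import random
import time
from typing import Callable, Sequence


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limits, timeouts)."""


def call_with_backoff(fn, *, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay * 0.1))


def passes_quality_gate(text: str, min_words: int = 300) -> bool:
    """Crude gate: reject chapters that are too short or too repetitive."""
    words = text.split()
    if len(words) < min_words:
        return False
    unique_ratio = len({w.lower() for w in words}) / len(words)
    return unique_ratio >= 0.3  # arbitrary threshold, chosen for illustration


def generate_with_fallback(
    call_model: Callable[[str], str],
    prompts: Sequence[str],  # primary prompt first, simpler fallbacks after
    gate: Callable[[str], bool] = passes_quality_gate,
    **retry_kwargs,
) -> str:
    for prompt in prompts:
        text = call_with_backoff(lambda: call_model(prompt), **retry_kwargs)
        if gate(text):
            return text
    raise RuntimeError("all prompts failed the quality gate")
```

Passing `sleep` as a parameter keeps the retry logic testable without actually waiting.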
We spent more engineering time on failure handling than on the happy path. That's normal for production AI systems, but nobody talks about it.
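The checkpoint idea can be sketched in a few lines. This version assumes a single JSON file per book and a hypothetical `write_chapter` callable; a real system would more likely use a database or object storage, but the resume logic is the same.

```python
import json
from pathlib import Path
from typing import Callable, List


def load_checkpoint(path: Path) -> dict:
    """Return saved progress, or an empty state if nothing was saved yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"chapters": []}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file first, then rename, so a crash never leaves a half-written file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    tmp.replace(path)


def generate_chapters(
    outline: List[str],
    write_chapter: Callable[[str], str],  # stand-in for the model call
    path: Path,
) -> List[str]:
    state = load_checkpoint(path)
    # Resume at the first chapter that has not been saved yet.
    for index in range(len(state["chapters"]), len(outline)):
        state["chapters"].append(write_chapter(outline[index]))
        save_checkpoint(path, state)
    return state["chapters"]
```

If `write_chapter` raises on chapter 7, the file still holds chapters 1 to 6, and the next call to `generate_chapters` starts from chapter 7.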
Cost management
Running three AI models per book generation isn't cheap. We had to be smart about it.
The biggest cost saver was caching and reuse. If two users request books on similar topics, we don't regenerate everything. We cache outlines, reuse image styles, and share common elements. We also optimized prompt lengths — Claude doesn't need the entire previous chapter to maintain context, just a good summary.
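A very simplified version of outline caching might look like the sketch below. It only treats topics as "the same" when they match after lowercasing and stripping punctuation, which is a much cruder notion of similarity than a production system would want; `topic_key` and `OutlineCache` are hypothetical names used for illustration.

```python
import hashlib
import re
from typing import Callable, Dict, List


def topic_key(topic: str) -> str:
    """Normalize case, punctuation, and whitespace, then hash to a cache key."""
    normalized = " ".join(re.findall(r"[a-z0-9]+", topic.lower()))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class OutlineCache:
    """In-memory cache; a real deployment would likely use a shared store."""

    def __init__(self, generate_outline: Callable[[str], List[str]]):
        self._generate = generate_outline
        self._store: Dict[str, List[str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, topic: str) -> List[str]:
        key = topic_key(topic)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._generate(topic)  # only pay for the model call on a miss
        return self._store[key]
```

Tracking hits and misses alongside the cache makes it easy to see how much spend the cache is actually avoiding.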
We built a cost-tracking system that calculates the API spend per book in real time. This lets us set accurate pricing and flag any generation that's burning through tokens abnormally.
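A per-book cost tracker can be as simple as the sketch below. The prices are placeholders invented for the example, not real provider pricing, and `CostTracker` is a hypothetical name; the idea is just to record usage as it happens and compare the running total against a budget.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Placeholder prices in USD per unit. These are NOT real provider prices.
PRICE_PER_UNIT = {
    "text_tokens": 0.00001,
    "images": 0.04,
    "tts_characters": 0.000015,
}


@dataclass
class CostTracker:
    """Running API spend for a single book generation."""
    budget_usd: float
    usage: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, kind: str, units: float) -> None:
        if kind not in PRICE_PER_UNIT:
            raise ValueError(f"unknown usage kind: {kind}")
        self.usage[kind] += units

    @property
    def total_usd(self) -> float:
        return sum(PRICE_PER_UNIT[k] * u for k, u in self.usage.items())

    @property
    def over_budget(self) -> bool:
        """True once this generation has spent more than expected."""
        return self.total_usd > self.budget_usd
```

Calling `record` after every API response and checking `over_budget` between stages is one way to flag a runaway generation before it finishes.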
Lessons learned
After months of running this in production:
1. Model-specific prompts matter more than you think: the same instruction phrased differently can double the output quality
2. Latency adds up: three models in sequence means users wait. We moved to parallel generation where possible and added progress indicators
3. Version pinning is essential: when a model provider updates their model, your carefully tuned prompts might break. Pin versions and test before upgrading
4. The orchestration layer is your competitive advantage: the models are commodities anyone can access. How you combine them is what makes the product unique
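On the latency point, one place where parallelism is straightforward is after a chapter's text is final: its illustration and its narration depend on the text but not on each other. A minimal sketch with `asyncio.gather`, where `make_image` and `make_audio` are hypothetical async wrappers around the image and TTS calls:

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def generate_assets(
    chapter_text: str,
    make_image: Callable[[str], Awaitable[str]],  # stand-in for the image call
    make_audio: Callable[[str], Awaitable[str]],  # stand-in for the TTS call
) -> Dict[str, str]:
    """Run image and audio generation for one chapter concurrently."""
    image, audio = await asyncio.gather(
        make_image(chapter_text),
        make_audio(chapter_text),
    )
    return {"image": image, "audio": audio}
```

With this shape, the wait per chapter is roughly the slower of the two calls rather than their sum.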
Multi-model architecture isn't for every project. If one model does 90% of what you need, just use that one. But when your product genuinely needs different capabilities, don't fight it. Build the orchestration layer right and let each model do what it's best at.