Regular trivia nights with my wife and friends sparked the idea for Trivista. We loved the social connection and friendly competition, but coordinating schedules was always a challenge. I wanted to build an app that could recreate that experience whether playing together in real-time or asynchronously on our own schedules, on any topic we could imagine.

I also saw an opportunity to test a hypothesis: could AI coding tools enable a solo developer to build what would traditionally require a team?

The answer is nuanced. Here’s what actually happened.

What I Built

Trivista is a multiplayer trivia app built around two core content systems, plus a handful of supporting features:

  • AI-generated quizzes: Real-time generation using Perplexity Sonar with context-aware images from fal.ai Flux
  • Curated content: High-quality questions from OpenTriviaDB with scheduled image generation
  • Flexible multiplayer: Real-time competitions or asynchronous play
  • Party mode: Local WiFi auto-discovery for friction-free sessions
  • Premium subscriptions: RevenueCat-powered, unlimited AI quizzes

The tech stack: Flutter for mobile, Next.js for web, Firebase for backend, Perplexity + fal.ai for AI features.

Full disclosure: I built this for the RevenueCat hackathon, which meant aggressive timelines and deliberate scope constraints.

The Development Workflow

I used what I call spec-driven vibe coding with Claude Code. The process:

  1. Write detailed specs: I specified exact architecture, data models, API contracts
  2. Use plan mode religiously: Claude Code proposes implementation, I review and refine
  3. Execute and iterate: Claude Code handles implementation, I test and refine specs

This wasn’t “tell AI what to build and it magically works.” It was more like having a senior developer who executes exactly what you spec, but needs explicit architectural guidance.

What Claude Code handled well:

  • Implementing Flutter widgets following architectural patterns I specified
  • Writing Firebase Cloud Functions with error handling
  • Generating boilerplate (state management, API clients, UI components)
  • Following established code patterns across the codebase

Where it struggled:

  • Making architecture decisions (data model design, API structure)
  • Catching edge cases in multiplayer state synchronization
  • Optimizing for cost without explicit guidance
  • UX polish and iterative refinement

To speed iteration, I built flutter-auto-reload, a set of CLI tools that enable hot reload across an iOS phone, an iOS tablet, an Android phone, and an Android tablet simultaneously. Every change Claude Code made reloaded on all four screens instantly.

This workflow let me focus on product vision and architecture while Claude Code handled execution. But “handled execution” meant implementing my detailed specs, not making strategic decisions.

The spec quality matters

The more detailed your architectural specs, the better AI coding tools perform. Vague prompts like “build a multiplayer system” produce generic solutions. Specific guidance like “use Firestore subcollections with optimistic updates and server-side validation” gets you production-ready code.
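For a concrete sense of the gap, here's the kind of detail a useful spec pins down, written as TypeScript types rather than prose. This is only an illustrative sketch; the names and fields are hypothetical, not Trivista's actual schema.

```typescript
// Hypothetical spec excerpt: a game document plus an answers subcollection.
// Names and fields are illustrative, not Trivista's production schema.
import { Timestamp } from "firebase-admin/firestore";

// games/{gameId}
interface GameDoc {
  status: "lobby" | "in_progress" | "finished";
  hostId: string;
  currentQuestionIndex: number;
  playerNames: Record<string, string>; // denormalized: uid -> display name
  updatedAt: Timestamp;                // written server-side only
}

// games/{gameId}/answers/{uid}_{questionIndex}
interface AnswerDoc {
  uid: string;
  questionIndex: number;
  choice: number;
  submittedAt: Timestamp; // server timestamp, validated server-side
}
```

Handing Claude Code this level of structure, along with which writes go through Cloud Functions, is what turned generic output into code I could ship.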

Key Technical Challenges

Real-Time AI Generation Under 30 Seconds

Users expect instant results. The system needed to:

  1. Validate topic for safety
  2. Generate 10 questions with Perplexity Sonar (querying real-time web data)
  3. Generate 10 context-aware images with fal.ai Flux
  4. Return complete quiz in under 30 seconds

The solution required aggressive optimization (a code sketch follows the list):

  • Pre-validation before expensive operations: Check topic safety before calling Perplexity to fail fast
  • Parallel image generation: Process 10 images simultaneously instead of sequentially
  • Fast Flux model (Schnell): 4 inference steps vs 28 for the Dev model (7x faster)
  • Increased Cloud Function resources: 1GiB memory, 5-minute timeout, 10 max instances
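Put together, the generation path looks roughly like the sketch below. The helper functions are placeholders for the real safety check, the Perplexity Sonar call, and the fal.ai Flux call; the function options mirror the resource settings listed above.

```typescript
import { onCall, HttpsError } from "firebase-functions/v2/https";

// Placeholders for the real safety check, Perplexity Sonar call,
// and fal.ai Flux Schnell call.
declare function validateTopic(topic: string): Promise<boolean>;
declare function generateQuestions(
  topic: string
): Promise<{ text: string; imagePrompt: string }[]>;
declare function generateImage(prompt: string): Promise<string>; // image URL

export const generateQuiz = onCall(
  { memory: "1GiB", timeoutSeconds: 300, maxInstances: 10 },
  async (request) => {
    const topic = String(request.data.topic ?? "");

    // Fail fast: reject unsafe topics before paying for any model calls.
    if (!(await validateTopic(topic))) {
      throw new HttpsError("invalid-argument", "Topic rejected by safety check");
    }

    // One Perplexity Sonar call produces all 10 questions plus image prompts.
    const questions = await generateQuestions(topic);

    // Generate all 10 images in parallel instead of one after another.
    const imageUrls = await Promise.all(
      questions.map((q) => generateImage(q.imagePrompt))
    );

    return questions.map((q, i) => ({ ...q, imageUrl: imageUrls[i] }));
  }
);
```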

Results: ~7 seconds on average for a 10-question quiz with images. Breakdown: roughly 6s for questions, 1s for images.

The trade-off no one mentions: Flux Schnell produces lower quality images than Flux Dev. I sacrificed visual fidelity for speed. With a design team, we would have A/B tested to find the quality threshold users actually care about. Solo, I made an educated guess.

Security Without Friction

Real-time multiplayer with AI features needed robust security from day one. I implemented:

  • Firebase App Check: First line of defense against backend abuse
  • Dual-tracking rate limiting: User ID + hashed IP to prevent abuse while handling shared networks
  • Never trust the client: All mutations through Cloud Functions except quiz answers (which use real-time listeners for speed)
  • Content safety: Multi-layer validation (username profanity filter, topic NSFW detection, AI safety prompts, image NSFW filter, community reporting)

The hidden cost: Security added ~40% more implementation time. Rate limiting alone required custom Firestore-based tracking with atomic transactions to prevent race conditions.
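Stripped down, the pattern looks like the sketch below, assuming a `rateLimits` collection keyed by user ID or hashed IP; the window and quota values are illustrative.

```typescript
import { initializeApp } from "firebase-admin/app";
import { getFirestore, Timestamp } from "firebase-admin/firestore";
import { HttpsError } from "firebase-functions/v2/https";

initializeApp();
const db = getFirestore();

const WINDOW_MS = 60 * 60 * 1000; // 1-hour window (illustrative)
const MAX_REQUESTS = 10;          // illustrative quota

// Atomic check-and-increment: the transaction stops two concurrent requests
// from both sneaking in under the limit. `key` is a user ID or a hashed IP.
export async function enforceRateLimit(key: string): Promise<void> {
  const ref = db.collection("rateLimits").doc(key);

  await db.runTransaction(async (tx) => {
    const snap = await tx.get(ref);
    const now = Date.now();
    const prev = snap.data();
    const inWindow = prev !== undefined && now - prev.windowStart < WINDOW_MS;

    const windowStart = inWindow ? prev.windowStart : now;
    const count = inWindow ? prev.count + 1 : 1;

    if (count > MAX_REQUESTS) {
      throw new HttpsError("resource-exhausted", "Rate limit exceeded");
    }
    tx.set(ref, { windowStart, count, updatedAt: Timestamp.now() });
  });
}
```

App Check, by contrast, costs almost nothing to enforce once the client is set up: the v2 callable API takes it as a single option (`enforceAppCheck: true`), and the rate limiter runs behind it.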

The solo indie compromise: I didn't implement automated NSFW monitoring for generated images because the AI safety prompts plus fal.ai's safety checker at maximum strictness seemed sufficient. A team would have added Cloudflare's NSFW API or Google's Vision API for post-generation validation. I took a calculated risk to ship faster.

Deep dive available in my previous post: Flutter Real-Time Multiplayer with Firebase.

Managing Cost and Scale

The curated content pipeline was designed from the start with asynchronous image generation: Firestore triggers and Cloud Scheduler spread image generation over time, allowing higher-quality Flux models without blocking quiz creation.

AI tools don't estimate batch costs

When pre-generating images for thousands of curated questions, Claude Code didn’t flag the upfront cost spike. It built a correct implementation but didn’t estimate operational costs for large batches. Always manually review cost implications before running expensive operations at scale.

The architecture: Curated questions stored in Firestore trigger background image generation jobs. This async approach meant curated quizzes could use better models than real-time AI quizzes, where speed requirements forced trade-offs.
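In Firebase terms, the skeleton is roughly the sketch below. The collection names, batch size, and schedule are illustrative, and the image call is a placeholder for the actual fal.ai integration.

```typescript
import { initializeApp } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";
import { onDocumentCreated } from "firebase-functions/v2/firestore";
import { onSchedule } from "firebase-functions/v2/scheduler";

initializeApp();
const db = getFirestore();

// Placeholder for the real fal.ai call using a higher-quality Flux model.
declare function generateCuratedImage(prompt: string): Promise<string>;

// New curated questions arrive without an image; just mark them as pending.
export const queueCuratedImage = onDocumentCreated(
  "curatedQuestions/{questionId}",
  async (event) => {
    await event.data?.ref.update({ imageStatus: "pending" });
  }
);

// A scheduled job drains the backlog in small batches, so the slower,
// pricier model never blocks quiz creation or user-facing flows.
export const processPendingImages = onSchedule("every 15 minutes", async () => {
  const pending = await db
    .collection("curatedQuestions")
    .where("imageStatus", "==", "pending")
    .limit(20)
    .get();

  for (const doc of pending.docs) {
    const imageUrl = await generateCuratedImage(doc.data().imagePrompt);
    await doc.ref.update({ imageStatus: "done", imageUrl });
  }
});
```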

Why this mattered: Separating real-time (AI quizzes) from async (curated content) let each optimize for different constraints: speed vs. quality.

What AI Coding Doesn’t Solve

Architecture Still Needs Human Judgment

Claude Code implemented exactly what I specified, but it didn’t make strategic architecture decisions:

  • Data model design: I had to decide on denormalized documents vs subcollections
  • API contracts: I specified which operations should be Cloud Functions vs client-side writes
  • Security rules: I wrote the high-level security model, Claude Code implemented the syntax

If I’d asked Claude Code to “design a multiplayer architecture,” it would have produced something generic. The specialized patterns (dual-layer rate limiting, privacy-preserving blocks, aggressive denormalization) came from my experience.

Edge Cases Required Manual Discovery

Multiplayer state synchronization had race conditions that Claude Code didn’t catch:

  • Players joining mid-game could see inconsistent state
  • Network interruptions could leave games in zombie states
  • Rapid answer submissions could violate answer-locking logic

I caught these through manual testing across multiple devices. Claude Code fixed them once I described the issue, but it didn’t proactively identify them.

Cost Optimization Wasn’t Automatic

Initial costs spiked when pre-generating images for thousands of curated questions in batch. While the async pipeline was intentionally designed this way, Claude Code didn’t flag or estimate the upfront cost impact of processing large batches.

This kind of operational awareness—understanding cost implications before hitting “run”—still requires human oversight. AI tools optimize for feature correctness, not resource planning.

The Last 90% Problem

The classic 90/90 rule proved brutally accurate: the first 90% of the code took 90% of the time, and the last 10% took the other 90%. Claude Code handled the initial implementation quickly, but the hard edge cases required extensive manual work.

Context-aware content filtering: Building NSFW filters that understand context, not just keywords. Examples:

  • Blocking “Game of Thrones violence” while allowing “World War 2 history”
  • Rejecting “prescription drug abuse” but accepting “medical treatments”
  • Filtering “graphic violence” vs “historical battles”

The two-tier validation system (always-block vs context-sensitive) required iterative refinement based on real user submissions. Claude Code implemented the structure I specified, but defining the nuanced rules came from manual testing and edge case discovery.
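A toy version of that two-tier structure might look like the following; the real term lists and context rules are far longer and are exactly the part that took iteration to get right.

```typescript
// Toy two-tier topic filter. The entries here are only examples; the real
// lists were tuned against actual user submissions.
const ALWAYS_BLOCK = ["explicit term a", "explicit term b"];

// Context-sensitive terms pass only when the topic also matches an
// allowlisted context (history, medicine, and so on).
const CONTEXT_SENSITIVE: { term: string; allowedContexts: RegExp }[] = [
  { term: "violence", allowedContexts: /\b(history|historical|war)\b/i },
  { term: "drug", allowedContexts: /\b(medical|medicine|treatment|treatments)\b/i },
];

export function isTopicAllowed(topic: string): boolean {
  const lower = topic.toLowerCase();

  // Tier 1: hard blocks. No context can rescue these.
  if (ALWAYS_BLOCK.some((t) => lower.includes(t))) return false;

  // Tier 2: flagged terms pass only with an allowlisted context present.
  for (const rule of CONTEXT_SENSITIVE) {
    if (lower.includes(rule.term) && !rule.allowedContexts.test(lower)) {
      return false;
    }
  }
  return true;
}
```

With these toy rules, "Game of Thrones violence" and "graphic violence" are rejected while "World War 2 history" and "medical treatments" pass, which is the behavior the examples above describe.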

Real-time multiplayer concurrency: Race conditions that only appeared under specific network conditions:

  • Two players answering simultaneously causing duplicate score updates
  • Network interruptions leaving games in inconsistent states
  • Players joining mid-game seeing stale state from optimistic updates

These weren’t bugs Claude Code could catch through static analysis. They emerged during multi-device testing and required careful transaction design, retry logic, and state reconciliation patterns.
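For the duplicate-score case, for instance, the fix boils down to an idempotent write inside a transaction. Here's a simplified sketch with the Firebase web SDK, assuming one answer document per player per question; IDs and the score handling are illustrative, and the app is assumed to be initialized elsewhere.

```typescript
import {
  getFirestore, doc, runTransaction, serverTimestamp, increment,
} from "firebase/firestore";

// Idempotent answer submission: the deterministic document ID plus the
// transaction turn a rapid double-submit into a no-op instead of a second
// score update. Assumes the Firebase app was initialized elsewhere.
export async function submitAnswer(
  gameId: string,
  uid: string,
  questionIndex: number,
  choice: number,
  isCorrect: boolean
): Promise<void> {
  const db = getFirestore();
  const answerRef = doc(db, "games", gameId, "answers", `${uid}_${questionIndex}`);
  const playerRef = doc(db, "games", gameId, "players", uid);

  await runTransaction(db, async (tx) => {
    const existing = await tx.get(answerRef);
    if (existing.exists()) return; // answer already locked in; ignore the retry

    tx.set(answerRef, { choice, answeredAt: serverTimestamp() });
    if (isCorrect) {
      // Score handling is simplified here; the real app constrains this path.
      tx.update(playerRef, { score: increment(1) });
    }
  });
}
```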

The pattern: AI tools excel at the initial 90%. The final 10%—nuanced judgment, edge case handling, production hardening—still requires deep domain expertise and extensive testing.

Solo Indie Means Cutting Corners

Things I would have done differently with a team:

  • A/B testing: I guessed at the Flux Schnell vs. Dev quality threshold instead of testing with users
  • Design polish: The UI is functional but not delightful. A designer would have refined the experience
  • QA coverage: I tested on 4 devices but missed iOS-specific edge cases that a dedicated QA engineer would have caught
  • Analytics depth: Basic Firebase Analytics instead of comprehensive funnel analysis

AI tools don’t replace team specialization. They let you execute faster in areas you already understand.

Lessons Learned

About AI coding:

  • Detailed specs are essential: The more specific your architectural guidance, the better Claude Code executes. Vague prompts produce generic solutions.
  • Prior experience is non-negotiable: I've built Flutter apps for years and have established patterns. Claude Code executed against that knowledge base; it didn't teach me fundamentals.
  • Manual oversight still required: Edge cases, UX polish, and cost optimization need human judgment.

About the tech:

  • Perplexity Sonar for accuracy: Online models ensure current, factual quiz content.
  • Flux Schnell for speed: 4 inference steps enabled real-time generation with quality trade-offs.
  • RevenueCat just works: Zero friction subscriptions and entitlements across platforms.

About solo indie:

  • Custom tooling matters: flutter-auto-reload enabled instant testing across 4 devices simultaneously.
  • Ruthless scope cuts: Sacrificed features, polish, and analytics depth to ship fast.

AI tools are force multipliers for experienced developers, not replacements. They handle execution brilliantly when you know what to build and how to architect it.

The Honest Bottom Line

AI coding tools changed what’s possible for solo developers. I built a cross-platform app with real-time multiplayer, AI features, and subscription monetization in a fraction of traditional development time.

But you still need years of domain experience, detailed architectural specs, manual testing, and systems thinking for production concerns. The tools multiply that experience; they don't substitute for it.

The paradigm shift is real: experienced developers can now build solo what used to require a team. But the emphasis is on experienced. You can't skip the years spent learning architecture, security, and UX just because AI writes the code.

Try Trivista

Available at trivista.ai on App Store & Google Play

Read more: