Salesforce quietly pivoted Agentforce. The original pitch: autonomous AI agents handling customer service issues end-to-end. The reality: inconsistent answers, performance degradation beyond 8 instructions, and enterprises losing trust faster than they gained efficiency.
The solution? “Agent Script” - a deterministic, rule-based scripting layer. Step-by-step logic designed, tested, and maintained by… CIOs. The very work AI was supposed to eliminate.
What Happened
Agentforce launched as autonomous AI that would parse customer requests and resolve them without human intervention. In demos, it worked beautifully. In production, enterprises discovered what anyone building with LLMs eventually learns: probabilistic systems don’t behave consistently at scale.
“Agentforce performs best when given no more than 8 instructions per topic.”
— Salesforce Ben, What Salesforce Learnt About AI in 2025
Beyond that threshold, quality degraded. Same request, different answers. Customers got contradictory information depending on when they asked. Enterprises that prided themselves on consistent service quality couldn’t stomach the variance.
The pivot to Agent Script isn’t a failure - it’s a maturation. Salesforce’s Data Cloud + AI ARR grew 120% year-over-year. AI traction is real. But the architecture changed fundamentally: from “AI handles everything” to “AI assists, determinism executes.”
Why It Doesn’t Work
The mistake is trying to make non-deterministic systems do deterministic things.
LLMs are probabilistic by design. Non-determinism isn’t a bug - it’s what enables creativity, flexibility, and handling novel inputs. But that same property makes them unsuitable for tasks requiring consistent, repeatable outcomes.
Prompts are suggestions, not rules. As I covered in Hooks: Guardrails That Actually Work, instructions can be compressed under context pressure, overridden by conflicting signals, or interpreted differently on each run. You can tell an LLM “always respond with X” and it will - until it doesn’t.
Enterprise customer service demands the opposite: the same question should get the same answer, every time. That’s a deterministic problem. Salesforce tried to solve it with a probabilistic tool.
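To make the difference concrete, here’s a minimal sketch of the same invariant expressed both ways. Everything in it is hypothetical: `call_llm` is a stand-in for whatever model client you use, not a real API. But the contrast holds: the prompt version usually works; the code version always works.

```python
# Hypothetical sketch: enforcing an invariant in code instead of in a prompt.
# call_llm is a placeholder for any chat-completion client, not a real API.

REQUIRED_DISCLAIMER = "Rates are subject to change."

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in your model client here

def answer_with_prompt_rule(question: str) -> str:
    # Probabilistic: the model usually obeys this, but nothing guarantees it.
    return call_llm(f"Always end your reply with '{REQUIRED_DISCLAIMER}'.\n\n{question}")

def answer_with_code_rule(question: str) -> str:
    # Deterministic: the invariant is enforced after generation, every time.
    reply = call_llm(question).rstrip()
    if not reply.endswith(REQUIRED_DISCLAIMER):
        reply = f"{reply}\n\n{REQUIRED_DISCLAIMER}"
    return reply
```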
The Pattern
A Hacker News commenter captured the solution:
“Don’t ask it to do the task, ask it to write a script that does the task.”
LLM parses intent, deterministic system executes. The 12 Factor Agents framework makes it explicit: “LLM outputs decisions, not final execution.”
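Here’s what that split looks like in miniature. This is a sketch, not anyone’s real API: all the names are invented, and `call_llm` again stands in for your model client. The model emits a structured decision as JSON, and only whitelisted, deterministic handlers ever execute.

```python
import json

# Sketch of "LLM outputs decisions, not final execution".
# All names here are illustrative, not a real framework's API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder for any chat-completion client

def refund_order(order_id: str) -> str:
    return f"Refund issued for order {order_id}"  # real business logic here

def check_status(order_id: str) -> str:
    return f"Order {order_id} is in transit"

HANDLERS = {"refund_order": refund_order, "check_status": check_status}

def handle(request: str) -> str:
    # The LLM only decides; it never touches customer data directly.
    decision = json.loads(call_llm(
        'Return JSON like {"action": "...", "order_id": "..."} for: ' + request
    ))
    handler = HANDLERS.get(decision.get("action"))
    if handler is None:
        return "Sorry, I can't help with that."  # unknown actions never run
    return handler(decision.get("order_id", ""))
```

The model can still hallucinate an action name; the dispatch table guarantees hallucinations can’t execute.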
The Fit-for-Purpose Framework
This pattern repeats across industries. Here’s what actually works:
Use LLMs for:
- Intent parsing - Natural language to structured data
- Generation - Content, code, drafts that get reviewed
- Classification - Routing, categorization, triage
- One-shot tasks - No state, no multi-step chains
Use deterministic systems for:
- Execution - The actual business logic
- State management - Customer records, transactions
- Multi-step workflows - Where consistency matters
- Audit trails - Anything requiring accountability
User input → LLM parses intent → Deterministic system executes → LLM formats response. AI at the edges, reliability in the middle.
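Wired end to end, the shape is three small functions. Same caveats as the sketches above: `call_llm` is a placeholder and the schema is invented for illustration. The structure is the point: AI at the edges, determinism in the middle.

```python
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder for any chat-completion client

ALLOWED_INTENTS = {"check_status", "refund_order"}

def parse_intent(user_input: str) -> dict:
    # Edge: the LLM turns messy natural language into structured data,
    # which is validated before anything downstream trusts it.
    data = json.loads(call_llm(
        'Extract {"intent": "...", "order_id": "..."} as JSON from: ' + user_input
    ))
    if data.get("intent") not in ALLOWED_INTENTS:
        raise ValueError(f"unsupported intent: {data.get('intent')}")
    return data

def execute(data: dict) -> dict:
    # Middle: deterministic business logic. Same input, same output, auditable.
    if data["intent"] == "check_status":
        return {"order_id": data["order_id"], "status": "in transit"}
    return {"order_id": data["order_id"], "status": "refunded"}

def format_response(result: dict, user_input: str) -> str:
    # Edge: the LLM renders the structured result back into friendly prose.
    return call_llm(f"Politely summarize {json.dumps(result)} for: {user_input}")

def serve(user_input: str) -> str:
    return format_response(execute(parse_intent(user_input)), user_input)
```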
This maps directly to what I covered in When Not to Use AI: “AI for creation, not delivery.” Pre-generate a curated library, let users compose instantly, offer AI enhancements as optional add-ons. The winning products don’t block users on AI generation.
What This Isn’t
This isn’t “AI doesn’t work.” It’s “AI isn’t magic.”
Salesforce’s AI revenue is real. Enterprises are getting value from AI-assisted workflows. The pivot isn’t retreat - it’s recognizing where the technology actually delivers.
The pattern will repeat. Every major AI deployment goes through the same arc:
1. Hype: “AI will handle this autonomously”
2. Reality: Variance, edge cases, trust erosion
3. Maturation: Hybrid architecture with AI at the edges
We’re watching enterprise software work through step 2 in real-time.
The Same Pattern in Code
This applies to AI coding tools too. The hype: AI agents that autonomously build features end-to-end. The reality: developers shipping faster with AI assistance, but still architecting, reviewing, and owning every line.
The productivity gains are real. I ship features in hours that used to take days. But “AI writes the code” masks what actually happens: I describe intent, AI generates candidates, I review and iterate, I verify it works. The LLM handles generation. I handle judgment.
The teams expecting AI to transform their software development process will hit the same wall Salesforce did. The ones treating AI as a power tool for existing workflows - faster iteration, better first drafts, less boilerplate - are already winning.
AI won’t fix your architecture. It won’t resolve your tech debt. It won’t make bad requirements good. It accelerates what you’re already doing. If your process is broken, you’ll ship broken code faster.
The Lesson
The future isn’t autonomous AI agents replacing your workflow. It’s also not traditional automation ignoring AI entirely. It’s the right tool for the right job.
LLMs are incredible at understanding intent and generating responses. They’re unreliable at consistent multi-step execution. Build architectures that use each where they excel.
The best AI products in 2026 will be ones where you can’t tell where AI ends and deterministic systems begin. Seamless handoffs. AI parsing your messy input, reliable systems doing the work, AI formatting the output.
Salesforce learned this by shipping autonomous agents to enterprises that demanded consistency. The pivot cost them credibility with early adopters but bought them a sustainable architecture.
You don’t have to pay that tuition. Start with the hybrid pattern. Let AI assist. Let determinism execute.