The most common mistake when building AI-powered apps is adding AI everywhere.
It feels like the right move — you have a powerful model available, why not use it? But AI adds latency, cost, and unpredictability. Used poorly, it makes your app slower, more expensive, and harder to reason about than plain code.
The better question is not “how can I use AI here?” but “should AI handle this at all?”
The core decision: deterministic vs non-deterministic
Most features in an app are deterministic. Given the same input, they should always produce the same output. Sorting a list, calculating a total, validating an email, routing to the right page — these are deterministic. Write code.
Some features are non-deterministic by nature. They involve language, judgment, interpretation, or generation where there is no single correct answer. Summarizing a document, drafting a reply, extracting intent from a message, generating content — these are where AI earns its cost.
Rule of thumb: if you can write a reliable unit test that covers all cases, write code. If the “correct” output depends on context, tone, or interpretation, AI is appropriate.
A decision framework
Ask these questions about any feature:
1. Is the output always the same for the same input?
- Yes → write code
- No → AI might be appropriate
2. Does the feature require understanding language or intent?
- Yes → strong candidate for AI
- No → write code
3. What happens when it fails?
- Silent wrong answer is catastrophic → keep it in code with validation
- A slightly off output is acceptable → AI is viable
4. How often is this called?
- Thousands of times per second → AI will be expensive and slow; cache or rethink
- Dozens of times per day → AI cost is negligible
5. Does a user expect to see the output immediately?
- Yes → stream the response or show a loading state; AI latency must be handled
- No → batch processing with AI works well here
What belongs in code
- Business logic and calculations
- Routing and navigation
- Authentication and authorization
- Data validation and constraints
- CRUD operations
- Formatting and display logic
- Anything with a compliance or regulatory requirement
Code is predictable, testable, fast, and cheap. Do not replace it with AI unless you have a clear reason.
What belongs in AI
- Generating first drafts (emails, summaries, descriptions)
- Extracting structured data from unstructured text
- Classifying or tagging content
- Answering questions from a knowledge base
- Personalizing language or tone
- Handling natural language input
- Generating code from specs or requirements
- Explaining things in plain language
These tasks benefit from AI because they require judgment and language understanding that rule-based code cannot match without massive engineering effort.
The hybrid pattern
The best AI-native features combine both. Code handles structure and validation; AI handles the language layer.
Example: support ticket routing
Bad version: AI reads the ticket and routes it. Problem: AI might misclassify, and misrouted tickets cost support time.
Better version:
- Code extracts structured fields from the ticket (subject, user plan, error code if present)
- AI classifies the category based on those fields — and returns a confidence score
- Code routes high-confidence classifications automatically
- Low-confidence ones go to a human review queue
AI does the language work. Code enforces the business rules.
Example: content generation
Bad version: AI writes the whole page and publishes it. Better version:
- AI generates a draft
- Code validates it meets minimum length, contains required keywords, has no broken links
- Human reviews and publishes
- Code tracks performance and flags underperformers for revision
Prompt design is product design
When you decide AI should handle something, prompt quality determines product quality. A few principles:
Be specific about format. Vague prompts produce variable outputs. If you need a bullet list, say “return as a bullet list with no more than 5 items.” If you need JSON, specify the exact schema.
Give examples. One or two examples in the prompt dramatically improve consistency. This is called few-shot prompting.
Constrain the scope. “Write a product description” produces anything. “Write a 2-sentence product description for a developer tool audience, focusing on time saved” produces something useful.
Handle failure modes explicitly. Tell the model what to do when input is ambiguous, too short, or off-topic. Do not rely on the model to figure it out.
Test with edge cases. The inputs that break your prompt are the ones users will send. Test with empty inputs, very long inputs, non-English inputs, and adversarial inputs before shipping.
Avoiding over-engineering with AI
The temptation is to build a complex AI pipeline when a simple one works.
If your feature needs three sequential AI calls to produce a result, ask whether one well-designed prompt could do the same. Often it can. Each AI call adds latency and cost — consolidate where possible.
If you are building a multi-agent workflow to process a form, ask whether a single-agent call with a detailed prompt would work. Start simple. Add agents when you have evidence that complexity is needed, not before.
Cost and latency budgets
Design with real numbers:
- Haiku: ~$0.25 per million input tokens, ~50ms response time
- Sonnet: ~$3 per million input tokens, ~100-200ms response time
- Opus: ~$15 per million input tokens, ~300-500ms response time
For a feature called 10,000 times per day with 500 input tokens each, that is 5 million tokens per day. At Sonnet pricing, that is $15/day — manageable. At Opus pricing, that is $75/day — time to use Haiku or cache responses.
Build cost and latency into your design from the start, not after.
Next steps
- How to Add Claude to Your App Using the Anthropic API — implementation guide
- MCP Explained: How Claude Connects to Any Tool or Data Source — extending Claude with tools
- Anthropic prompt engineering guide