# Claude API Cheat Sheet: SDK, CLI, MCP & Prompting

By Vishnu Damwala
## Quick reference tables
### Models

| Model ID | Context | Best for |
|---|---|---|
| `claude-opus-4-6` | 200K tokens | Complex reasoning, research, long documents |
| `claude-sonnet-4-6` | 200K tokens | Balanced speed + quality (default choice) |
| `claude-haiku-4-5-20251001` | 200K tokens | Fast, lightweight, high-volume tasks |
### Anthropic API — core requests

| Task | What to use |
|---|---|
| Chat completion | `POST /v1/messages` |
| Streaming response | `stream: true` in request body |
| Count tokens | `POST /v1/messages/count_tokens` |
| List models | `GET /v1/models` |
| Create a batch | `POST /v1/messages/batches` |
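The endpoints above can also be called without the SDK. A minimal sketch of building a raw `POST /v1/messages` request (the header names and `2023-06-01` version string follow Anthropic's public API docs; the placeholder key is illustrative):

```javascript
// Build a raw Messages API request suitable for fetch(), without the SDK.
function buildMessagesRequest(apiKey, body) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify(body),
  };
}

const req = buildMessagesRequest("sk-ant-...", {
  model: "claude-sonnet-4-6",
  max_tokens: 256,
  messages: [{ role: "user", content: "Hello" }],
});

// Send with: await fetch(req.url, { method: req.method, headers: req.headers, body: req.body })
```

The SDK examples further down wrap exactly this request shape.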
### Messages API — key parameters

| Parameter | Type | What it does |
|---|---|---|
| `model` | string | Which Claude model to use |
| `max_tokens` | int | Maximum tokens in the response |
| `messages` | array | Conversation history `[{role, content}]` |
| `system` | string | System prompt |
| `temperature` | float 0–1 | Randomness (0 = deterministic) |
| `top_p` | float 0–1 | Nucleus sampling |
| `top_k` | int | Token sampling pool size |
| `stop_sequences` | array | Strings that stop generation |
| `tools` | array | Tool/function definitions |
| `tool_choice` | object | Force tool use (`auto`, `any`, `tool`) |
| `stream` | bool | Stream tokens as they generate |
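Most of these parameters can appear together in one request body. A sketch with illustrative values (in practice, tune `temperature` or `top_p`, not both):

```javascript
// One request body combining the key Messages API parameters.
const body = {
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: "You are a terse assistant.",
  messages: [{ role: "user", content: "Name three sorting algorithms." }],
  temperature: 0.2,          // low randomness for factual answers
  stop_sequences: ["\n\n"],  // stop at the first blank line
  stream: false,             // wait for the full response
};
```

Pass this object directly to `client.messages.create(body)`.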
### Claude Code CLI — essential commands

Slash commands are typed inside an interactive session, not passed as shell arguments.

| Command | What it does |
|---|---|
| `claude` | Start interactive REPL |
| `claude "fix this bug"` | Start the REPL with an initial prompt |
| `claude -p "prompt"` | Non-interactive, print output and exit |
| `claude --model claude-opus-4-6` | Use a specific model |
| `claude --no-stream` | Disable streaming |
| `/help` | Show available slash commands |
| `/clear` | Clear conversation history |
| `/compact` | Compact context to save tokens |
| `/commit` | Auto-generate and create a git commit |
| `/review-pr 123` | Review a pull request |
| `/cost` | Show token usage and cost for the session |
| `/doctor` | Check Claude Code health |
| `/init` | Create a CLAUDE.md for this repo |
### Claude Code CLI — flags

| Flag | What it does |
|---|---|
| `--model` | Specify model ID |
| `--api-key` | Pass API key directly |
| `--max-tokens` | Override max output tokens |
| `--add-dir /path` | Add directory to working context |
| `--print` / `-p` | Print output without REPL |
| `--output-format json` | JSON output (for scripting) |
| `--output-format stream-json` | Streaming JSON output |
| `--verbose` | Show full tool call details |
| `--no-stream` | Wait for full response |
| `--dangerously-skip-permissions` | Skip tool permission prompts |
### MCP — Model Context Protocol

| Command / Concept | What it does |
|---|---|
| `claude mcp add name url` | Add an MCP server by URL |
| `claude mcp add name -- cmd args` | Add a local MCP server via stdio |
| `claude mcp list` | List configured MCP servers |
| `claude mcp remove name` | Remove an MCP server |
| Scope: `local` | Available in current project only |
| Scope: `user` | Available across all projects |
| Scope: `project` | Shared via `.mcp.json` in repo |
| `CLAUDE.md` | Project instructions Claude reads on start |
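A project-scoped server can also be declared directly in `.mcp.json` at the repo root and committed, so the whole team shares it. A minimal sketch (the server name and package are placeholders, not a real server):

```json
{
  "mcpServers": {
    "my-server": {
      "command": "npx",
      "args": ["-y", "@example/mcp-server"]
    }
  }
}
```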
### Token limits (approx.)

| Model | Input limit | Output limit |
|---|---|---|
| Opus 4.6 | 200K tokens | 32K tokens |
| Sonnet 4.6 | 200K tokens | 64K tokens |
| Haiku 4.5 | 200K tokens | 8K tokens |
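A request fails if `max_tokens` exceeds the model's output cap, so it can be worth clamping before sending. A small sketch using the approximate caps from the table above (the helper name is ours, not part of the SDK):

```javascript
// Approximate output caps per model, taken from the table above.
const OUTPUT_LIMITS = {
  "claude-opus-4-6": 32000,
  "claude-sonnet-4-6": 64000,
  "claude-haiku-4-5-20251001": 8000,
};

// Clamp a requested max_tokens to the model's output cap.
function clampMaxTokens(model, requested) {
  const cap = OUTPUT_LIMITS[model];
  if (cap === undefined) throw new Error(`Unknown model: ${model}`);
  return Math.min(requested, cap);
}

console.log(clampMaxTokens("claude-haiku-4-5-20251001", 16000)); // 8000
```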
### Prompt caching

| Feature | What it does |
|---|---|
| `cache_control: {type: "ephemeral"}` | Cache a content block (5-min TTL) |
| Cache hit | ~90% cheaper, ~85% faster than full prompt |
| Minimum cacheable size | 1024 tokens (Opus/Sonnet), 2048 (Haiku) |
| Cacheable blocks | Tools, system prompt, messages |
## Detailed sections
### Basic API call (Node.js)

```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain async/await in JavaScript." }],
});

console.log(message.content[0].text);
```
### Streaming response

```javascript
const stream = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  stream: true,
  messages: [{ role: "user", content: "Write a short story." }],
});

for await (const event of stream) {
  if (event.type === "content_block_delta") {
    process.stdout.write(event.delta.text);
  }
}
```
### System prompt + multi-turn conversation

```javascript
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 2048,
  system: "You are a senior backend engineer. Be concise and precise.",
  messages: [
    { role: "user", content: "What's wrong with N+1 queries?" },
    { role: "assistant", content: "N+1 queries happen when..." },
    { role: "user", content: "How do I fix it in PostgreSQL?" },
  ],
});
```
### Tool use (function calling)

```javascript
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  tools: [
    {
      name: "get_weather",
      description: "Get current weather for a city",
      input_schema: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name" },
        },
        required: ["city"],
      },
    },
  ],
  messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
});

// Check if Claude wants to use a tool
if (response.stop_reason === "tool_use") {
  const toolUse = response.content.find((b) => b.type === "tool_use");
  console.log(toolUse.name, toolUse.input); // get_weather { city: 'Tokyo' }
}
```
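After running the tool yourself, you return its output in a `tool_result` block on the next request. A self-contained local sketch of that follow-up body (the `toolUse` stub stands in for the block Claude returned, and `getWeather` is a hypothetical local implementation):

```javascript
// Stub of the tool_use block from the previous response (shape per the Messages API).
const toolUse = {
  type: "tool_use",
  id: "toolu_abc123",
  name: "get_weather",
  input: { city: "Tokyo" },
};

// Hypothetical local implementation of the tool.
function getWeather(city) {
  return `22°C and sunny in ${city}`;
}

// Follow-up request body: echo the assistant turn, then supply the tool result.
const followUp = {
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "What's the weather in Tokyo?" },
    { role: "assistant", content: [toolUse] },
    {
      role: "user",
      content: [
        {
          type: "tool_result",
          tool_use_id: toolUse.id,
          content: getWeather(toolUse.input.city),
        },
      ],
    },
  ],
};
```

Send `followUp` with `client.messages.create`, keeping the same `tools` array, and Claude answers using the result.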
### Prompt caching — reduce costs on large system prompts

```javascript
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an expert codebase assistant...\n\n[large context here]",
      cache_control: { type: "ephemeral" }, // cache this block
    },
  ],
  messages: [{ role: "user", content: "Explain the auth module." }],
});

// Subsequent calls with same system block → cache hit → ~90% cheaper
```
### Claude Code — useful workflows

```bash
# One-shot: explain code without starting the REPL
claude -p "Explain what this does" < src/utils/parser.ts

# Pipe output from another command
git diff | claude -p "Summarize what changed in plain English"

# Use in scripts
SUMMARY=$(claude -p "Summarize this log" < app.log)
echo "$SUMMARY"

# Ask Claude to write tests
claude "Write unit tests for src/auth/login.ts using Vitest"

# Ask Claude to fix a failing test
claude "The test in auth.test.ts is failing — fix it"

# Review a PR: start a session, then type /review-pr 42
claude
```
### CLAUDE.md — project instructions

Create `CLAUDE.md` in your repo root. Claude Code reads it on every session:

```markdown
# Project: My App

## Stack
- Node.js 20, TypeScript, Fastify, PostgreSQL
- Tests: Vitest, run with `pnpm test`
- Lint: `pnpm lint` (ESLint + Prettier)

## Conventions
- Use named exports, no default exports
- Prefer `async/await` over `.then()`
- All DB queries go in `src/db/queries/`

## Commands
- `pnpm dev` — start dev server
- `pnpm build` — production build
- `pnpm test` — run tests
```
### Batch API — process many prompts at once

```javascript
// Create a batch (async, ~1hr processing)
const batch = await client.messages.batches.create({
  requests: [
    {
      custom_id: "req-1",
      params: {
        model: "claude-haiku-4-5-20251001",
        max_tokens: 256,
        messages: [{ role: "user", content: "Translate: Hello world" }],
      },
    },
    // ... up to 10,000 requests
  ],
});

// Poll for completion
const result = await client.messages.batches.retrieve(batch.id);
console.log(result.processing_status); // "ended" when done
```
Batch API is ~50% cheaper than individual calls. Good for bulk classification, data extraction, report generation.
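For bulk jobs, it helps to generate the `requests` array programmatically so each result can be matched back to its input via `custom_id`. A small sketch (the helper name and defaults are ours):

```javascript
// Build a Batch API requests array from a list of prompts.
// custom_id ties each batch result back to its originating prompt.
function toBatchRequests(prompts, model = "claude-haiku-4-5-20251001") {
  return prompts.map((content, i) => ({
    custom_id: `req-${i + 1}`,
    params: {
      model,
      max_tokens: 256,
      messages: [{ role: "user", content }],
    },
  }));
}

const requests = toBatchRequests(["Translate: Hello", "Translate: Goodbye"]);
console.log(requests[1].custom_id); // "req-2"
```

Pass the result straight to `client.messages.batches.create({ requests })`.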
### Environment setup

```bash
# Install SDK
npm install @anthropic-ai/sdk

# Set API key
export ANTHROPIC_API_KEY=sk-ant-...

# Or use .env
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env
```