- Models: `claude-opus-4-6` (200K context, best quality), `claude-sonnet-4-6` (200K context, balanced), `claude-haiku-4-5` (200K context, fast)
- Use `cache_control: {type: "ephemeral"}` to cache large system prompts (90% cheaper)
- MCP (Model Context Protocol): `claude mcp add name url` for external tools
- Batch API: 50% cheaper for bulk processing up to 10,000 requests
- Claude Code CLI: `claude /compact` to save context space
Quick reference tables
Models
| Model ID | Context | Best for |
|---|---|---|
| claude-opus-4-6 | 200K tokens | Complex reasoning, research, long documents |
| claude-sonnet-4-6 | 200K tokens | Balanced speed + quality (default choice) |
| claude-haiku-4-5-20251001 | 200K tokens | Fast, lightweight, high-volume tasks |
Anthropic API — core requests
| Task | What to use |
|---|---|
| Chat completion | POST /v1/messages |
| Streaming response | stream: true in request body |
| Count tokens | POST /v1/messages/count_tokens |
| List models | GET /v1/models |
| Create a batch | POST /v1/messages/batches |
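Without an SDK, each row above is a plain HTTPS request. Here is a minimal sketch of assembling one for `POST /v1/messages`, assuming the `x-api-key` and `anthropic-version` headers described in Anthropic's API documentation. `buildMessagesRequest` is a hypothetical helper name, the key is a placeholder, and nothing is sent over the network:

```javascript
// Hypothetical helper: assembles (but does not send) a Messages API request.
const API_BASE = "https://api.anthropic.com";

function buildMessagesRequest(apiKey, body) {
  return {
    url: `${API_BASE}/v1/messages`,
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01", // assumed API version header value
      "content-type": "application/json",
    },
    body: JSON.stringify(body),
  };
}

// Placeholder key; a real one would come from ANTHROPIC_API_KEY.
const request = buildMessagesRequest("sk-ant-placeholder", {
  model: "claude-sonnet-4-6",
  max_tokens: 256,
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(request.method, request.url);
```

The returned object can be handed to `fetch(request.url, request)`. Adding `stream: true` to the body switches the same endpoint to a streamed response.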
Messages API — key parameters
| Parameter | Type | What it does |
|---|---|---|
| model | string | Which Claude model to use |
| max_tokens | int | Maximum tokens in response |
| messages | array | Conversation history [{role, content}] |
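| system | string or array | System prompt (plain string, or an array of text blocks) |
| temperature | float 0–1 | Randomness (lower = more focused; 0 is close to deterministic, not guaranteed) |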
| top_p | float 0–1 | Nucleus sampling |
| top_k | int | Token sampling pool size |
| stop_sequences | array | Strings that stop generation |
| tools | array | Tool/function definitions |
| tool_choice | object | Controls tool use: auto (model decides), any (must use some tool), tool (must use a named tool) |
| stream | bool | Stream tokens as they generate |
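The `tool_choice` row is the least obvious one in the table, because it takes an object and not a bare string. Here is a small sketch of the three shapes and a request body that combines several of the parameters above. `toolChoice` is a hypothetical helper, and the `get_weather` tool is only an illustration:

```javascript
// Hypothetical helper: builds the tool_choice object for each mode in the table.
function toolChoice(mode, toolName) {
  if (mode === "auto" || mode === "any") return { type: mode };
  if (mode === "tool") {
    if (!toolName) throw new Error('mode "tool" needs a tool name');
    return { type: "tool", name: toolName };
  }
  throw new Error(`unknown tool_choice mode: ${mode}`);
}

// Example request body; a forced tool must also appear in `tools`.
const body = {
  model: "claude-sonnet-4-6",
  max_tokens: 512,
  temperature: 0,
  stop_sequences: ["</answer>"],
  system: "Answer briefly.",
  tools: [
    {
      name: "get_weather",
      description: "Get current weather for a city",
      input_schema: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  ],
  tool_choice: toolChoice("tool", "get_weather"),
  messages: [{ role: "user", content: "Weather in Tokyo?" }],
};

console.log(JSON.stringify(body.tool_choice)); // {"type":"tool","name":"get_weather"}
```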
Claude Code CLI — essential commands
| Command | What it does |
|---|---|
| claude | Start interactive REPL |
| claude "fix this bug" | Start REPL with an initial prompt |
| claude -p "prompt" | Non-interactive, print output |
| claude --model claude-opus-4-6 | Use a specific model |
| claude --no-stream | Disable streaming |
| claude /help | Show available slash commands |
| claude /clear | Clear conversation history |
| claude /compact | Compact context to save tokens |
| claude /commit | Auto-generate and create git commit |
| claude /review-pr 123 | Review a pull request |
| claude /cost | Show token usage and cost for session |
| claude /doctor | Check Claude Code health |
| claude /init | Create a CLAUDE.md for this repo |
Claude Code CLI — flags
| Flag | What it does |
|---|---|
| --model | Specify model ID |
| --api-key | Pass API key directly |
| --max-tokens | Override max output tokens |
| --add-dir /path | Add directory to working context |
| --print / -p | Print output without REPL |
| --output-format json | JSON output (for scripting) |
| --output-format stream-json | Streaming JSON output |
| --verbose | Show full tool call details |
| --no-stream | Wait for full response |
| --dangerously-skip-permissions | Skip tool permission prompts |
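When calling the CLI from a script, it helps to build the argument list in one place. This sketch uses only flags from the table above; `buildClaudeArgs` is a hypothetical helper, and the command is printed without being executed:

```javascript
// Hypothetical helper: builds the argument list for a non-interactive `claude` run.
function buildClaudeArgs({ prompt, model, outputFormat, addDirs = [] }) {
  const args = ["-p", prompt];
  if (model) args.push("--model", model);
  if (outputFormat) args.push("--output-format", outputFormat);
  for (const dir of addDirs) args.push("--add-dir", dir);
  return args;
}

const args = buildClaudeArgs({
  prompt: "Summarize this log",
  model: "claude-sonnet-4-6",
  outputFormat: "json",
});

console.log(["claude", ...args].join(" "));
```

Passing `args` as an array to Node's `child_process.spawn("claude", args)` avoids shell-quoting problems when the prompt contains spaces or quotes.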
MCP — Model Context Protocol
| Command / Concept | What it does |
|---|---|
| claude mcp add name url | Add an MCP server by URL |
| claude mcp add name -- cmd args | Add a local MCP server via stdio |
| claude mcp list | List configured MCP servers |
| claude mcp remove name | Remove an MCP server |
| MCP scope local | Available in current project only |
| MCP scope user | Available across all projects |
| MCP scope project | Shared via .mcp.json in repo |
| CLAUDE.md | Project instructions Claude reads on start |
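Project scope means the server list lives in a `.mcp.json` file that is committed with the repo. The sketch below generates such a file, assuming the `mcpServers` layout (server name mapped to `command`, `args`, `env`) that Claude Code uses for stdio servers. `mcpProjectConfig` is a hypothetical helper and `example-mcp-server` is a placeholder package name:

```javascript
// Hypothetical helper: renders the text of a project-scoped .mcp.json file.
// Assumed layout: { "mcpServers": { "<name>": { command, args, env } } }
function mcpProjectConfig(servers) {
  const mcpServers = {};
  for (const { name, command, args = [], env = {} } of servers) {
    mcpServers[name] = { command, args, env };
  }
  return JSON.stringify({ mcpServers }, null, 2);
}

const configText = mcpProjectConfig([
  // Placeholder server; swap in a real MCP server command.
  { name: "files", command: "npx", args: ["-y", "example-mcp-server"] },
]);

console.log(configText);
```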
Token limits (approx.)
| Model | Input limit | Output limit |
|---|---|---|
| Opus 4.6 | 200K tokens | 32K tokens |
| Sonnet 4.6 | 200K tokens | 64K tokens |
| Haiku 4.5 | 200K tokens | 8K tokens |
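These limits can be checked before a request is sent. The sketch below is a pre-flight check that uses the approximate figures from the table, rounded to thousands. It assumes that input tokens and the requested `max_tokens` share the same context window. `checkTokenBudget` is a hypothetical helper:

```javascript
// Approximate limits copied from the table above (rounded to thousands).
const LIMITS = {
  "claude-opus-4-6": { context: 200_000, output: 32_000 },
  "claude-sonnet-4-6": { context: 200_000, output: 64_000 },
  "claude-haiku-4-5": { context: 200_000, output: 8_000 },
};

// Hypothetical helper: returns a list of problems (empty means the request fits).
function checkTokenBudget(model, inputTokens, maxTokens) {
  const limits = LIMITS[model];
  if (!limits) return [`unknown model: ${model}`];
  const problems = [];
  if (maxTokens > limits.output) {
    problems.push(`max_tokens ${maxTokens} exceeds output limit ${limits.output}`);
  }
  // Assumption: input and requested output must fit in the window together.
  if (inputTokens + maxTokens > limits.context) {
    problems.push(`input + max_tokens exceeds context window ${limits.context}`);
  }
  return problems;
}

console.log(checkTokenBudget("claude-haiku-4-5", 1_000, 16_000));
```

The `inputTokens` value can come from `POST /v1/messages/count_tokens`, which returns a count without running the model.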
Prompt caching
| Feature | What it does |
|---|---|
| cache_control: {type: "ephemeral"} | Cache a content block (5-min TTL) |
| Cache hit | Cached input tokens cost ~90% less; latency up to ~85% lower on long prompts |
| Minimum cacheable size | 1024 tokens (Opus/Sonnet), 2048 (Haiku) |
| Cached blocks | Tools, system prompt, messages |
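A quick calculation shows when caching pays off. This sketch uses the ~90% discount from the table and assumes that writing the cache costs 25% more than normal input tokens and that every call after the first lands within the TTL. Costs are in "normally priced input tokens" to keep the arithmetic price-independent, and `prefixCost` is a hypothetical helper:

```javascript
// Price multipliers, in percent of the normal input-token price.
const CACHE_READ_PCT = 10; // cache hit: ~90% cheaper, per the table above
const CACHE_WRITE_PCT = 125; // assumption: writing the cache costs 25% extra

// Input cost of a shared prefix across `calls` requests, measured in
// "normally priced tokens". Assumes every call after the first is a cache hit.
function prefixCost(prefixTokens, calls, useCache) {
  if (!useCache) return prefixTokens * calls;
  const write = (prefixTokens * CACHE_WRITE_PCT) / 100;
  const reads = (prefixTokens * CACHE_READ_PCT * (calls - 1)) / 100;
  return write + reads;
}

const without = prefixCost(10_000, 20, false);
const withCache = prefixCost(10_000, 20, true);
const savedPct = Math.round((1 - withCache / without) * 100);

console.log({ without, withCache, savedPct });
```

Under these assumptions, a 10,000-token prefix reused across 20 calls costs about 84% less with caching. A prefix used only once costs more, because the write surcharge is never recovered.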
Detailed sections
Basic API call (Node.js)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const message = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 1024,
messages: [{ role: "user", content: "Explain async/await in JavaScript." }],
});
console.log(message.content[0].text);
Streaming response
const stream = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 1024,
stream: true,
messages: [{ role: "user", content: "Write a short story." }],
});
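for await (const event of stream) {
  // Only text deltas carry a `text` field; other delta types (e.g. tool input) do not
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}
System prompt + multi-turn conversation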
const response = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 2048,
system: "You are a senior backend engineer. Be concise and precise.",
messages: [
{ role: "user", content: "What's wrong with N+1 queries?" },
{ role: "assistant", content: "N+1 queries happen when..." },
{ role: "user", content: "How do I fix it in PostgreSQL?" },
],
});
Tool use (function calling)
const response = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 1024,
tools: [
{
name: "get_weather",
description: "Get current weather for a city",
input_schema: {
type: "object",
properties: {
city: { type: "string", description: "City name" },
},
required: ["city"],
},
},
],
messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
});
// Check if Claude wants to use a tool
if (response.stop_reason === "tool_use") {
const toolUse = response.content.find((b) => b.type === "tool_use");
console.log(toolUse.name, toolUse.input); // get_weather { city: 'Tokyo' }
}
Prompt caching — reduce costs on large system prompts
const response = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 1024,
system: [
{
type: "text",
text: "You are an expert codebase assistant...\n\n[large context here]",
cache_control: { type: "ephemeral" }, // cache this block
},
],
messages: [{ role: "user", content: "Explain the auth module." }],
});
// Subsequent calls with same system block → cache hit → ~90% cheaper
Claude Code — useful workflows
# One-shot: explain code without starting REPL
claude -p "Explain what this does" < src/utils/parser.ts
# Pipe output from another command
git diff | claude -p "Summarize what changed in plain English"
# Use in scripts
SUMMARY=$(claude -p "Summarize this log" < app.log)
echo "$SUMMARY"
# Ask Claude to write tests
claude "Write unit tests for src/auth/login.ts using Vitest"
# Ask Claude to fix a failing test
claude "The test in auth.test.ts is failing — fix it"
# Review a PR
claude /review-pr 42
CLAUDE.md — project instructions
Create CLAUDE.md in your repo root. Claude Code reads it on every session:
# Project: My App
## Stack
- Node.js 20, TypeScript, Fastify, PostgreSQL
- Tests: Vitest, run with `pnpm test`
- Lint: `pnpm lint` (ESLint + Prettier)
## Conventions
- Use named exports, no default exports
- Prefer `async/await` over `.then()`
- All DB queries go in `src/db/queries/`
## Commands
- `pnpm dev` — start dev server
- `pnpm build` — production build
- `pnpm test` — run tests
Batch API — process many prompts at once
// Create a batch (asynchronous; most batches finish within an hour)
const batch = await client.messages.batches.create({
requests: [
{
custom_id: "req-1",
params: {
model: "claude-haiku-4-5-20251001",
max_tokens: 256,
messages: [{ role: "user", content: "Translate: Hello world" }],
},
},
// ... up to 10,000 requests
],
});
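// Poll until processing ends (checks once a minute)
let result = await client.messages.batches.retrieve(batch.id);
while (result.processing_status !== "ended") {
  await new Promise((resolve) => setTimeout(resolve, 60_000));
  result = await client.messages.batches.retrieve(batch.id);
}
console.log(result.processing_status); // "ended"
Batch API is ~50% cheaper than individual calls. It suits bulk classification, data extraction, and report generation.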
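For bulk jobs, the two fiddly parts are generating unique `custom_id` values and matching results back to inputs, since results are not guaranteed to come back in request order. The sketch below covers both. `buildBatchRequests` and `indexResults` are hypothetical helpers, and the sample results are hand-written to mirror the `{ custom_id, result }` entry shape, so no API call is made:

```javascript
// Hypothetical helper: turns plain input strings into batch request entries.
function buildBatchRequests(texts, model, maxTokens) {
  return texts.map((text, i) => ({
    custom_id: `req-${i + 1}`,
    params: {
      model,
      max_tokens: maxTokens,
      messages: [{ role: "user", content: text }],
    },
  }));
}

// Hypothetical helper: indexes result entries by custom_id.
function indexResults(entries) {
  const byId = new Map();
  for (const entry of entries) byId.set(entry.custom_id, entry.result);
  return byId;
}

const requests = buildBatchRequests(
  ["Translate: Hello world", "Translate: Good night"],
  "claude-haiku-4-5-20251001",
  256
);

// Hand-written sample entries (assumed shape), deliberately out of order.
const sampleResults = [
  { custom_id: "req-2", result: { type: "succeeded" } },
  { custom_id: "req-1", result: { type: "errored" } },
];

const byId = indexResults(sampleResults);
for (const req of requests) {
  console.log(req.custom_id, byId.get(req.custom_id).type);
}
```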
Environment setup
# Install SDK
npm install @anthropic-ai/sdk
# Set API key
export ANTHROPIC_API_KEY=sk-ant-...
# Or use .env
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env
# Python SDK
pip install anthropic
import anthropic
client = anthropic.Anthropic() # reads ANTHROPIC_API_KEY from env
message = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello!"}]
)
print(message.content[0].text)