Claude API Cheat Sheet: SDK, CLI, MCP & Prompting

TL;DR

Models: claude-opus-4-6 (200K context, best quality), claude-sonnet-4-6 (200K context, balanced), claude-haiku-4-5 (200K context, fast)
Use cache_control: {type: "ephemeral"} to cache large system prompts (cache reads cost ~90% less than uncached input)
MCP (Model Context Protocol): claude mcp add --transport http name url to connect external tools
Batch API: 50% cheaper for asynchronous bulk processing, up to 100,000 requests per batch
Claude Code CLI: type /compact inside a session to save context space

Quick reference tables

Models

Model ID	Context	Best for
`claude-opus-4-6`	200K tokens	Complex reasoning, research, long documents
`claude-sonnet-4-6`	200K tokens	Balanced speed + quality (default choice)
`claude-haiku-4-5-20251001`	200K tokens	Fast, lightweight, high-volume tasks

Anthropic API — core requests

Task	What to use
Chat completion	`POST /v1/messages`
Streaming response	`stream: true` in request body
Count tokens	`POST /v1/messages/count_tokens`
List models	`GET /v1/models`
Create a batch	`POST /v1/messages/batches`

Messages API — key parameters

Parameter	Type	What it does
`model`	string	Which Claude model to use
`max_tokens`	int	Maximum tokens in response
`messages`	array	Conversation history `[{role, content}]`
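`system`	string or array	System prompt (use an array of text blocks to attach `cache_control`)
`temperature`	float 0–1	Randomness (0 = most deterministic, though output is not guaranteed identical)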
`top_p`	float 0–1	Nucleus sampling
`top_k`	int	Token sampling pool size
`stop_sequences`	array	Strings that stop generation
`tools`	array	Tool/function definitions
`tool_choice`	object	Control tool use (`auto`, `any`, `tool`, `none`)
`stream`	bool	Stream tokens as they generate
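The parameters map one-to-one onto the JSON request body. The sketch below assumes you want to catch bad values locally before the API rejects them. `makeParams` is a hypothetical helper. It checks only the `max_tokens` and `temperature` constraints from the table, and it leaves out any optional field you do not set.

```javascript
// Hypothetical helper: assemble a Messages API body and validate a few fields locally.
function makeParams({ model, maxTokens, messages, system, temperature, stopSequences, tools, toolChoice }) {
  if (!Number.isInteger(maxTokens) || maxTokens < 1) {
    throw new RangeError("max_tokens must be a positive integer");
  }
  if (temperature !== undefined && (temperature < 0 || temperature > 1)) {
    throw new RangeError("temperature must be between 0 and 1");
  }
  const body = { model, max_tokens: maxTokens, messages };
  // Only include optional fields that were actually provided.
  if (system !== undefined) body.system = system;
  if (temperature !== undefined) body.temperature = temperature;
  if (stopSequences !== undefined) body.stop_sequences = stopSequences;
  if (tools !== undefined) body.tools = tools;
  if (toolChoice !== undefined) body.tool_choice = toolChoice; // e.g. { type: "auto" }
  return body;
}

const body = makeParams({
  model: "claude-sonnet-4-6",
  maxTokens: 512,
  messages: [{ role: "user", content: "List three sorting algorithms." }],
  temperature: 0.2,
  stopSequences: ["\n\n###"],
});
console.log(Object.keys(body)); // [ 'model', 'max_tokens', 'messages', 'temperature', 'stop_sequences' ]
```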

Claude Code CLI — essential commands

Command	What it does
`claude`	Start interactive REPL
`claude "fix this bug"`	Start the REPL with an initial prompt
`claude -p "prompt"`	Non-interactive, print output
`claude --model claude-opus-4-6`	Use a specific model
`/help`	Show available slash commands (all `/` commands are typed inside a session)
`/clear`	Clear conversation history
`/compact`	Compact context to save tokens
`/commit`	Auto-generate and create git commit
`/review-pr 123`	Review a pull request
`/cost`	Show token usage and cost for session
`/doctor`	Check Claude Code health
`/init`	Create a CLAUDE.md for this repo

Claude Code CLI — flags

Flag	What it does
`--model`	Specify model ID
`ANTHROPIC_API_KEY` env var	Supply an API key
`CLAUDE_CODE_MAX_OUTPUT_TOKENS` env var	Override max output tokens
`--add-dir /path`	Add directory to working context
`--print` / `-p`	Print output without REPL
`--output-format json`	JSON output (for scripting)
`--output-format stream-json`	Streaming JSON output
`--verbose`	Show full tool call details
`--dangerously-skip-permissions`	Skip tool permission prompts
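For scripting, `--output-format json` is easier to consume than plain text. This sketch parses that output. It assumes the CLI prints a single JSON object containing a `result` string, an `is_error` flag, and a `session_id`. Those field names are an assumption, so confirm them against what your installed version actually prints. `parseClaudeJson` and the sample output are illustrative.

```javascript
// Sketch: extract the answer from `claude -p "..." --output-format json`.
// ASSUMPTION: the CLI prints one JSON object with `result` (text), `is_error`,
// and `session_id` fields.
function parseClaudeJson(stdout) {
  const out = JSON.parse(stdout);
  if (out.is_error) {
    throw new Error(`Claude Code reported an error: ${out.result ?? "unknown"}`);
  }
  if (typeof out.result !== "string") {
    throw new Error("unexpected output shape: no string `result` field");
  }
  return { text: out.result, sessionId: out.session_id };
}

// Made-up sample output. In a real script you would capture stdout, e.g. with
// execFileSync("claude", ["-p", "Summarize this log", "--output-format", "json"], { encoding: "utf8" })
const sample = '{"type":"result","is_error":false,"result":"3 errors, all timeouts","session_id":"abc123"}';
const parsed = parseClaudeJson(sample);
console.log(parsed.text); // 3 errors, all timeouts
```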

MCP — Model Context Protocol

Command / Concept	What it does
`claude mcp add --transport http name url`	Add a remote MCP server by URL
`claude mcp add name -- cmd args`	Add a local MCP server via stdio
`claude mcp list`	List configured MCP servers
`claude mcp remove name`	Remove an MCP server
MCP scope `local`	Available in current project only
MCP scope `user`	Available across all projects
MCP scope `project`	Shared via `.mcp.json` in repo
`CLAUDE.md`	Project instructions Claude reads on start
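Project-scoped servers live in `.mcp.json`. The sketch below generates that file. It assumes the `mcpServers` layout described in the Claude Code MCP docs: stdio servers use `command`, `args`, and `env`, and remote servers use `type: "http"` plus `url`. `buildMcpConfig` and both server entries are hypothetical.

```javascript
// Sketch: build the contents of a project-scoped .mcp.json.
// ASSUMPTION: stdio servers use { command, args, env } and remote servers use
// { type: "http", url }.
function buildMcpConfig(servers) {
  const mcpServers = {};
  for (const s of servers) {
    if (s.url) {
      mcpServers[s.name] = { type: "http", url: s.url };
    } else {
      mcpServers[s.name] = { command: s.command, args: s.args ?? [], env: s.env ?? {} };
    }
  }
  return JSON.stringify({ mcpServers }, null, 2);
}

// Hypothetical servers, for illustration only.
const mcpJson = buildMcpConfig([
  { name: "docs", url: "https://example.com/mcp" },
  { name: "local-db", command: "node", args: ["./tools/db-server.js"] },
]);
console.log(mcpJson); // write this to .mcp.json in the repo root to share it with the team
```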

Token limits (approx.)

Model	Input limit	Output limit
Opus 4.6	200K tokens	128K tokens
Sonnet 4.6	200K tokens	64K tokens
Haiku 4.5	200K tokens	64K tokens
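The prompt and the output share the context window, and recent models may reject a request whose prompt plus `max_tokens` exceeds it. The sketch below picks a safe `max_tokens`. It assumes you already know the prompt size, for example from `POST /v1/messages/count_tokens`. `safeMaxTokens` is hypothetical. It takes the limits as arguments instead of hard-coding the approximate figures above.

```javascript
// Sketch: choose a max_tokens value that fits both the model's output cap and
// the remaining context window. Limits are passed in because they differ per model.
function safeMaxTokens(inputTokens, wanted, { contextWindow, maxOutput }) {
  const room = contextWindow - inputTokens; // tokens left after the prompt
  if (room <= 0) throw new RangeError("prompt alone fills the context window");
  return Math.min(wanted, maxOutput, room);
}

// Example: a 200K-context model with an assumed 64K output cap.
const limits = { contextWindow: 200_000, maxOutput: 64_000 };
console.log(safeMaxTokens(190_000, 32_000, limits)); // 10000
console.log(safeMaxTokens(50_000, 32_000, limits)); // 32000
```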

Prompt caching

Feature	What it does
`cache_control: {type: "ephemeral"}`	Cache a content block (5-min TTL)
Cache hit	Cached tokens cost ~10% of the base input price; up to ~85% lower latency on long prompts
Minimum cacheable size	1024 tokens (Opus/Sonnet), 2048 (Haiku)
Cached blocks	Tools, system prompt, messages
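The savings depend on how often a cached prefix is reused. The worked sketch below assumes 5-minute cache writes are billed at 1.25x the base input rate and cache reads at 0.1x (check current pricing). The $3 per million tokens base rate is purely illustrative, and `inputCost` is hypothetical.

```javascript
// Sketch: compare input cost with and without prompt caching.
// ASSUMPTION: cache writes cost 1.25x the base input rate, cache reads 0.1x.
function inputCost({ baseRatePerMTok, prefixTokens, otherTokens, calls }) {
  const perTok = baseRatePerMTok / 1_000_000;
  const uncached = calls * (prefixTokens + otherTokens) * perTok;
  // First call writes the cache; later calls within the TTL read it.
  const cached =
    prefixTokens * 1.25 * perTok +
    (calls - 1) * prefixTokens * 0.1 * perTok +
    calls * otherTokens * perTok;
  return { uncached, cached };
}

// 10 calls reusing a 50K-token system prompt plus 500 fresh tokens each.
const cost = inputCost({ baseRatePerMTok: 3, prefixTokens: 50_000, otherTokens: 500, calls: 10 });
console.log(cost.uncached.toFixed(4), cost.cached.toFixed(4));
```

With `calls: 1` the cached total comes out higher than the uncached one, because the write premium is never recovered. Caching only pays off when the prefix is reused within the TTL.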

Detailed sections

Basic API call (Node.js)

javascript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain async/await in JavaScript." }],
});

console.log(message.content[0].text);
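`content[0].text` assumes the first block is text, which stops holding once tools or extended thinking are enabled. The defensive sketch below assumes the response shape shown above. `responseText` is a hypothetical helper, and the response object is made up.

```javascript
// Sketch: collect all text from a Messages API response instead of assuming
// content[0] is a text block (it may be tool_use or thinking).
function responseText(message) {
  return message.content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("");
}

// Made-up response object shaped like the API's output.
const fakeMessage = {
  role: "assistant",
  stop_reason: "end_turn",
  content: [
    { type: "text", text: "Async functions return promises. " },
    { type: "text", text: "await pauses until one settles." },
  ],
};
console.log(responseText(fakeMessage)); // Async functions return promises. await pauses until one settles.
```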

Streaming response

javascript

const stream = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  stream: true,
  messages: [{ role: "user", content: "Write a short story." }],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}
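The loop above prints text as it arrives. To also keep the final text and the stop reason, you can fold the same events into a single result. This sketch runs over a synthetic array of events; with a real stream, iterate with `for await` instead. `foldStream` is a hypothetical name.

```javascript
// Sketch: fold raw streaming events into final text plus stop reason.
// Only text_delta deltas carry text; tool input arrives as input_json_delta.
function foldStream(events) {
  let text = "";
  let stopReason = null;
  for (const event of events) {
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      text += event.delta.text;
    } else if (event.type === "message_delta") {
      stopReason = event.delta.stop_reason; // e.g. "end_turn" or "max_tokens"
    }
  }
  return { text, stopReason };
}

// Synthetic events in the order the API sends them (message_start to message_stop).
const events = [
  { type: "message_start", message: { id: "msg_1", content: [] } },
  { type: "content_block_start", index: 0, content_block: { type: "text", text: "" } },
  { type: "content_block_delta", index: 0, delta: { type: "text_delta", text: "Once upon " } },
  { type: "content_block_delta", index: 0, delta: { type: "text_delta", text: "a time." } },
  { type: "content_block_stop", index: 0 },
  { type: "message_delta", delta: { stop_reason: "end_turn" }, usage: { output_tokens: 5 } },
  { type: "message_stop" },
];
const folded = foldStream(events);
console.log(folded.text, folded.stopReason); // Once upon a time. end_turn
```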

System prompt + multi-turn conversation

javascript

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 2048,
  system: "You are a senior backend engineer. Be concise and precise.",
  messages: [
    { role: "user", content: "What's wrong with N+1 queries?" },
    { role: "assistant", content: "N+1 queries happen when..." },
    { role: "user", content: "How do I fix it in PostgreSQL?" },
  ],
});
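The API is stateless, so every call must resend the whole history, including Claude's earlier replies. The minimal history holder below assumes you append the `content` array from each response as the assistant turn. The `Conversation` class is hypothetical and not part of the SDK.

```javascript
// Sketch: keep conversation history in the shape the Messages API expects.
class Conversation {
  constructor(system) {
    this.system = system;
    this.messages = [];
  }
  addUser(content) {
    this.messages.push({ role: "user", content });
  }
  // Store the assistant's reply (the `content` array from the response) so the
  // next request includes it verbatim.
  addAssistant(content) {
    this.messages.push({ role: "assistant", content });
  }
  requestBody(model, maxTokens) {
    return { model, max_tokens: maxTokens, system: this.system, messages: [...this.messages] };
  }
}

const convo = new Conversation("You are a senior backend engineer. Be concise and precise.");
convo.addUser("What's wrong with N+1 queries?");
convo.addAssistant([{ type: "text", text: "N+1 queries happen when..." }]);
convo.addUser("How do I fix it in PostgreSQL?");
const nextBody = convo.requestBody("claude-sonnet-4-6", 2048);
console.log(nextBody.messages.map((m) => m.role).join(" -> ")); // user -> assistant -> user
```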

Tool use (function calling)

javascript

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  tools: [
    {
      name: "get_weather",
      description: "Get current weather for a city",
      input_schema: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name" },
        },
        required: ["city"],
      },
    },
  ],
  messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
});

// Check if Claude wants to use a tool
if (response.stop_reason === "tool_use") {
  const toolUse = response.content.find((b) => b.type === "tool_use");
  console.log(toolUse.name, toolUse.input); // get_weather { city: 'Tokyo' }
}
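A `tool_use` stop is only half the round trip. Your code runs the tool, then sends the output back as a `tool_result` block in the next user turn, matched to the request by `tool_use_id`. The sketch below builds those follow-up messages. `toolResultMessages` and the `get_weather` handler are hypothetical, and the weather data is fake.

```javascript
// Sketch: turn a tool_use response into the follow-up messages that return
// results to Claude. `handlers` maps tool names to your own functions.
function toolResultMessages(response, handlers) {
  const results = response.content
    .filter((block) => block.type === "tool_use")
    .map((block) => {
      const handler = handlers[block.name];
      if (!handler) {
        return { type: "tool_result", tool_use_id: block.id, content: `Unknown tool: ${block.name}`, is_error: true };
      }
      return { type: "tool_result", tool_use_id: block.id, content: JSON.stringify(handler(block.input)) };
    });
  // Echo the assistant turn, then answer every tool_use in a single user turn.
  return [
    { role: "assistant", content: response.content },
    { role: "user", content: results },
  ];
}

const handlers = {
  get_weather: ({ city }) => ({ city, tempC: 18, conditions: "cloudy" }), // hypothetical, fake data
};
const fakeResponse = {
  stop_reason: "tool_use",
  content: [{ type: "tool_use", id: "toolu_01", name: "get_weather", input: { city: "Tokyo" } }],
};
const followUp = toolResultMessages(fakeResponse, handlers);
console.log(followUp[1].content[0].content); // {"city":"Tokyo","tempC":18,"conditions":"cloudy"}
// Append followUp to your messages array and call client.messages.create again
// (with the same tools) to get Claude's final answer.
```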

Prompt caching — reduce costs on large system prompts

javascript

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an expert codebase assistant...\n\n[large context here]",
      cache_control: { type: "ephemeral" }, // cache this block
    },
  ],
  messages: [{ role: "user", content: "Explain the auth module." }],
});
// Repeat calls with an identical prefix within the 5-minute TTL read it from cache at ~10% of the base input price
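To confirm the cache is working, inspect `response.usage`. `cache_creation_input_tokens` counts tokens written to the cache, and `cache_read_input_tokens` counts tokens served from it. `cacheStatus` is a hypothetical helper, and the usage objects below are made up.

```javascript
// Sketch: report whether a response hit the prompt cache, based on its usage fields.
function cacheStatus(usage) {
  const written = usage.cache_creation_input_tokens ?? 0;
  const read = usage.cache_read_input_tokens ?? 0;
  if (read > 0) return `hit: ${read} tokens read from cache`;
  if (written > 0) return `miss: ${written} tokens written to cache`;
  return "no caching (block may be below the minimum cacheable size)";
}

// Made-up usage objects for a first call and a repeat call within the TTL.
console.log(cacheStatus({ input_tokens: 20, cache_creation_input_tokens: 3000, cache_read_input_tokens: 0, output_tokens: 300 })); // miss: 3000 tokens written to cache
console.log(cacheStatus({ input_tokens: 20, cache_creation_input_tokens: 0, cache_read_input_tokens: 3000, output_tokens: 280 })); // hit: 3000 tokens read from cache
// In real code: cacheStatus(response.usage)
```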

Claude Code — useful workflows

bash

# One-shot: explain code without starting REPL
claude -p "Explain what this does" < src/utils/parser.ts

# Pipe output from another command
git diff | claude -p "Summarize what changed in plain English"

# Use in scripts
SUMMARY=$(claude -p "Summarize this log" < app.log)
echo "$SUMMARY"

# Ask Claude to write tests
claude "Write unit tests for src/auth/login.ts using Vitest"

# Ask Claude to fix a failing test
claude "The test in auth.test.ts is failing — fix it"

# Review a PR: start a session, then run the slash command inside it
claude
# > /review-pr 42

CLAUDE.md — project instructions

Create CLAUDE.md in your repo root. Claude Code loads it automatically at the start of every session:

markdown

# Project: My App

## Stack
- Node.js 20, TypeScript, Fastify, PostgreSQL
- Tests: Vitest, run with `pnpm test`
- Lint: `pnpm lint` (ESLint + Prettier)

## Conventions
- Use named exports, no default exports
- Prefer `async/await` over `.then()`
- All DB queries go in `src/db/queries/`

## Commands
- `pnpm dev` — start dev server
- `pnpm build` — production build
- `pnpm test` — run tests

Batch API — process many prompts at once

javascript

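// Create a batch (asynchronous: most batches finish within 1 hour, max 24 hours)
const batch = await client.messages.batches.create({
  requests: [
    {
      custom_id: "req-1",
      params: {
        model: "claude-haiku-4-5-20251001",
        max_tokens: 256,
        messages: [{ role: "user", content: "Translate: Hello world" }],
      },
    },
    // ... up to 100,000 requests (or 256 MB) per batch
  ],
});

// Poll until processing ends
let result = await client.messages.batches.retrieve(batch.id);
while (result.processing_status !== "ended") {
  await new Promise((resolve) => setTimeout(resolve, 60_000)); // check once a minute
  result = await client.messages.batches.retrieve(batch.id);
}
console.log(result.request_counts); // succeeded / errored / canceled / expired counts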

Batch API is ~50% cheaper than individual calls. Good for bulk classification, data extraction, report generation.
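Batch results come back as JSONL, one object per request, and not necessarily in submission order, so match them on `custom_id`. The sketch below assumes you have the raw JSONL text, for example downloaded from the batch's `results_url`. `indexBatchResults` is hypothetical, and the two result lines are made up.

```javascript
// Sketch: index batch results by custom_id (results may arrive in any order).
// Input is raw JSONL text, one result object per line.
function indexBatchResults(jsonl) {
  const byId = new Map();
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const { custom_id, result } = JSON.parse(line);
    if (result.type === "succeeded") {
      const text = result.message.content
        .filter((b) => b.type === "text")
        .map((b) => b.text)
        .join("");
      byId.set(custom_id, { ok: true, text });
    } else {
      // errored, canceled, or expired
      byId.set(custom_id, { ok: false, reason: result.type });
    }
  }
  return byId;
}

// Made-up results for illustration.
const jsonl = [
  '{"custom_id":"req-2","result":{"type":"expired"}}',
  '{"custom_id":"req-1","result":{"type":"succeeded","message":{"content":[{"type":"text","text":"Bonjour le monde"}]}}}',
].join("\n");
const results = indexBatchResults(jsonl);
console.log(results.get("req-1").text); // Bonjour le monde
```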

Environment setup

bash

# Install SDK
npm install @anthropic-ai/sdk

# Set API key
export ANTHROPIC_API_KEY=sk-ant-...

# Or keep it in .env (the SDK reads process.env only, so load the file with a tool such as dotenv)
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env

python

# Python SDK (install first: pip install anthropic)
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

Wondering how Claude stacks up? Read Claude vs Gemini 2.5 for Coding: Honest Comparison.
