Google AI Edge Gallery is Google’s official app for running Gemma 4 entirely on your phone — no cloud, no API key, no data leaving your device. Download the model once over Wi-Fi, and after that it works completely offline. The E2B and E4B variants are built specifically for mobile hardware: small enough to fit on a phone, smart enough to actually be useful. Setup takes under five minutes.
What is Google AI Edge Gallery?
Google AI Edge Gallery runs AI models locally using your phone’s neural processing unit (NPU) or GPU. It’s not a wrapper around a cloud API. The model lives on your device. Inference happens on-device. Nothing is sent to a server.
It ships with four features:
- AI Chat — a conversational interface backed by whichever local model you’ve downloaded
- Agent Skills — turns your LLM from a question-answerer into something that can take actions
- Ask Image — multimodal vision: point your camera at something and ask questions about it
- Audio Scribe — transcribes and translates voice recordings in real time
Which Gemma 4 variant should I pick?
| Model | Download size | RAM needed | Best for |
|---|---|---|---|
| Gemma 4 E2B | ~2.6 GB | 4 GB | Older Android phones, budget devices |
| Gemma 4 E4B | ~4.7 GB | 6 GB+ | Mid-range and flagship phones |
E4B is noticeably sharper — better reasoning, longer responses, stronger at following instructions. If your phone has 6 GB+ RAM (most phones from 2022 onwards), pick E4B. Don't just grab the bigger one because it sounds better. E2B on a phone with 4 GB RAM will run smoothly. E4B on the same phone will stutter, swap memory constantly, and feel worse than a lesser model on adequate hardware. The heaviest model isn't always the right call.
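The table above reduces to one threshold check. Here's a minimal sketch, assuming a Linux-style `/proc/meminfo` is available (true in `adb shell` or Termux); the `recommend_variant` helper is purely illustrative, not part of the app:

```shell
# Hypothetical helper: pick a variant from total RAM in kB.
# Threshold mirrors the table above: 6 GB+ gets E4B, anything less gets E2B.
# Note: MemTotal reports slightly less than the marketed RAM figure,
# since some memory is reserved by the kernel and radio firmware.
recommend_variant() {
  if [ "$1" -ge 6291456 ]; then   # 6 GB = 6 * 1024 * 1024 kB
    echo "Gemma 4 E4B"
  else
    echo "Gemma 4 E2B"
  fi
}

# On-device, feed it the real value:
# recommend_variant "$(awk '/MemTotal/ {print $2}' /proc/meminfo)"
recommend_variant 8388608   # a phone with ~8 GB RAM prints: Gemma 4 E4B
```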
Both variants support a 128K context window and handle text, images, and audio.
The Scenario: You’re on a long flight, no Wi-Fi, and you saved a 40-page contract PDF to read later. You need a summary and you need to flag the clauses that look off. Gemma 4 E4B reads it, summarises it, and answers your follow-up questions — entirely on your phone, at 35,000 feet.
How do I install Google AI Edge Gallery?
Android — Google Play Store:
Search for Google AI Edge Gallery and install it. Requires Android 10 or later.
Android — APK (no Play Store):
Download the latest APK from the Google AI Edge Gallery releases on GitHub. Go to Settings → Apps → Install unknown apps, enable it for your file manager, then open the APK.
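If you have a computer handy, sideloading over adb skips the file-manager permission step entirely. A sketch, assuming USB debugging is enabled on the phone; the APK filename below is a placeholder, so use whatever the GitHub release page actually gives you:

```shell
# Install the downloaded release APK onto the connected phone.
# "ai-edge-gallery.apk" is a placeholder filename.
adb install ai-edge-gallery.apk

# Upgrading later? -r replaces the app while keeping its data,
# which should include any models you've already downloaded.
adb install -r ai-edge-gallery.apk
```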
The Scenario: You’re on a custom Android ROM with no Play Store, or you’re in a region where the app isn’t listed yet. The GitHub APK route is the same app — it’s just a direct download instead of going through the store.
iOS — App Store:
Search for Google AI Edge Gallery. Requires iOS 16 or later.
How do I download Gemma 4 in the app?
Open the app and tap AI Chat. You'll see a model list. Tap Download next to Gemma 4 E2B or E4B based on your device RAM. The download is 2.6–4.7 GB — do it on Wi-Fi. Once downloaded, the model lives in your device storage and runs offline from that point.
The Scenario: You’re on a capped mobile plan. Download at home over Wi-Fi Sunday night. By Monday morning you’ve got a fully offline AI assistant that doesn’t eat your data budget every time you use it.
Can I use my own models?
Yes. Tap the + button at the bottom of the model list to import any compatible model file from your device storage. It works with other Gemma variants, MediaPipe-compatible models, and LiteRT-compatible checkpoints. If you’ve exported or downloaded a .task or .bin file, you can load it here.
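If the model file currently lives on your computer, one way to get it somewhere the import picker can see is adb. A sketch; the filename and the Download folder are just examples, any readable location on the device works:

```shell
# Copy a LiteRT-compatible .task file into shared storage on the phone.
# "gemma-custom.task" is an example filename.
adb push gemma-custom.task /sdcard/Download/

# Confirm it arrived before opening the + import picker in the app.
adb shell ls /sdcard/Download/gemma-custom.task
```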
What can I do with AI Chat?
Once the model downloads, tap AI Chat to open a conversation. No internet required. Ask it anything — code review, writing help, explaining a concept, summarising text you paste in. The 128K context window means you can paste in a lot before it starts forgetting the beginning of the conversation.
The Scenario: You’re in a meeting and someone asks you to explain a technical concept you half-remember. You pull out your phone under the table, ask Gemma 4 in two sentences, read the response, and answer confidently. Nobody knows. You look prepared.
What does Agent Skills do?
Agent Skills upgrades the model from a conversationalist into something that takes actions. Instead of just telling you how to do something, it can trigger on-device tasks, chain steps together, and work through structured workflows. It’s agentic mode — the model reads context and acts on it.
It’s genuinely useful for things like: drafting and sending a message based on a template, processing a list of items one by one, or running a multi-step task where each result feeds the next prompt.
The Scenario: You need to triage 30 customer support messages saved in a notes file. Instead of reading each one yourself, you paste the list into Agent Skills, tell it to categorise each as urgent/normal/ignore and draft a one-line response for the urgent ones. It works through the list. You review and send.
What does Ask Image do?
Tap Ask Image, then either point your camera at something or pick a photo from your gallery. Ask Gemma 4 anything about it. It runs vision inference entirely on-device — your photos never leave your phone.
Useful for: identifying objects, reading text in images, describing a scene, solving visual problems, getting ingredient lists from food packaging, or asking what that connector is called.
The Scenario: You’re at a second-hand electronics market and you find a board with a connector you don’t recognise. Point your camera at it, ask what it is and what cable it needs. Gemma 4 tells you on the spot. You either buy it or walk away — either way, no Google search, no signal needed.
What does Audio Scribe do?
Tap Audio Scribe and record a voice clip or import one from your storage. Gemma 4 transcribes it to text in real time and can translate it to another language. All processed locally — your recordings don’t go anywhere.
Useful for: meeting notes, long voice memos, multilingual transcription, converting a recorded lecture to text.
The Scenario: You recorded a 25-minute interview on your phone. You don’t want to pay a transcription service and you don’t want to upload a private conversation to some cloud API. Audio Scribe turns it into clean, readable text in a couple of minutes, no upload required.
Any tips before I start?
Clear storage before downloading. E4B needs ~5 GB of free space plus headroom for the app to operate. If your phone is full, the download will fail partway through and you'll have to start over.
The Scenario: You start the download at 11pm, fall asleep, and wake up to an error because your phone ran out of space halfway through. Clear your camera roll backups first. Saves the frustration.
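That pre-flight check is easy to script before you start the download. A minimal sketch; the `enough_space` helper is made up, and ~5 GB is the rough figure from above rather than an exact requirement:

```shell
# Hypothetical pre-download check: is at least ~5 GB free? (argument in kB)
enough_space() {
  if [ "$1" -ge 5242880 ]; then   # 5 GB = 5 * 1024 * 1024 kB
    echo "ok to download"
  else
    echo "clear some space first"
  fi
}

# On-device (adb shell or Termux): free kB on the data partition.
# enough_space "$(df -k /data | awk 'NR==2 {print $4}')"
enough_space 8388608   # ~8 GB free prints: ok to download
```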
Keep the screen on during the download. Android's aggressive battery optimisation kills background processes. Stay in the app while it downloads, or your phone might pause the download and never resume it.
Expect battery drain during heavy use. On-device inference is compute-intensive. A long AI Chat session will drain your battery noticeably faster than normal use. Plug in if you’re doing anything serious.
iPhone users with A17 Pro or later get faster inference. The dedicated Neural Engine on Apple Silicon chips runs inference faster than most Android equivalents at the same model size. If you’re comparing speeds between devices, chip architecture matters more than clock speed or RAM.