VIPS · On-device AI

Hi, I'm Vips!

On-device AI in your pocket. Models run locally in your browser with WebGPU — private and offline-capable. Or point me at your own LM Studio / Ollama endpoint.

Device

Checking WebGPU…

Probing on-device inference support.
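
A probe like this can be sketched with the standard WebGPU entry point, `navigator.gpu.requestAdapter()`. This is a minimal sketch, not VIPS's actual check: `probeWebGPU`, `NavigatorLike`, and the status strings are hypothetical names chosen for illustration.

```typescript
// Minimal structural type so the sketch also type-checks outside a browser.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

type WebGPUStatus = "unsupported" | "no-adapter" | "ready";

// Hypothetical helper: reports whether on-device inference looks possible.
async function probeWebGPU(nav: NavigatorLike): Promise<WebGPUStatus> {
  // Browsers without WebGPU do not expose navigator.gpu at all.
  if (!nav.gpu) return "unsupported";
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "ready" : "no-adapter";
  } catch {
    // Treat a rejected request the same as a missing adapter.
    return "no-adapter";
  }
}
```

In a browser this would be called as `probeWebGPU(navigator as NavigatorLike)`. Any result other than `"ready"` suggests falling back to a cloud provider or a local server.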

On-device models

These download once into the browser cache, then run locally with no network. Smaller models load faster and use less RAM — start small on a phone.

Cloud & servers

Bring your own model — OpenAI, Claude, Gemini, GitHub/Copilot — or a local LM Studio / Ollama server.

🔒 Keys are stored only on this device (browser storage) and sent straight to the provider you choose. Cloud calls may need the provider to allow browser (CORS) access.

Inference

Temperature: 0.7
Lower = focused & deterministic · Higher = creative
Top-P: 0.95
Max tokens: 1024
Frequency penalty: 0.0
Presence penalty: 0.0
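
To make the first two settings concrete, here is a sketch of how samplers commonly apply temperature and Top-P (nucleus) filtering to a model's output scores. It illustrates the general technique only; the function names are hypothetical, the runtime VIPS uses may implement sampling differently, and the frequency and presence penalties are not covered.

```typescript
// Turn raw scores (logits) into probabilities, scaled by temperature.
// Lower temperature sharpens the distribution; higher flattens it.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const t = Math.max(temperature, 1e-6); // guard against division by zero
  const scaled = logits.map((x) => x / t);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Keep the smallest set of most-likely tokens whose cumulative probability
// reaches topP, zero out the rest, and renormalize what remains.
function topPFilter(probs: number[], topP: number): number[] {
  const order = probs.map((p, i) => ({ p, i })).sort((a, b) => b.p - a.p);
  const kept = probs.map(() => 0);
  let cumulative = 0;
  for (const { p, i } of order) {
    kept[i] = p;
    cumulative += p;
    if (cumulative >= topP) break;
  }
  const total = kept.reduce((a, b) => a + b, 0);
  return kept.map((p) => p / total);
}
```

With the same logits, a temperature of 0.7 puts more probability on the top token than a temperature of 1.5 does, which is why lower values read as more focused. Top-P then drops the unlikely tail before a token is drawn.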

Voice

Speak responses
Read replies aloud in normal chat (talk-back)
Speech rate: 1.0

App

How it works

On-device: models run in your browser via WebGPU (WebLLM/MLC). In this mode nothing you type leaves your phone.
Remote: requests go only to the server URL you enter — your own LM Studio/Ollama or an API you provide a key for.
Storage: chats live in IndexedDB on this device; settings in localStorage; downloaded models in the browser cache.
Offline: once a model is cached and the page has loaded, on-device chat works without a connection. Full offline launch depends on your browser/hosting.
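
For Remote mode, a request to a local server might be assembled as in the sketch below. It assumes the server exposes an OpenAI-compatible `/chat/completions` route, which LM Studio and Ollama both offer under a `/v1` base path. `buildChatRequest`, the interface names, and the model name in the usage note are hypothetical; whether VIPS builds requests this way is an assumption. Providers with their own request formats (Claude and Gemini, for example) would need a different body.

```typescript
interface InferenceSettings {
  temperature: number;
  topP: number;
  maxTokens: number;
  frequencyPenalty: number;
  presencePenalty: number;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: maps the Inference settings onto an
// OpenAI-compatible chat completion request.
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  settings: InferenceSettings,
  apiKey?: string
): PreparedRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Local servers usually need no key; send one only if the user set it.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    // Strip trailing slashes so the path joins cleanly.
    url: baseUrl.replace(/\/+$/, "") + "/chat/completions",
    headers,
    body: JSON.stringify({
      model,
      messages,
      temperature: settings.temperature,
      top_p: settings.topP,
      max_tokens: settings.maxTokens,
      frequency_penalty: settings.frequencyPenalty,
      presence_penalty: settings.presencePenalty,
    }),
  };
}
```

A caller could pass `http://localhost:1234/v1` (the usual LM Studio default) or `http://localhost:11434/v1` (the usual Ollama default) as `baseUrl`, along with a placeholder model name such as `"example-model"`, then send the result with `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. The request goes only to the URL supplied.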
VIPS · single-file PWA · no telemetry
Feedback → v.vipinthomas@gmail.com


VIPS
On-device AI · Agent studio

Your AI. Your device.
Your rules.

Run language models right on your phone with WebGPU — or bring Claude, OpenAI, Gemini or Copilot. Build agents, wire up tools and MCP, and orchestrate them.

Free · No account · No tracking · No ads
On-device · WebGPU · Claude · OpenAI · Gemini · GitHub/Copilot · LM Studio · Ollama

What's inside

A full AI workspace in one app

No backend, nothing to install. Open it, add it to your home screen, and you have a private AI studio.

Runs on your device

Open models run locally via WebGPU. In on-device mode, nothing you type leaves your phone.

Bring any provider

Add Claude, OpenAI, Gemini, GitHub/Copilot, LM Studio or Ollama. Switch per chat — or per agent.

Build & orchestrate agents

Give agents tools and skills, let them delegate to sub-agents, and watch the run stream live.

Tools, APIs & MCP

Connect real HTTP tools or MCP servers to give agents live capabilities. Start from templates.

Talk to it

Hands-free voice mode listens, answers and reads replies back aloud.

Private & yours

Keys stay in your browser. No telemetry, no ads. Export your whole setup and move it anywhere.

Get going

Up and running in minutes

1. Pick a model: Download an on-device model, or add a provider key in the Models tab.
2. Chat or talk: Type or use voice. Switch models anytime from the header.
3. Build an agent: Start from a template in the Agents tab, give it tools, and run it.
4. Install it: Add to your home screen for full-screen, offline-capable use.

Built to be trusted

Honest by design

No dark patterns, no overpromises. Here's what to expect, both the strengths and the limits.

On-device means private: Local mode keeps prompts on your device. Cloud providers get only what you send, with a key only you hold.
No telemetry or ads: VIPS doesn't phone home or track you. Your data lives in your browser's storage.
WebGPU for local models: On-device inference needs a WebGPU browser; start small on phones. No WebGPU? Use a provider instead.
Browser rules apply: Cloud and MCP calls work when the service allows browser (CORS) access; some need a proxy.
Single-file PWA · self-hostable on any static host