On-device AI in your pocket. Models run locally in your browser with WebGPU — private and offline-capable. Or point me at your own LM Studio / Ollama endpoint.
Device
Checks whether this browser supports WebGPU, which on-device inference requires.
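A check like this can be done through the standard WebGPU entry point `navigator.gpu.requestAdapter()`, which resolves to `null` when no usable GPU is available. The following is a minimal sketch under that assumption, not necessarily how this app runs its check. The names `probeWebGPU`, `NavigatorLike` and `ProbeResult` are hypothetical. The navigator is passed in as a parameter so the logic can be tested with a stub.

```typescript
// Hypothetical WebGPU capability probe (illustrative sketch).
// Only navigator.gpu and requestAdapter() come from the WebGPU standard;
// every other name here is made up for this example.

interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

interface ProbeResult {
  supported: boolean;
  reason: string;
}

async function probeWebGPU(nav: NavigatorLike): Promise<ProbeResult> {
  const gpu = nav.gpu;
  // Browsers without WebGPU do not expose navigator.gpu at all.
  if (!gpu) {
    return { supported: false, reason: "navigator.gpu is missing" };
  }
  try {
    // requestAdapter() resolves to null when no suitable adapter exists.
    const adapter = await gpu.requestAdapter();
    if (adapter == null) {
      return { supported: false, reason: "no GPU adapter available" };
    }
    return { supported: true, reason: "adapter acquired" };
  } catch (err) {
    return { supported: false, reason: `requestAdapter failed: ${String(err)}` };
  }
}

// In a browser:
// probeWebGPU(navigator as unknown as NavigatorLike).then((r) => console.log(r));
```

Returning a reason string along with the boolean lets the UI explain *why* on-device mode is unavailable, instead of only hiding it.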
On-device models
These download once into the browser cache, then run locally with no network. Smaller models load faster and use less RAM — start small on a phone.
Cloud & servers
Bring your own model — OpenAI, Claude, Gemini, GitHub/Copilot — or a local LM Studio / Ollama server.
🔒 Keys are stored only on this device (browser storage) and sent straight to the provider you choose. Cloud calls may need the provider to allow browser (CORS) access.
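LM Studio and Ollama both expose an OpenAI-compatible chat route (by default under `http://localhost:1234/v1` and `http://localhost:11434/v1` respectively). That means a single request shape covers both, as well as OpenAI itself. The following is a minimal sketch that assumes this OpenAI-style format. `buildChatRequest` is a hypothetical helper. Claude's and Gemini's native APIs use different headers and bodies, so they are not covered here.

```typescript
// Hypothetical builder for an OpenAI-compatible chat completion request.
// Works for servers that implement POST {base}/chat/completions.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  apiKey?: string,
): ChatRequest {
  // Strip trailing slashes so ".../v1/" and ".../v1" give the same URL.
  const url = baseUrl.replace(/\/+$/, "") + "/chat/completions";
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Local servers typically need no key; hosted APIs expect a bearer token.
  if (apiKey) {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return {
    url,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}
```

In a browser, `fetch(url, init)` sends the request, and a non-streaming reply's text is in `choices[0].message.content`. The key is added only when one is given, and it goes only to the URL you supply.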
On-device: models run in your browser via WebGPU (WebLLM/MLC). In this mode nothing you type leaves your phone.
Remote: requests go only to the server URL you enter, either your own LM Studio/Ollama server or a provider API you supply a key for.
Storage: chats live in IndexedDB on this device; settings in localStorage; downloaded models in the browser cache.
Offline: once a model is cached and the page has loaded, on-device chat works without a connection. Whether the app itself can be opened from scratch while offline depends on your browser and how the page is hosted.
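Keeping settings in localStorage usually means serializing one object as JSON under a fixed key. The following is a rough sketch of that pattern, not this app's actual code. The `loadSettings`/`saveSettings` helpers, the `SETTINGS_KEY` value, the `Settings` fields and the `StorageLike` interface are all hypothetical. In a browser you would pass `window.localStorage`, which provides the same `getItem`/`setItem` methods. A small in-memory stand-in is included so the sketch also runs outside a browser.

```typescript
// Hypothetical settings persistence over a localStorage-like store.

interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface Settings {
  endpoint: string; // remote server URL (empty = on-device only)
  model: string;    // selected model id
}

const SETTINGS_KEY = "app-settings"; // hypothetical key name
const DEFAULTS: Settings = { endpoint: "", model: "" };

function saveSettings(store: StorageLike, s: Settings): void {
  store.setItem(SETTINGS_KEY, JSON.stringify(s));
}

function loadSettings(store: StorageLike): Settings {
  const raw = store.getItem(SETTINGS_KEY);
  if (raw === null) return { ...DEFAULTS };
  try {
    // Merge over defaults so fields added later still get a value.
    return { ...DEFAULTS, ...(JSON.parse(raw) as Partial<Settings>) };
  } catch {
    // Corrupt JSON falls back to defaults instead of breaking startup.
    return { ...DEFAULTS };
  }
}

// In-memory stand-in with the same two methods as localStorage.
class MemoryStorage implements StorageLike {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

Chats, which can grow large, fit IndexedDB better than this single-key JSON approach. localStorage is synchronous and has a small quota.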
Run language models right on your phone with WebGPU — or bring Claude, OpenAI, Gemini or Copilot. Build agents, wire up tools and MCP, and orchestrate them.