Run Your Own AI: A Practical Self-Hosting Starter Guide

PewDiePie's Odysseus, Nous Research's Hermes Agent, OpenClaw, and the rest of the local stack explained. What to run, what hardware you need, and why owning your model is the AI version of holding your own keys.

Something shifted in the last few months. Self-hosted AI went from a hobbyist niche to a movement with real momentum: a YouTuber with 110 million subscribers shipped a full self-hosted AI workspace, an open-source agent from a crypto-native lab claimed the top spot on a major global usage ranking, and the tooling got good enough that you no longer need to be a systems engineer to participate.

If you have been waiting for the moment when running your own AI became practical, this is it. Here is the landscape and how to get started.

Why Bother? The Self-Custody Argument

The reasons to self-host AI map almost one-to-one onto the reasons to self-custody Bitcoin.

Privacy: a local model keeps your prompts, documents, and email on your own machine instead of shipping them to someone else's server. No telemetry, no training on your data, no subpoena surface. Control: nobody can deprecate your model, change its personality, paywall a feature you depend on, or ban your account. Cost: open models are free to download, and once you own the hardware, the marginal cost of a query is electricity. Resilience: your assistant works when the API is down, rate-limited, or geofenced.

Not your weights, not your AI. The cloud frontier models are still more capable at the high end, the same way a bank vault holds more than a home safe. But the gap narrows every quarter, and for a huge share of everyday work, local is already enough.

The Workspaces: Odysseus

The loudest arrival is Odysseus, the free, open-source AI workspace Felix Kjellberg (PewDiePie) released on May 31 after a year of publicly documented building. Think of it as the front-end for your local AI life: multi-model chat, autonomous agents with tool and MCP support, deep research, an email assistant, side-by-side model comparison, and persistent memory, all in one interface with no telemetry.

Its killer feature for beginners is the built-in Cookbook, which detects your hardware and gives one-click serving recommendations across what the project catalogues as 270+ open models. It speaks vLLM, llama.cpp, Ollama, and OpenRouter, so it sits cleanly on top of whatever engine you choose. Grab it from the GitHub repo, point it at your hardware, and it will tell you what you can realistically run.

The Agents: Hermes Agent and OpenClaw

A workspace answers when you talk to it. An agent keeps working when you walk away. Two projects dominate this category in 2026.

Hermes Agent is the current leader. Built by Nous Research, it is a self-improving agent with a built-in learning loop: it creates skills from experience, refines them with use, and builds a persistent model of you across sessions. It is deliberately infrastructure-agnostic, happy on a $5 VPS, a GPU cluster, or serverless compute, and you can message it from Telegram while it works on a machine elsewhere. As of May 10 it had overtaken OpenClaw for the #1 spot on OpenRouter's global agent rankings, and reviewers consistently highlight its seven-layer security model, designed up front to counter the attack classes that bit the agent ecosystem early on.

OpenClaw is the project that created this category: the original viral self-hosted assistant, with a huge community and plugin ecosystem. It remains excellent, though its early phase was marked by security incidents that the ecosystem learned hard lessons from, and several reviewers now describe Hermes as the one that delivers on OpenClaw's promise. If you are already invested in OpenClaw, switching costs are low: Hermes ships a migration wizard that detects your OpenClaw install and imports settings, memories, and skills automatically. We profiled the broader agent economy, including agents that hold their own wallets, in "when AI meets crypto."

The Engines and the Hardware

Under every workspace and agent sits an inference engine. You only need one to start. Ollama is the easiest on-ramp: one install, one command, model running. LM Studio offers a friendly GUI with a built-in model browser. llama.cpp is the lean, close-to-the-metal option everything else builds on. vLLM is what you graduate to when you want to serve models fast for multiple users or agents.
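To make "model running" concrete, here is a minimal sketch of talking to a local engine from a script. It assumes Ollama is installed, listening on its default local port (11434), and that you have already pulled a model. The function names are illustrative, not part of any library, and the model name you pass in is whatever you have installed.

```python
import json
import urllib.request

# Ollama's default local address; change it if your install listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With streaming disabled, the reply is one JSON object with the text under "response".
    return body["response"]
```

Calling ask_local_model with the name of a model you have pulled and a short prompt should return the model's answer as a string. The request goes to localhost only, which is the whole point: nothing in this exchange leaves your machine.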

On hardware, the honest rules of thumb: a modern laptop with 16GB of RAM runs small models (7 to 8B parameters) well enough for chat, summarization, and basic agent work. A gaming PC with a 12 to 24GB GPU comfortably runs the mid-size open models that handle most daily tasks. Apple Silicon Macs punch above their weight thanks to unified memory. The large frontier-class open models want serious VRAM or multi-GPU rigs, and that is exactly the tier where Odysseus's hardware-aware Cookbook earns its keep, so let it match the model to your machine rather than guessing.
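The arithmetic behind these rules of thumb can be sketched in a few lines. This is a rough estimate, not a guarantee: it assumes memory is dominated by the model weights (parameters times bits per weight), and the 1.2 overhead factor for context and runtime buffers is an assumption for illustration, not a measured figure. Real usage varies with context length and engine.

```python
def estimate_model_memory_gb(params_billions: float, bits_per_weight: int,
                             overhead_factor: float = 1.2) -> float:
    """Rough memory footprint in GB: weight size times an assumed overhead factor."""
    # One billion parameters at 8 bits each is roughly 1 GB of weights.
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb * overhead_factor


def fits(params_billions: float, bits_per_weight: int, available_gb: float) -> bool:
    """True if the rough estimate fits in the given amount of RAM or VRAM."""
    return estimate_model_memory_gb(params_billions, bits_per_weight) <= available_gb


# An 8B model quantized to 4 bits: about 4 GB of weights, about 4.8 GB with overhead.
small_quantized = estimate_model_memory_gb(8, 4)

# The same model at 16 bits: about 16 GB of weights, about 19.2 GB with overhead.
small_full = estimate_model_memory_gb(8, 16)
```

Under these assumptions, a 4-bit 8B model fits easily on a 16GB laptop, while a 4-bit 70B model (roughly 42 GB by this estimate) does not fit on a single 24GB GPU. That is the gap between "gaming PC" and "serious VRAM" in practice.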

A note from one infrastructure operator to another: inference is power. The same economics that govern our mining racks govern your AI rig, and cheap electricity is the quiet subsidy behind every local token. It is the consumer-scale version of the story we told in "AI is burning through power, and Bitcoin miners already know how to handle it."
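For a sense of scale, the electricity cost of a single local query is simple to estimate. The sketch below converts power draw and generation time into kilowatt-hours and multiplies by your rate. The example figures (a 300 W draw, 20 seconds, $0.15 per kWh) are assumptions chosen for illustration, not measurements of any particular rig.

```python
def energy_cost_per_query(power_watts: float, seconds: float,
                          price_per_kwh: float) -> float:
    """Electricity cost of one query, in the same currency as price_per_kwh."""
    # Watts times seconds gives watt-seconds; 3,600,000 watt-seconds make 1 kWh.
    kwh = power_watts * seconds / 3_600_000
    return kwh * price_per_kwh


# Assumed example: 300 W for 20 seconds at $0.15 per kWh.
example_cost = energy_cost_per_query(300, 20, 0.15)
```

With those assumed inputs the result is a small fraction of a cent per query. The number that matters more over time is idle draw, since a machine left on around the clock burns power whether or not anyone is prompting it.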

A Sane Starting Path

Start small and climb. First, install Ollama or LM Studio and run a small open model to get a feel for local inference. Second, add Odysseus on top as your workspace, and let its Cookbook recommend a model that fits your hardware. Third, when you want an assistant that persists and acts on your behalf, deploy Hermes Agent (or OpenClaw, if its ecosystem appeals to you) on a cheap VPS or a spare machine.

And carry your security instincts over from crypto, because agents raise the stakes: scope permissions tightly, treat anything an agent reads from the open internet as untrusted input, keep API keys and wallets out of reach of experimental setups, and update frequently. The agent ecosystem's early security incidents were the smart-contract-exploit era of this movement. Behave accordingly.
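One way to picture "scope permissions tightly" is a default-deny gate in front of every tool an agent can call. The sketch below is hypothetical: the tool names, the ToolRequest type, and the decide function are invented for illustration and do not describe how Hermes Agent or OpenClaw implement permissions. It only shows the shape of the idea.

```python
from dataclasses import dataclass

# Hypothetical tool names; a real agent framework defines its own.
READ_ONLY_TOOLS = frozenset({"search_notes", "read_calendar"})
SENSITIVE_TOOLS = frozenset({"send_email", "sign_transaction"})


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    # True when text the agent read from the web or an inbox prompted this call.
    triggered_by_untrusted_content: bool


def decide(request: ToolRequest) -> str:
    """Return 'allow', 'ask_user', or 'deny' for a requested tool call."""
    if request.tool in READ_ONLY_TOOLS:
        return "allow"
    if request.tool in SENSITIVE_TOOLS:
        # Text fetched from the internet never triggers a sensitive action on its own.
        if request.triggered_by_untrusted_content:
            return "deny"
        return "ask_user"
    # Default-deny: anything not explicitly listed is refused.
    return "deny"
```

The design choice worth copying is the last line. An allowlist that refuses unknown tools fails safe when a new plugin appears, whereas a blocklist fails open. A stricter setup would also limit read-only tools when untrusted content is in play, since reads can leak data too.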

The Bottom Line

You do not have to pick a side between frontier cloud models and a local stack. Most people will sensibly use both. But there is something clarifying about a model that answers only to you, on hardware you own, with your data never leaving the room. The self-custody instinct built Bitcoin, and it is now coming for AI. The tooling is finally good enough that joining costs you a weekend, not a career.

Ultra Labs is a US Bitcoin mining and crypto infrastructure company powered by renewable energy and built on decentralized infrastructure. This article is for informational purposes only and is not financial, investment, legal, or tax advice. Ultra Labs publications are produced with the assistance of artificial intelligence and reviewed by humans, and may contain inaccuracies. Always do your own research before making investment decisions.