# Xrouter
Xrouter is an open-source inference router that sits between OpenClaw and your LLM providers. It uses a fast, hardware-aware classifier to route each request to the most cost-effective model that can handle the task.
This project is released under the MIT License.
## Core Features

- OpenAI-compatible reverse proxy at `POST /v1/chat/completions`.
- 3-tier classifier (0 = cheap, 1 = medium, 2 = frontier) with early stream cutoff.
- Hardware detection helper to recommend a local engine.
- Provider selection wizard to choose local and cloud endpoints.
- Cache layer with Redis or in-memory LRU fallback.
- Full cloud mode when local inference is not viable.
- Token tracking dashboard at `/dashboard`.
## Workflow

```mermaid
flowchart TD
    A["Client / OpenClaw request"] --> B["Router (OpenAI-compatible)"]
    B --> C{"Classifier enabled?"}
    C -->|No| F["Route to Frontier provider"]
    C -->|Yes| D["Classifier (0 / 1 / 2)"]
    D --> E{"Decision"}
    E -->|0| G["Route to Cheap provider"]
    E -->|1| M["Route to Medium provider"]
    E -->|2| F
    G --> H["Provider adapter (auto or explicit)"]
    M --> H
    F --> H
    H --> I["Upstream API call"]
    I --> J["Stream/Response back to client"]
```
## Repository Layout

- `src/server.js`: router and streaming proxy.
- `src/classifier.js`: classifier call and retry logic.
- `src/config.js`: configuration and env parsing.
- `src/cache.js`: Redis + LRU cache.
- `src/token_tracker.js`: token tracking.
- `scripts/check_hw.js`: hardware detection.
- `scripts/configure_providers.js`: interactive provider setup.
## Requirements

- Node.js 20+.
- Local classifier engine (optional).
- A frontier provider endpoint (required).
## Quickstart

1. Install dependencies.
2. (Optional) Start a local model server.
3. Run the configuration wizard.
4. Start the router.

```bash
npm install
npm run configure
npm run dev
```
## How To Use

1. Start your local model server (optional but recommended).
2. Run the wizard to configure providers and models.
3. Start the router.
4. Send OpenAI-compatible requests to the router.
5. Inspect routing decisions in response headers or the dashboard.
Example local setup (Ollama):
```bash
ollama pull llama3.1
ollama run llama3.1
```
Run the wizard:
```bash
npm run configure
```
Start the router:
```bash
npm run dev
```
Test a request:
```bash
curl -i http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"any","messages":[{"role":"user","content":"Fix this sentence: I has a apple."}]}'
```
Look for these headers:
- `X-Xrouter-decision`: `0`, `1`, or `2`.
- `X-Xrouter-upstream`: `cheap`, `medium`, or `frontier`.
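For example, a prompt routed to the cheap model should come back with headers along these lines (values illustrative):

```
HTTP/1.1 200 OK
X-Xrouter-decision: 0
X-Xrouter-upstream: cheap
```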
Open the dashboard:
http://localhost:3000/dashboard
Raw usage JSON:
http://localhost:3000/usage
## Provider Selection (Terminal Wizard)

Run:

```bash
npm run configure
```
The wizard:
- Scans hardware and recommends a local engine.
- Suggests a local classifier model.
- Lets you choose provider base URLs, API keys, and model overrides for cheap/medium/frontier routes.
- Writes `upstreams.json` and optionally updates `.env`.
### Quick Start Mode

- If your machine can run a local model, you can choose Quick Start.
- Quick Start auto-configures the local classifier.
- The cheap route always uses the same local model as the classifier to avoid Ollama model swapping.
- You only need to choose medium and frontier providers/models.
- On Apple Silicon (Ollama), the wizard lists installed Ollama models and can auto-download a recommended model.
## Routing Behavior

- The classifier is called for each uncached request.
- The first `0`, `1`, or `2` token returned decides the route.
- If classification fails, the router defaults to the frontier route.
- When the classifier is enabled, the cheap, medium, and frontier routes must be configured.
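To sanity-check a local classifier endpoint outside the router, you can hit it directly. The prompt below is only a stand-in for whatever `CLASSIFIER_SYSTEM_PROMPT` you configure, and the example assumes Ollama on its default port:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [
      {"role": "system", "content": "Reply with only one digit: 0 (easy), 1 (medium), or 2 (hard)."},
      {"role": "user", "content": "Fix this sentence: I has a apple."}
    ],
    "max_tokens": 1
  }'
```

If the reply is not a clean `0`, `1`, or `2`, adjust `CLASSIFIER_SYSTEM_PROMPT` or pick a different `CLASSIFIER_MODEL`; the router itself falls back to the frontier route whenever classification fails.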
## Compatibility

- The router accepts OpenAI-style requests and translates when needed.
- Provider type can be explicit (`xrouter`, `openai_compatible`, `openai`, `anthropic`, `gemini`, `cohere`, `azure_openai`, `mistral`, `groq`, `together`, `perplexity`) or `auto`.
- `auto` infers the provider adapter from the base URL or API key.
- Providers that expose OpenAI-compatible endpoints use the `openai_compatible` adapter.
- Anthropic/Gemini/Cohere streaming is translated into OpenAI-style SSE chunks.
- Non-OpenAI adapters currently support text-only messages and basic sampling params (`temperature`/`top_p`/`stop`).
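For example, to pin adapters instead of relying on `auto`, set the provider type per route; the base URLs and model names below are illustrative, not recommendations:

```env
MEDIUM_PROVIDER=openai
MEDIUM_BASE_URL=https://api.openai.com
MEDIUM_MODEL=gpt-4o-mini

FRONTIER_PROVIDER=anthropic
FRONTIER_BASE_URL=https://api.anthropic.com
FRONTIER_MODEL=claude-sonnet-4-20250514
```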
## Token Tracking Dashboard

- `GET /usage`: returns cumulative token usage for `cheap`, `medium`, and `frontier`.
- `GET /dashboard`: UI that displays the token split and totals.
- Local usage is counted inside `cheap` when the cheap route uses the local model.
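If you prefer the terminal over the dashboard, the same per-route totals are one request away (pretty-printed here with `jq`; the exact field layout is whatever `src/token_tracker.js` emits):

```bash
curl -s http://localhost:3000/usage | jq .
```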
## Environment Summary

- `HOST`: bind host, default `0.0.0.0`.
- `PORT`: bind port, default `3000`.
- `ROUTER_API_KEY`: require `Authorization: Bearer <key>`.
- `LOG_LEVEL`: log level (`debug`/`info`/`warn`/`error`).
- `LOG_TO_FILE`: set `true` to write logs to files.
- `LOG_DIR`: directory for log files (default `./logs`).
- `CLASSIFIER_ENABLED`: set `false` to disable local classification.
- `CLASSIFIER_BASE_URL`: OpenAI-compatible classifier endpoint.
- `CLASSIFIER_MODEL`: classifier model name.
- `CLASSIFIER_SYSTEM_PROMPT`: classifier prompt (single line).
- `CLASSIFIER_TIMEOUT_MS`: classifier timeout.
- `CLASSIFIER_FORCE_STREAM`: force streaming classifier request.
- `CLASSIFIER_WARMUP`: warm the classifier on server start.
- `CLASSIFIER_WARMUP_DELAY_MS`: delay before warmup request (ms).
- `CLASSIFIER_KEEP_ALIVE_MS`: keep-alive interval for classifier warmup (ms).
- `CLASSIFIER_LOADING_RETRY_MS`: delay between retries when the model is loading.
- `CLASSIFIER_LOADING_MAX_RETRIES`: max retries when the model is loading.
- `CHEAP_BASE_URL`: optional, defaults to the classifier base URL.
- `CHEAP_API_KEY`: cheap provider API key.
- `CHEAP_MODEL`: optional model override for the cheap route.
- `CHEAP_PROVIDER`: provider type for the cheap route (`auto` if empty).
- `CHEAP_HEADERS`: optional JSON headers for the cheap provider (stringified object).
- `CHEAP_DEPLOYMENT`: Azure deployment override for the cheap route.
- `CHEAP_API_VERSION`: Azure API version override for the cheap route.
- `MEDIUM_BASE_URL`: required when the classifier is enabled.
- `MEDIUM_API_KEY`: medium provider API key.
- `MEDIUM_MODEL`: optional model override for the medium route.
- `MEDIUM_PROVIDER`: provider type for the medium route (`auto` if empty).
- `MEDIUM_HEADERS`: optional JSON headers for the medium provider (stringified object).
- `MEDIUM_DEPLOYMENT`: Azure deployment override for the medium route.
- `MEDIUM_API_VERSION`: Azure API version override for the medium route.
- `FRONTIER_BASE_URL`: OpenAI-compatible frontier endpoint.
- `FRONTIER_API_KEY`: frontier API key.
- `FRONTIER_MODEL`: optional model override for the frontier route.
- `FRONTIER_PROVIDER`: provider type for the frontier route (`auto` if empty).
- `FRONTIER_HEADERS`: optional JSON headers for the frontier provider (stringified object).
- `FRONTIER_DEPLOYMENT`: Azure deployment override for the frontier route.
- `FRONTIER_API_VERSION`: Azure API version override for the frontier route.
- `REDIS_URL`: if set, enables the Redis cache.
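A minimal `.env` for a local classifier plus one cloud provider might look like this; the endpoint, keys, and model names are placeholders, not recommendations:

```env
# Local classifier via Ollama; the cheap route defaults to this base URL
CLASSIFIER_BASE_URL=http://localhost:11434
CLASSIFIER_MODEL=llama3.1

# Cloud routes (illustrative endpoint and models)
MEDIUM_BASE_URL=https://api.openai.com
MEDIUM_API_KEY=sk-your-key
MEDIUM_MODEL=gpt-4o-mini

FRONTIER_BASE_URL=https://api.openai.com
FRONTIER_API_KEY=sk-your-key
FRONTIER_MODEL=gpt-4o

# Optional hardening and caching
ROUTER_API_KEY=choose-a-long-random-string
REDIS_URL=redis://localhost:6379
```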
## Local Model Installation & Run Guides

### Ollama (best for Mac, easiest cross-platform)

- Install: Ollama Quickstart
- Pull a model: `ollama pull llama3.1`
- Run: `ollama run llama3.1`
- Base URL: `http://localhost:11434`
- Router config:
  - `CLASSIFIER_BASE_URL=http://localhost:11434`
  - `CLASSIFIER_MODEL=llama3.1`
### vLLM (NVIDIA GPU)

- OpenAI server: vLLM OpenAI Server
- Example: `vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123`
- Base URL: `http://localhost:8000`
- Router config:
  - `CLASSIFIER_BASE_URL=http://localhost:8000`
  - `CLASSIFIER_MODEL=NousResearch/Meta-Llama-3-8B-Instruct`
### TensorRT-LLM (NVIDIA, max speed)

- Repo: TensorRT-LLM
- Server: `trtllm-serve`
- Base URL: `http://<host>:<port>`
- Router config:
  - `CLASSIFIER_BASE_URL=http://<host>:<port>`
  - `CLASSIFIER_MODEL=<your model>`
### llama.cpp (CPU/AMD fallback)

- Repo: llama.cpp
- Example: `llama-server -m model.gguf --port 8080`
- Base URL: `http://localhost:8080`
- Router config:
  - `CLASSIFIER_BASE_URL=http://localhost:8080`
  - `CLASSIFIER_MODEL=<gguf model name>`
## Docker

Build and run the router with Redis:

```bash
docker compose -f deploy/docker-compose.yml up --build
```
## Hardware Detection

Run:

```bash
npm run check-hw
```
This prints the recommended engine:
- `tensorrt-llm` for large NVIDIA GPUs.
- `vllm` for standard NVIDIA GPUs.
- `mlx` for Apple Silicon.
- `llama.cpp` for CPU/AMD fallback.
## Model List Fetching
The wizard queries provider model list endpoints when possible.
- OpenAI-compatible: `/v1/models`
- Anthropic: `/v1/models`
- Gemini: `/v1beta/models`
- Cohere: `/v1/models`
- If listing fails, the wizard falls back to `scripts/cloud_model_catalog.json`.
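To check these endpoints by hand (outside the wizard), the upstream calls look roughly like this; auth header names follow each provider's public API:

```bash
# OpenAI-compatible and Cohere-style APIs (Bearer token)
curl -H "Authorization: Bearer $API_KEY" https://api.openai.com/v1/models

# Anthropic (x-api-key plus a version header)
curl -H "x-api-key: $ANTHROPIC_API_KEY" -H "anthropic-version: 2023-06-01" \
  https://api.anthropic.com/v1/models

# Gemini (key passed as a query parameter)
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY"
```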