The docs say "just set the API URL." They don't mention the CORS nightmare. They don't mention that the model name you type in LobeChat has nothing to do with the model name your API serves. They don't mention that the Docker one-liner works on localhost and breaks on every other machine in your house.
I spent two hours on a Sunday afternoon getting LobeChat to talk to my vLLM instance. Two hours for what should have been a five-minute configuration. The problem was not LobeChat — the problem was that every guide I found skipped the parts that actually break. This is the guide I wish I had.
My setup: LobeChat running in Docker on a home server, vLLM serving Qwen3-7B-Instruct on the same machine at port 8000. The principles here apply to GPUStack, Ollama, or any other OpenAI-compatible backend.
What Is LobeChat and Why I Use It
LobeChat is an open-source chat UI that looks and feels like ChatGPT but connects to any LLM backend you want: OpenAI, Anthropic, local APIs, or all of them at once. It supports multi-model conversations, file uploads, and a plugin ecosystem that includes web search and image generation. The interface is polished. The mobile view works. The conversation history syncs across devices if you configure PostgreSQL.
I use it because I got tired of switching between five different tools. When I need to test a prompt against Qwen3-7B, I open LobeChat. When I need to compare that output against GPT-4o, I switch models in the same thread. When I want to share a conversation with a colleague, I export it as Markdown. It is the closest thing to a universal chat client for LLMs, and it is free.
The catch: connecting it to a local model is not the "paste the URL and go" experience the README implies. The UI is smooth. The backend configuration is where the friction lives.
Starting Your Local API
Before you touch LobeChat, make sure your inference server is running and responding. I will show vLLM and GPUStack examples. If you use Ollama, the URL format is different but the CORS and model-name problems are identical.
vLLM:
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen3-7B-Instruct \
--port 8000 \
--host 0.0.0.0 \
--max-model-len 8192 \
--gpu-memory-utilization 0.85
The --host 0.0.0.0 flag matters: it makes vLLM listen on every network interface. A server bound only to 127.0.0.1 accepts connections from the same machine and nothing else. If LobeChat is running in Docker, its requests come from the Docker bridge network, not localhost, so a loopback-only server rejects every one of them and you will waste 20 minutes checking firewall rules that are not the problem. Setting the flag explicitly removes any guesswork about what your vLLM version binds to by default.
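To check the binding without involving LobeChat at all, a plain TCP connection test is enough. Here is a minimal Python sketch using only the standard library; the function name can_connect is my own and not part of vLLM or LobeChat.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run can_connect("127.0.0.1", 8000) on the server and can_connect("192.168.1.50", 8000) from another machine. If the first is True and the second is False, the server is either bound to loopback only or blocked by a firewall.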
GPUStack:
If you followed my GPUStack guide, your API is already running at http://your-server-ip/v1-openai/chat/completions. The model name is whatever you set during deployment — usually the Hugging Face model ID.
Test your API with curl before opening LobeChat:
curl http://192.168.1.50:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-7B-Instruct",
"messages": [{"role": "user", "content": "Say hello"}]
}'
If this returns a valid JSON response, your API works. If it returns "connection refused," check the --host flag. If it hangs, check your firewall. Do not proceed to LobeChat until this curl succeeds.
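If you want the same check as a script, here is a rough Python equivalent of the curl command that also turns the two common failures into the hints above. It is a sketch, not an official client: the function names are mine, and it assumes the server returns the usual OpenAI-style response with a choices list.

```python
import json
import socket
import urllib.error
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST body and headers as the curl command."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_failure(exc: BaseException) -> str:
    """Translate a low-level error into the hints given in the text."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP {exc.code}: server reachable, but it rejected the request"
    reason = getattr(exc, "reason", exc)  # URLError wraps the underlying cause
    if isinstance(reason, ConnectionRefusedError):
        return "connection refused: check the --host flag and the port"
    if isinstance(reason, (socket.timeout, TimeoutError)):
        return "timed out: check the firewall"
    return f"unexpected error: {reason!r}"


def probe(base_url: str, model: str, timeout: float = 30.0) -> str:
    """Send one chat request; return the reply text or a diagnostic hint."""
    request = build_chat_request(base_url, model, "Say hello")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        return classify_failure(exc)
    return payload["choices"][0]["message"]["content"]
```

Calling probe("http://192.168.1.50:8000/v1", "Qwen/Qwen3-7B-Instruct") should print either the model's greeting or one of the hints.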
Installing LobeChat
The Docker one-liner from the official docs:
docker run -d -p 3210:3210 \
--name lobechat \
lobehub/lobe-chat:latest
This works for a quick test. It does not persist your settings, conversations, or API keys across restarts. For a real setup, use Docker Compose with a volume mount:
version: '3.8'
services:
  lobechat:
    image: lobehub/lobe-chat:latest
    ports:
      - "3210:3210"
    environment:
      - ACCESS_CODE=your-admin-password
    volumes:
      - ./lobe-data:/app/.config/lobe-chat
    restart: unless-stopped
The ACCESS_CODE environment variable sets a simple password for the UI. Without it, anyone on your network can access your chat interface and burn through your API credits or GPU time.
Start it with docker compose up -d and open http://your-server-ip:3210. You should see the LobeChat welcome screen with a "Get Started" button.
The Connection Settings
In the LobeChat interface, click your profile icon in the top-right corner and select "Settings." In the settings panel, look for the "Language Model" tab on the left sidebar. This is where you configure providers.
Click "OpenAI" in the provider list. Yes, OpenAI, even though you are connecting to a local server. LobeChat uses the OpenAI client SDK internally, so any OpenAI-compatible local API can go through this provider slot.
Fill in these exact fields:
- API Key: sk-anything. vLLM and GPUStack ignore this field, but LobeChat requires a non-empty string. I use sk-local so I remember it is not a real key.
- API Proxy URL: http://192.168.1.50:8000/v1. This is the part that breaks. Not http://localhost:8000/v1. Not http://127.0.0.1:8000/v1. You need the actual IP address of the machine running vLLM, because LobeChat is inside a Docker container and localhost inside Docker points to the container, not the host.
- Model List: leave empty unless you want to restrict which models appear in the dropdown.
- Use Client-Side Fetching: leave off. This option sends requests from your browser instead of the LobeChat server. It sounds useful for local networks, but it triggers CORS errors unless your inference server is configured to allow browser-origin requests.
Click "Check" to test the connection. If you see a green checkmark, you are one of the lucky ones. If you see "Connection failed," read the next section.
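The rules for the API Proxy URL field are easy to encode. The following Python sketch flags the mistakes described above before you paste a URL into LobeChat. The function name check_proxy_url and the warning texts are my own, and the checks cover only the cases in this guide: loopback hosts and a missing API path.

```python
import ipaddress
from urllib.parse import urlparse


def check_proxy_url(url: str) -> list[str]:
    """Return warnings for an API Proxy URL; an empty list means no obvious problem."""
    warnings = []
    parsed = urlparse(url)
    host = parsed.hostname or ""

    if parsed.scheme not in ("http", "https"):
        warnings.append("URL should start with http:// or https://")

    # localhost and loopback IPs resolve to the container itself inside Docker.
    is_loopback = host == "localhost"
    try:
        is_loopback = is_loopback or ipaddress.ip_address(host).is_loopback
    except ValueError:
        pass  # host is a name such as host.docker.internal, not an IP literal
    if is_loopback:
        warnings.append("loopback address: inside Docker this points at the container")

    if parsed.path in ("", "/"):
        warnings.append("missing API path such as /v1")

    return warnings
```

For example, check_proxy_url("http://localhost:8000/v1") returns the loopback warning, while check_proxy_url("http://192.168.1.50:8000/v1") returns an empty list.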
CORS Configuration: The Part That Breaks Everything
Here is what happened to me. I filled in the API URL correctly. I clicked "Check." LobeChat said "Connection failed." I checked the vLLM logs — nothing. No request arrived. I checked the browser console and saw this:
Access to fetch at 'http://192.168.1.50:8000/v1/models'
from origin 'http://192.168.1.50:3210' has been blocked by CORS policy:
No 'Access-Control-Allow-Origin' header is present on the requested resource.
CORS. Cross-Origin Resource Sharing. The browser security mechanism that prevents a website on one domain from making requests to another domain unless the second domain explicitly allows it. LobeChat's "Check" button makes a fetch request from the browser to your API. The API says "I don't know you" and the browser blocks it.
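You can reproduce the browser's decision from a script, which takes extensions and caching out of the picture. The sketch below is a simplified version of the check: it looks only at the Access-Control-Allow-Origin header and ignores preflight requests and credentials, which real browsers also consider. Both function names are mine.

```python
import urllib.request


def origin_allowed(response_headers: dict, origin: str) -> bool:
    """Simplified browser check: is this origin allowed to read the response?"""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    allowed = lowered.get("access-control-allow-origin")
    return allowed == "*" or allowed == origin


def fetch_cors_headers(url: str, origin: str, timeout: float = 10.0) -> dict:
    """GET the URL with an Origin header, as a browser would, and return the response headers."""
    request = urllib.request.Request(url, headers={"Origin": origin})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())
```

With the addresses from the error message, origin_allowed(fetch_cors_headers("http://192.168.1.50:8000/v1/models", "http://192.168.1.50:3210"), "http://192.168.1.50:3210") tells you whether the browser would accept the response.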
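Whether CORS is enabled out of the box depends on the server and its version, so check the response headers before assuming anything. If Access-Control-Allow-Origin is missing, you have to add it.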
For vLLM:
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen3-7B-Instruct \
--port 8000 \
--host 0.0.0.0 \
--max-model-len 8192 \
--gpu-memory-utilization 0.85 \
--allowed-origins '["*"]'
The --allowed-origins flag takes a JSON list, so the wildcard is written as '["*"]'. It tells vLLM to accept requests from any origin. For a home network behind a router, this is fine. If your vLLM instance is exposed to the internet, restrict this to your actual domain.
For GPUStack:
Edit /etc/gpustack/config.yaml and add:
cors:
  enabled: true
  origins:
    - "*"
Then restart GPUStack: sudo systemctl restart gpustack.
After enabling CORS, hard-refresh the LobeChat page (Ctrl+Shift+R or Cmd+Shift+R) and click "Check" again. The green checkmark should appear.
One more gotcha: some browser extensions block CORS preflight requests. If you have privacy extensions like uBlock Origin or Privacy Badger, try the connection check in an incognito window with extensions disabled. I lost 15 minutes to an extension I forgot I had installed.
Model Name Mapping: Why "gpt-3.5-turbo" Works
After the connection check passed, I selected my model from the dropdown and sent my first message. LobeChat returned an error: "Model not found."
I checked the vLLM logs. The request arrived. The model name in the request body was gpt-3.5-turbo. My vLLM instance was serving Qwen/Qwen3-7B-Instruct. Those names do not match, so vLLM returned a 404.
LobeChat defaults to OpenAI model names in the dropdown. When you select "gpt-3.5-turbo," it sends that exact string as the model parameter. Your local API has no idea what "gpt-3.5-turbo" means. You have to map it.
In the OpenAI provider settings, look for the "Custom Model Name" or "Model ID Override" field. The exact label changes between LobeChat versions, but it is usually near the API URL field. Enter your actual model name there:
Qwen/Qwen3-7B-Instruct
Alternatively, you can add a custom provider instead of using the OpenAI slot. Go to Settings > Language Model > "Add Custom Provider." Set the provider name to something like "Local vLLM," paste the same API URL, and set the model name explicitly. This keeps your local models separate from any real OpenAI configuration you might have.
If you run multiple local models — say, Qwen3-7B and Llama-3.1-8B — you need to add each one as a separate provider entry or use the model list field to specify multiple IDs separated by commas:
Qwen/Qwen3-7B-Instruct,meta-llama/Llama-3.1-8B-Instruct
LobeChat will query /v1/models to populate the dropdown. If your API supports the models endpoint, this works automatically. vLLM does. GPUStack does. If your backend does not expose it (older Ollama releases did not), you have to enter the names manually.
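To see exactly which names your server advertises, you can read /v1/models yourself and build the comma-separated string for the model list field. This is a minimal sketch that assumes the standard OpenAI-style response shape, a data array of objects with an id key. The function names are mine.

```python
import json
import urllib.request


def served_model_ids(models_payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in models_payload.get("data", [])]


def model_list_field(model_ids: list[str]) -> str:
    """Join IDs with commas, the format used for the model list field."""
    return ",".join(model_ids)


def fetch_models(base_url: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/models and parse the JSON body."""
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

model_list_field(served_model_ids(fetch_models("http://192.168.1.50:8000/v1"))) gives you a string you can paste into the field without retyping any name.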
Testing the Connection
Once the settings are saved, start a new conversation. In the model selector at the top of the chat window, choose your local model. Type a simple test message: "What is 17 times 23?"
If everything is configured correctly, you should see a response within a few seconds. The first request is often slower than the rest while the server finishes its one-time warm-up work. Subsequent requests will be faster.
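If you want numbers instead of impressions, time a few identical requests and compare the first against the rest. The helper below is a generic sketch: time_requests is my own name, and send_request stands for whatever zero-argument callable sends one request to your API.

```python
import time
from typing import Callable


def time_requests(send_request: Callable[[], object], count: int = 3) -> list[float]:
    """Call send_request `count` times and return each call's duration in seconds."""
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        send_request()
        durations.append(time.perf_counter() - start)
    return durations
```

A first entry that is clearly larger than the others is the warm-up cost. If all entries are similar, the server was already warm.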
Check these indicators:
- The message bubble appears with a thinking indicator, then fills with text.
- The token count shows in the bottom-right corner of the message.
- The model name in the top selector matches what you configured.
- No red error banners appear at the top of the screen.
If you see "Failed to fetch" or a timeout, check the browser console for the exact error. Common causes: wrong IP address, CORS still disabled, firewall blocking port 8000, or vLLM crashed.
Customizing the UI
LobeChat's default theme is clean but generic. I made four changes that improved my daily workflow.
1. Set a custom provider icon and name. In the provider settings, you can upload an icon and rename "OpenAI" to "Local Server." This prevents the cognitive dissonance of seeing the OpenAI logo next to your homemade Qwen deployment.
2. Enable the system prompt editor. Go to Settings > General > "Enable System Prompt." This adds a text area above the chat where you can set the system message per conversation. It is essential for testing how different system prompts affect model behavior.
3. Configure the default model. In Settings > General, set your local model as the default. Otherwise, every new conversation starts with gpt-3.5-turbo selected, and your first message fails until you remember to switch.
4. Set up conversation archiving. Without a database backend, LobeChat stores conversations in the browser's own storage (IndexedDB). If you clear site data or switch devices, your history disappears. For a persistent setup, configure the PostgreSQL environment variables in your Docker Compose file:
environment:
- DATABASE_URL=postgresql://user:pass@postgres:5432/lobechat
- NEXT_AUTH_SECRET=your-random-secret
This requires running a PostgreSQL container alongside LobeChat. It is worth the effort if you use LobeChat daily.
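The your-random-secret placeholder should be replaced with a value that is actually random. A short Python sketch using the standard secrets module is enough. The function name generate_secret is mine, and the 32-byte default is an assumption, not a LobeChat requirement.

```python
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string suitable for a secret environment variable."""
    return secrets.token_urlsafe(num_bytes)
```

The output uses only letters, digits, hyphens, and underscores, so it can also serve as the database password inside DATABASE_URL without any escaping.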
Troubleshooting: Three Common Failures
These are the failures I see most often in forums and Discord, including the ones that cost me the most time.
Failure 1: "Connection failed" despite correct URL
Symptom: The connection check in LobeChat fails immediately. Curl from the host machine works fine.
Cause: You used localhost or 127.0.0.1 in the API URL. Inside the LobeChat Docker container, localhost refers to the container itself, not the Docker host.
Fix: Use the host machine's actual IP address on your local network. On Linux, ip addr show will list it. It usually looks like 192.168.1.xxx. If you are running LobeChat and vLLM on the same machine, you can also use the Docker host gateway address: http://host.docker.internal:8000/v1 on Docker Desktop (on Linux Docker Engine, start the container with --add-host=host.docker.internal:host-gateway to get the same name), or create a custom Docker network and use container names.
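If you would rather not read through ip addr output, the following Python sketch asks the operating system which local address it would use for outbound traffic. It is a best-effort guess, not a guarantee: on machines with several interfaces it returns the one on the default route. The function name lan_ip is mine.

```python
import socket


def lan_ip(fallback: str = "127.0.0.1") -> str:
    """Best-effort guess at this machine's LAN address."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only asks the OS which
        # local address it would route from. 192.0.2.1 is a documentation-only
        # address (RFC 5737) used here as an arbitrary destination.
        probe.connect(("192.0.2.1", 80))
        return probe.getsockname()[0]
    except OSError:
        return fallback  # no usable route, for example when the machine is offline
    finally:
        probe.close()
```

Run it on the machine that hosts vLLM and use the result in the API Proxy URL, for example http://<result>:8000/v1.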
Failure 2: CORS errors in the browser console
Symptom: The connection check fails with CORS-related errors in the browser's developer console. The network tab shows the request as "blocked."
Cause: Your inference server does not send Access-Control-Allow-Origin headers for the LobeChat origin. Whether that is the default depends on the server and its version.
Fix: Add the CORS flag to your inference server startup. For vLLM, use --allowed-origins '["*"]' (the flag takes a JSON list). For GPUStack, enable CORS in config.yaml. Hard-refresh LobeChat after making the change. Test in an incognito window to rule out extension interference.
Failure 3: "Model not found" or 404 on every message
Symptom: The connection check passes. Starting a conversation returns an error saying the model does not exist.
Cause: LobeChat is sending an OpenAI model name like gpt-3.5-turbo or gpt-4, but your local API serves models with Hugging Face IDs like Qwen/Qwen3-7B-Instruct.
Fix: Override the model name in the provider settings. Either set a custom model ID in the OpenAI provider slot, or add a separate custom provider with the correct name. If you use the model list field, make sure the names exactly match what your API returns from /v1/models. Capitalization matters. The slash in Qwen/Qwen3-7B-Instruct matters.
One edge case: if you run vLLM with --served-model-name, you can make it respond to gpt-3.5-turbo as an alias:
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen3-7B-Instruct \
--served-model-name gpt-3.5-turbo
This is useful if you have tools hardcoded to use OpenAI model names. LobeChat will then accept gpt-3.5-turbo and vLLM will map it to your actual model.
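For the name-matching fix in Failure 3, a small script can tell you whether a name is served exactly, differs only in capitalization, or is simply absent. This is a sketch with names of my own choosing. The served list is whatever your API returns from /v1/models.

```python
import difflib


def diagnose_model_name(requested: str, served: list[str]) -> str:
    """Explain why a requested model name would or would not be accepted."""
    if requested in served:
        return f"ok: '{requested}' is served"
    # A case-only difference is the easiest mistake to miss.
    for name in served:
        if name.lower() == requested.lower():
            return f"case mismatch: use '{name}' instead of '{requested}'"
    close = difflib.get_close_matches(requested, served, n=1)
    if close:
        return f"not served: did you mean '{close[0]}'?"
    return f"not served: available models are {served}"
```

diagnose_model_name("gpt-3.5-turbo", ["Qwen/Qwen3-7B-Instruct"]) reports that the name is not served and lists what is, which is the situation behind the "Model not found" error.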
Parting Notes
LobeChat is the best open-source chat interface I have found for local LLMs. It is polished, actively maintained, and supports enough features that I no longer miss the commercial alternatives. But the connection setup is under-documented in exactly the places where users get stuck.
The three rules that would have saved me two hours: use the real IP address, not localhost; enable CORS on your inference server; and override the model name so it matches what your API serves. Every other problem I hit was a consequence of one of those three.
My current setup runs LobeChat in Docker on port 3210, vLLM on port 8000 with CORS enabled, and GPUStack on port 80 as a backup. I can switch between Qwen3-7B, Llama-3.1-8B, and GPT-4o without leaving the same tab. The local models cost me nothing per token. The UI costs me nothing at all. The only price was the Sunday afternoon I spent learning how the pieces fit together.
Start with the curl test. Get that working first. Then add LobeChat, one setting at a time. When something breaks, check the browser console before you check the server logs — half the errors are CORS, and the console tells you immediately.