Who I Am

I'm a software engineer working in automation testing and infrastructure operations. Since late 2023, I've been experimenting with local large language models on my own GPU server — initially to save on API costs, but it quickly turned into an obsession.

My current stack centers on the Qwen family of models, using vLLM for inference, GPUStack for model management, and LobeChat for the frontend. In short, I've built my own private ChatGPT setup that I use daily as a coding assistant and document helper.

Why I Started This Blog

When I first got into local deployment, I hit a lot of roadblocks. Online tutorials were either outdated (the commands no longer ran), too shallow (no help once an error appeared), or AI-generated fluff.

Some of the pitfalls I encountered include:

  • vLLM failing to start due to a CUDA version mismatch — took two days to track down
  • Miscalculating VRAM and hitting OOM halfway through model loading, forcing repeated restarts (see the estimation sketch after this list)
  • Cloudflare Tunnel misconfigured, causing persistent 502 errors for external access
  • Setting the context length too high, making inference unbearably slow
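
Two of those pitfalls come down to arithmetic you can do before launching anything: the VRAM a model needs is roughly its weights plus the KV cache, and the KV cache grows linearly with context length. Here is a minimal back-of-the-envelope sketch; the model dimensions in the example are hypothetical placeholders, not measurements, and real usage will be somewhat higher once framework overhead and activations are added:

    # Rough serving-VRAM estimate: weights + KV cache.
    # The example dimensions below are hypothetical placeholders.

    def estimate_vram_gib(
        n_params_b: float,     # parameters, in billions
        n_layers: int,         # transformer layers
        n_kv_heads: int,       # KV heads (GQA models have fewer than query heads)
        head_dim: int,         # dimension per attention head
        ctx_len: int,          # context length you plan to serve
        batch: int = 1,        # concurrent sequences
        dtype_bytes: int = 2,  # 2 bytes per value for FP16/BF16
    ) -> float:
        weights = n_params_b * 1e9 * dtype_bytes
        # K and V tensors, per layer, per token, per sequence:
        kv_cache = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * ctx_len * batch
        return (weights + kv_cache) / 1024**3

    # A hypothetical 7B-class GQA model at a 32k context:
    print(f"{estimate_vram_gib(7, n_layers=28, n_kv_heads=4, head_dim=128, ctx_len=32_768):.1f} GiB")

The context-length pitfall is this formula read from the other direction: the cache grows linearly with tokens, and vLLM must be able to fit at least one maximum-length sequence at startup, so an over-generous --max-model-len costs memory headroom (and long prompts cost prefill time) for capacity you may never use. As for the CUDA mismatch, a quick first check is comparing the CUDA version PyTorch was built against (python -c "import torch; print(torch.version.cuda)") with the driver's supported version shown by nvidia-smi.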

These issues seem trivial in hindsight, but they cost me real time back then. I want to document these experiences to help others walking the same path.

What This Blog Covers

Three main categories:

  • Tutorials: Step-by-step guides with complete commands and configs that you can follow to get things running
  • Pitfalls: Real errors I encountered and how I debugged them, with a focus on the troubleshooting mindset
  • Comparisons: Side-by-side tool evaluations with clear recommendations

I try to avoid simply reposting official docs or writing generic "intro" articles. Every post should solve a specific problem or give you a clearer picture of which tool to choose.

Tech Stack

Tools and environment I'm currently using:

  • Hardware: Self-built GPU server (NVIDIA)
  • Inference: vLLM, GPUStack
  • Models: Qwen family (Qwen3, Qwen3-Coder, etc.)
  • Frontend: LobeChat
  • Networking: Cloudflare Tunnel
  • Containers: Docker

This combination covers most personal AI use cases at a fraction of the cost of cloud APIs, with your data staying on your own hardware.
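
One property of this stack worth calling out: vLLM serves an OpenAI-compatible API, so LobeChat talks to it the same way any script would, and swapping frontends or models doesn't change the plumbing. As a minimal sketch, with the port, API key, and model name as placeholders for whatever your own deployment uses:

    # Minimal client for a local OpenAI-compatible endpoint such as vLLM's.
    # base_url, api_key, and model are placeholders; substitute your own.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # vLLM serves on port 8000 by default
        api_key="not-needed-locally",         # the client requires a value; vLLM ignores it unless --api-key is set
    )

    resp = client.chat.completions.create(
        model="Qwen3-Coder",  # use the model name your server actually reports
        messages=[{"role": "user", "content": "Summarize this blog in one sentence."}],
    )
    print(resp.choices[0].message.content)

Pointing LobeChat at the same base URL is all the "frontend" step amounts to, and routing through Cloudflare Tunnel only changes the hostname.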

Get in Touch

If you run into deployment issues or want to discuss anything, reach out through the contact page. I may not reply instantly, but I do read every message.

If you find the articles helpful, the best support is sharing them with someone who needs them.