I wanted to use my local model from my phone. I did NOT want to wake up to a $5000 cloud bill because someone found my open port.
This is the guide I wish existed when I started. I will show you the dangerous way first — because understanding why it is dangerous is what makes the safe way stick. Then I will walk through the exact setup I use now: Cloudflare Tunnel, basic auth, rate limiting, and a bedtime checklist that lets me sleep through the night.
My setup: a home server running vLLM on Ubuntu 22.04, behind a standard residential router. The model I expose is Qwen3-7B-Instruct, running at about 45 tokens per second. I access it from my phone, my laptop at coffee shops, and occasionally from a tablet when I am traveling. The server has been online for four months. Nobody unauthorized has gotten in.
Why You Might Want to Do This
Local LLMs are great until you leave your house. Then you are stuck choosing between:
- Paying for cloud API tokens every time you want to ask a question
- Running everything through a VPN back to your home network, which drains your phone battery and adds 200ms of latency
- Not having access to your model at all
I tried the VPN route first. WireGuard is solid, but on iOS it kept disconnecting when the phone slept. Reconnecting took 3-5 seconds. I wanted something that just worked, like ChatGPT, but pointed at my own hardware.
The other reason is cost. I already own the GPU. Adding a cloud API subscription on top feels like paying rent on a house I own. The marginal cost of exposing my local model is basically zero. The marginal cost of a mistake could be hundreds of dollars in electricity or, worse, a compromised home network.
That tradeoff is what this article is about.
The Naive Approach: Port Forwarding
Here is what I almost did. It is what most tutorials suggest, and it is terrifying.
I logged into my router admin panel. I found the port forwarding section. I created a rule: external port 8000, internal port 8000, target IP 192.168.1.42 (my server). I checked my public IP on whatismyipaddress.com. I typed http://<my-public-ip>:8000/v1/chat/completions into my phone browser. It worked.
I felt clever for about 90 seconds. Then I started thinking about what I had actually done.
My home IP is scanned constantly. Shodan, Censys, and botnets crawl the IPv4 space 24/7 looking for open ports. Port 8000 is a common target because it is the default for development servers and — yes — vLLM's API server. The moment I opened that port, I put a neon sign on my network that says "LLM inference server here, no authentication required."
Here is what could have happened:
- A bot finds my open port within hours
- It starts hammering /v1/chat/completions with long prompts and high max_tokens
- My GPU runs at 100% load 24/7, generating nonsense for a stranger
- My electricity bill spikes. My GPU fans scream
- Worse: if my vLLM version has a known vulnerability, the attacker gets shell access
I did not want to find out if vLLM 0.11.2 has an RCE vulnerability. Port forwarding was off the table.
What I worried about: botnets, zero-days, crypto miners using my GPU, someone pivoting from my server to my NAS. What actually happened: nothing. Four months of silence, which is exactly what you want from a security setup.
The Right Way: Cloudflare Tunnel
Cloudflare Tunnel creates an outbound connection from your server to Cloudflare's edge network. No open ports. No public IP exposure. No router configuration. Your server initiates a secure tunnel to Cloudflare, and Cloudflare serves your application through a public hostname with HTTPS and DDoS protection.
The mental model is inverted: instead of the internet reaching into your network, your network reaches out to the internet. If the tunnel drops, your server is invisible.
Step 1: Install cloudflared
On your server, download and install the official Debian package:
curl -L --output cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared.deb
For ARM systems, replace amd64 with arm64 in the URL.
Step 2: Authenticate with Cloudflare
cloudflared tunnel login
This opens a browser window, or prints a URL to open on another machine if the server is headless. Log into your Cloudflare account and select the domain you want to use. I use a subdomain like llm.mydomain.com. The command downloads a certificate to ~/.cloudflared/cert.pem. Keep this file safe.
Step 3: Create and configure the tunnel
cloudflared tunnel create local-llm
This outputs a tunnel UUID. Copy it. Create ~/.cloudflared/config.yml:
tunnel: <YOUR-TUNNEL-UUID>
credentials-file: /home/<your-user>/.cloudflared/<YOUR-TUNNEL-UUID>.json
ingress:
  - hostname: llm.yourdomain.com
    service: http://localhost:8000
  - service: http_status:404
The last rule is a catch-all that returns 404 for any request whose hostname matches no earlier rule. cloudflared expects the final ingress rule to match all traffic, so keep it at the bottom.
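The rule ordering can be sanity-checked before starting the tunnel. The sketch below works on an already-parsed rule list for this single-hostname setup; `check_ingress` and the dict layout are illustrative assumptions, not part of cloudflared. The tool ships its own check, `cloudflared tunnel ingress validate`, which is the authoritative one.

```python
def check_ingress(rules):
    """Return a list of problems found in a parsed `ingress` list.

    `rules` mirrors the YAML above: a list of dicts with an optional
    "hostname" key and a required "service" key. (Hypothetical helper.)
    """
    if not rules:
        return ["ingress list is empty"]
    problems = []
    for i, rule in enumerate(rules):
        if "service" not in rule:
            problems.append(f"rule {i} has no service")
    # In this simple setup, every rule except the last is tied to a hostname.
    for i, rule in enumerate(rules[:-1]):
        if "hostname" not in rule:
            problems.append(f"rule {i} matches every hostname but is not last")
    if "hostname" in rules[-1]:
        problems.append("last rule must be a catch-all without a hostname")
    return problems


rules = [
    {"hostname": "llm.yourdomain.com", "service": "http://localhost:8000"},
    {"service": "http_status:404"},
]
print(check_ingress(rules))
```

An empty list means the ordering looks right. Real ingress rules can also match on paths, which this sketch ignores.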
Step 4: Route DNS
cloudflared tunnel route dns local-llm llm.yourdomain.com
This creates a CNAME record in your Cloudflare DNS automatically.
Step 5: Run the tunnel
cloudflared tunnel run local-llm
If everything worked, you can now visit https://llm.yourdomain.com and see your vLLM server responding. The connection is HTTPS by default. Your router still has zero open ports. Your public IP is never exposed.
Step 6: Systemd service (so it survives reboots)
Running the tunnel manually is fine for testing. For production, make it a service:
sudo cloudflared service install
sudo systemctl enable --now cloudflared
Check status with sudo systemctl status cloudflared. If it shows "active (running)," you are good.
Cloudflare Tunnel is free for personal use. The free plan includes unlimited tunnels, DDoS protection, and HTTPS. I have never paid Cloudflare a cent for this setup.
Adding Basic Authentication
Cloudflare Tunnel gives you a public URL with TLS. It does NOT give you authentication. Anyone who guesses your subdomain can still hit your API. You need to add a gate.
Cloudflare Access is the cleanest solution. It sits in front of your tunnel and enforces identity checks before traffic reaches your server. Here is the setup:
- In the Cloudflare dashboard, go to Access > Applications
- Click Add an application, choose Self-hosted
- Application name: "Local LLM"
- Session duration: 24 hours
- Assign a domain: llm.yourdomain.com
- Under Policies, create a policy named "Allow Me"
- Action: Allow
- Include rule: Emails — add your email address
- Save and deploy
Now when you visit https://llm.yourdomain.com, Cloudflare shows a login page before forwarding any traffic to your server. If someone finds your URL, they hit a wall. They never touch your vLLM instance.
Alternative if you would rather not use Cloudflare Access: put an nginx reverse proxy in front of vLLM and use HTTP basic auth:
server {
    listen 8080;
    location / {
        auth_basic "LLM API";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://localhost:8000;
    }
}
Create the password file with sudo htpasswd -c /etc/nginx/.htpasswd yourusername (on Ubuntu, htpasswd ships in the apache2-utils package), then point Cloudflare Tunnel at http://localhost:8080 instead of port 8000. This gives you two layers: Cloudflare's edge security and nginx-level basic auth. I ran this for two weeks before switching to Cloudflare Access and found it perfectly adequate.
Rate Limiting: Why You Need It
Authentication stops casual attackers. It does not stop someone who steals your credentials, or a bug in your client that goes into a request loop, or your own script that forgets to sleep between API calls.
I learned this the hard way. I wrote a Python script to batch-process notes and forgot to add a time.sleep(). It sent 400 requests in 90 seconds. My GPU thermal-throttled. It took me 20 minutes to realize I was DDoSing myself.
Rate limiting prevents this. In the Cloudflare dashboard, go to Security > WAF > Rate limiting rules and create a rule for your hostname. I configured:
- Maximum 10 requests per minute from a single IP
- Maximum 100 requests per hour from a single IP
- Burst allowance: 3 requests in 10 seconds
- Action: block for 10 minutes, then auto-release
These thresholds are conservative for personal use. I have never hit them during normal phone usage, but they caught my runaway script within 30 seconds. I also added a custom response body for blocked requests: "Rate limit exceeded. If this is your server, check your client code."
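The edge rule is the backstop. The client can also pace itself so it never reaches the limit. Here is a minimal sketch of the missing time.sleep() from my batch script, assuming a limit of 10 requests per minute, which works out to one request every 60 / 10 = 6 seconds. The `Throttle` class and its parameters are hypothetical names for illustration.

```python
import time


class Throttle:
    """Space calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested
        self._sleep = sleep
        self._last = None  # time at which the previous call was released

    def wait(self):
        """Block until the next call is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# 10 requests per minute means one request every 6 seconds.
throttle = Throttle(min_interval=60 / 10)
```

Calling throttle.wait() before each API request in a batch loop keeps the script under the per-minute threshold. It does nothing about the hourly limit, which a long batch can still hit.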
Monitoring and Alerting Setup
Security is not a one-time configuration. It is a process of noticing when things change. I set up three monitoring layers.
Layer 1: Cloudflare Analytics
The Cloudflare dashboard shows request volume, error rates, and top countries of origin. I check it once a week for traffic from countries I have never visited, sudden spikes in 4xx errors, or request volume above my normal pattern. In four months, I have seen exactly one anomaly: a burst of 12 requests from Germany. It was me, testing from a VPN node I forgot I had connected.
Layer 2: Server Resource Alerts
I installed netdata for lightweight monitoring:
wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh
sh /tmp/netdata-kickstart.sh --stable-channel
I configured alerts for GPU temperature above 82C, GPU utilization above 95% for more than 5 minutes, and sustained inbound traffic above 10 MB/s. Netdata sends these to a Discord webhook. I have received exactly two alerts — both were my own batch jobs running hot. Better a false positive than a missed breach.
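To make the thresholds concrete, here is a standalone sketch of the same comparison. This is not how Netdata evaluates alerts. It assumes a single GPU and one line of output from nvidia-smi --query-gpu=temperature.gpu,utilization.gpu --format=csv,noheader,nounits, and `gpu_alerts` is a hypothetical helper. The "more than 5 minutes" condition is not modeled; each call judges one sample.

```python
def gpu_alerts(csv_line, temp_limit_c=82, util_limit_pct=95):
    """Parse one 'temperature, utilization' line and return alert strings."""
    temp_text, util_text = (field.strip() for field in csv_line.split(","))
    temp_c, util_pct = int(temp_text), int(util_text)
    alerts = []
    if temp_c > temp_limit_c:
        alerts.append(f"GPU temperature {temp_c}C exceeds {temp_limit_c}C")
    if util_pct > util_limit_pct:
        alerts.append(f"GPU utilization {util_pct}% exceeds {util_limit_pct}%")
    return alerts


print(gpu_alerts("85, 97"))  # both limits exceeded
print(gpu_alerts("61, 12"))  # nothing to report
```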
Layer 3: Log Review
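Once a week, I run this command. If that log file does not exist on your system, journalctl -u cloudflared shows the service's logs instead: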
sudo grep -i "error\|unauthorized\|rate limit" /var/log/cloudflared.log | tail -20
It takes 10 seconds. In four months, the only errors have been my own mistyped API keys. The log file is small because there is nothing to log. That is the goal.
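The same filter is easy to fold into a script if you want it alongside other checks. A minimal sketch follows. `suspicious_lines` is a hypothetical helper, and the sample lines are made up for illustration, not real cloudflared output.

```python
import re

# Same keywords as the grep command, matched case-insensitively.
PATTERN = re.compile(r"error|unauthorized|rate limit", re.IGNORECASE)


def suspicious_lines(lines, limit=20):
    """Return the last `limit` lines that contain any of the keywords."""
    matches = [line.rstrip("\n") for line in lines if PATTERN.search(line)]
    return matches[-limit:]


# Made-up sample input; in practice, pass an open log file instead.
sample = [
    "tunnel connection registered\n",
    "ERROR something went wrong\n",
    "request rejected: Unauthorized\n",
]
print(suspicious_lines(sample))
```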
Testing From Outside Your Network
Before you rely on this setup, test it properly. Here is my checklist:
- Disconnect from WiFi. Use your phone's cellular connection. Visit https://llm.yourdomain.com. You should see the Cloudflare Access login page, not your vLLM server directly.
- Test the API endpoint. From your phone, run:
curl -X POST https://llm.yourdomain.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model": "Qwen3-7B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
You should get a JSON response. A 401 means your auth works but the API key is wrong. A 302 redirect means Cloudflare Access is intercepting API requests. Configure a service auth bypass for programmatic access.
- Verify no direct access. Try visiting http://your-public-ip:8000 from your phone. It should time out. If it responds, your port forwarding rule is still active. Go delete it.
- Test rate limiting. Run a loop of 20 requests as fast as possible. After the burst threshold, you should get 429 responses. Wait 10 minutes and confirm normal access resumes.
- Check from a different country. Use a VPN or ask a friend abroad to visit your URL. They should hit the auth wall just like you do.
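The API test and the status codes in this checklist can be sketched in Python with only the standard library. `build_chat_request` and `diagnose` are hypothetical helpers. The request is built but not sent, and the status meanings are the ones described in the checklist, not an exhaustive list.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) a POST to the chat completions endpoint."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def diagnose(status_code):
    """Map an HTTP status from the checklist to its likely cause."""
    meanings = {
        200: "OK: the request reached vLLM and returned a completion",
        302: "redirected: Cloudflare Access is intercepting the API request",
        401: "unauthorized: the request got through but the API key is wrong",
        429: "rate limited: the Cloudflare rate limiting rule fired",
    }
    return meanings.get(status_code, f"unexpected status {status_code}")


print(diagnose(302))
```

To send the request, pass it to urllib.request.urlopen. Error statuses such as 401 and 429 are raised as urllib.error.HTTPError, and the code to diagnose is on the exception's code attribute.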
I did all of these tests on day one. The port forwarding test failed at first — I had forgotten my ISP's modem also had a forwarding rule. Finding and removing it is why my public IP scan now returns nothing.
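The direct-access test can also be run from a laptop tethered to your phone. A minimal sketch with Python's socket module follows. `port_is_open` is a hypothetical helper, and the address in the comment is a documentation placeholder, not a real server.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example (run from OUTSIDE your home network):
# port_is_open("203.0.113.7", 8000)  # should be False once forwarding is removed
```

Run it from outside your home network. A check made from inside the LAN can give a misleading answer, depending on how the router handles connections to its own public address.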
What I Check Before Going to Bed
Paranoia sounds exhausting, but it becomes routine. My nightly checklist takes 45 seconds:
- Glance at Netdata. Is GPU utilization at 0%? Good. If it is above 10%, something is running that I forgot about.
- Check Cloudflare Analytics for the last 4 hours. Any traffic while I was asleep? There should not be. I am the only user.
- Verify the tunnel is up. sudo systemctl is-active cloudflared should print "active."
- Confirm no new login emails. Cloudflare Access sends me an email for every successful authentication. If I see one I do not recognize, I know something is wrong.
In four months, I have never found a problem during this check. The value is not in catching breaches. The value is in knowing, with confidence, that there is nothing to catch.
What I worried about: forgetting to check one night and missing an intrusion. What actually happened: the checks became automatic. It is like locking your front door. You do not debate it every night. You just do it.
The Bottom Line
Exposing a local LLM to the internet is a risk calculation. Port forwarding is fast, free, and dangerous. Cloudflare Tunnel is slightly more work, still free, and dramatically safer. The gap in security is not incremental. It is categorical. An open port on your home IP gets found by routine scanning. A tunnel leaves no port on your home IP to find, although the hostname itself can still be discovered.
Add authentication so guessing your URL is not enough. Add rate limiting so your own mistakes do not hurt you. Add monitoring so you know when the world changes. Then sleep soundly.
My total cost: $0 for Cloudflare, $0 for software, about 90 minutes of initial configuration, and 45 seconds per night of maintenance. My benefit: unlimited access to my own model from anywhere, with no API bills and no 3 AM anxiety about open ports.
If you are currently port forwarding, close it today. Install Cloudflare Tunnel this weekend. The time you spend now could save you weeks of recovery later.