How to Expose Your Local LLM to the Internet Without Getting Pwned

I wanted to use my local model from my phone. I did NOT want to wake up to a screaming GPU, a spiking electricity bill, or a compromised home network because someone found my open port.

This is the guide I wish existed when I started. I will show you the dangerous way first, because understanding why it is dangerous is what makes the safe way stick. Then I will walk through the exact setup I use now: Cloudflare Tunnel, authentication, rate limiting, monitoring, and a bedtime checklist that lets me sleep through the night.

My setup: a home server running vLLM on Ubuntu 22.04, behind a standard residential router. The model I expose is Qwen3-7B-Instruct, running at about 45 tokens per second. I access it from my phone, my laptop at coffee shops, and occasionally from a tablet when I am traveling. The server has been online for four months. Nobody unauthorized has gotten in.

Why You Might Want to Do This

Local LLMs are great until you leave your house. Then you are stuck choosing between:

Paying for cloud API tokens every time you want to ask a question
Running everything through a VPN back to your home network, which drains your phone battery and adds 200ms of latency
Not having access to your model at all

I tried the VPN route first. WireGuard is solid, but on iOS it kept disconnecting when the phone slept. Reconnecting took 3-5 seconds. I wanted something that just worked, like ChatGPT, but pointed at my own hardware.

The other reason is cost. I already own the GPU. Adding a cloud API subscription on top feels like paying rent on a house I own. The marginal cost of exposing my local model is basically zero. The marginal cost of a mistake could be hundreds of dollars in electricity or, worse, a compromised home network.

That tradeoff is what this article is about.

The Naive Approach: Port Forwarding

Here is what I did first. It is what many tutorials suggest, and it is terrifying.

I logged into my router admin panel. I found the port forwarding section. I created a rule: external port 8000, internal port 8000, target IP 192.168.1.42 (my server). I checked my public IP on whatismyipaddress.com. I typed http://<my-public-ip>:8000/v1/models into my phone browser, and my server answered with the model it was serving. It worked.

I felt clever for about 90 seconds. Then I started thinking about what I had actually done.

My home IP is scanned constantly. Shodan, Censys, and botnets crawl the IPv4 space 24/7 looking for open ports. Port 8000 is a common target because it is the default for development servers and — yes — vLLM's API server. The moment I opened that port, I put a neon sign on my network that says "LLM inference server here, no authentication required."

Here is what could have happened:

A bot finds my open port within hours
It starts hammering /v1/chat/completions with long prompts and high max_tokens
My GPU runs at 100% load 24/7, generating nonsense for a stranger
My electricity bill spikes. My GPU fans scream
Worse: if my vLLM version has a known vulnerability, an attacker could get shell access

I did not want to find out if vLLM 0.11.2 has an RCE vulnerability. Port forwarding was off the table.
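To put a rough number on the "electricity bill spikes" item in that list: a GPU pinned at full load draws close to its power limit around the clock. The sketch below assumes a sustained 350 W draw and a $0.30/kWh tariff. Both are illustrative assumptions, not measurements from my server, so plug in your own card's power limit and your utility's rate.

```python
def monthly_cost(watts: float, price_per_kwh: float, hours_per_day: float = 24.0, days: int = 30) -> float:
    """Electricity cost of a constant load over one billing period."""
    kwh = watts / 1000 * hours_per_day * days  # watts -> kilowatt-hours consumed
    return kwh * price_per_kwh


# Assumed, not measured: 350 W sustained draw, $0.30 per kWh.
print(f"${monthly_cost(350, 0.30):.2f} per month")  # $75.60 per month
```

Left unnoticed for a few months, that adds up to hundreds of dollars, and it scales with every extra GPU in the box.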

What I worried about: botnets, zero-days, crypto miners using my GPU, someone pivoting from my server to my NAS. What actually happened once I switched to the setup below: nothing. Four months of silence, which is exactly what you want from a security setup.

The Right Way: Cloudflare Tunnel

Cloudflare Tunnel creates an outbound connection from your server to Cloudflare's edge network. No open ports. No public IP exposure. No router configuration. Your server initiates a secure tunnel to Cloudflare, and Cloudflare serves your application through a public hostname with HTTPS and DDoS protection.

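The mental model is inverted: instead of the internet reaching into your network, your network reaches out to the internet. Nothing on your router accepts inbound connections. If the tunnel drops, the hostname returns a Cloudflare error page rather than exposing your server.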

Step 1: Install cloudflared

On your server, download and install the official Debian package:

```bash
curl -L --output cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared.deb
```

For ARM systems, replace amd64 with arm64 in the URL.

Step 2: Authenticate with Cloudflare

```bash
cloudflared tunnel login
```

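This prints a login URL and tries to open it in a browser. On a headless server, open the URL on another machine. Log into your Cloudflare account and select the domain you want to use. I use a subdomain of it, like llm.yourdomain.com. The command saves a certificate to ~/.cloudflared/cert.pem. Keep this file safe, because anyone who has it can create and manage tunnels for your domain.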

Step 3: Create and configure the tunnel

```bash
cloudflared tunnel create local-llm
```

This outputs a tunnel UUID and writes the tunnel's credentials to ~/.cloudflared/<YOUR-TUNNEL-UUID>.json. Copy the UUID, then create ~/.cloudflared/config.yml:

```yaml
tunnel: <YOUR-TUNNEL-UUID>
credentials-file: /home/<your-user>/.cloudflared/<YOUR-TUNNEL-UUID>.json

ingress:
  - hostname: llm.yourdomain.com
    service: http://localhost:8000
  - service: http_status:404
```

cloudflared requires the last ingress rule to be a catch-all with no hostname. This one returns 404 for any request that does not match llm.yourdomain.com, so nothing else on the server is reachable through the tunnel.
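Ingress rules are evaluated top to bottom, and the first rule that matches a request handles it. The toy model below mimics that behavior for the config above so you can reason about which requests reach vLLM. It is a simplified illustration, not cloudflared's code: it does exact hostname matching only, while real cloudflared also supports wildcard hostnames and path rules. `IngressRule` and `route` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IngressRule:
    hostname: Optional[str]  # None means "match everything" (the catch-all)
    service: str


# Mirrors the ingress section of config.yml above.
RULES = [
    IngressRule("llm.yourdomain.com", "http://localhost:8000"),
    IngressRule(None, "http_status:404"),
]


def route(host: str, rules: List[IngressRule]) -> str:
    """Return the service that handles a request for `host`; the first matching rule wins."""
    if not rules or rules[-1].hostname is not None:
        # cloudflared likewise rejects a config whose last rule is not a catch-all.
        raise ValueError("the last ingress rule must be a catch-all")
    for rule in rules:
        if rule.hostname is None or rule.hostname == host.lower():
            return rule.service
    raise AssertionError("unreachable: the catch-all always matches")


print(route("llm.yourdomain.com", RULES))    # http://localhost:8000
print(route("admin.yourdomain.com", RULES))  # http_status:404
```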

Step 4: Route DNS

```bash
cloudflared tunnel route dns local-llm llm.yourdomain.com
```

This creates a CNAME record in your Cloudflare DNS automatically.

Step 5: Run the tunnel

```bash
cloudflared tunnel run local-llm
```

If everything worked, you can now visit https://llm.yourdomain.com/v1/models and see your vLLM server list the model it is serving. The connection is HTTPS by default. Your router still has zero open ports. Your public IP never appears in DNS, because the record points at the tunnel rather than at your home.

Step 6: Systemd service (so it survives reboots)

Running the tunnel manually is fine for testing. For production, make it a service:

```bash
sudo cloudflared service install
sudo systemctl enable --now cloudflared
```

Check status with sudo systemctl status cloudflared. If it shows "active (running)," you are good.

Cloudflare Tunnel is free. The free plan covers the tunnel, DDoS protection, and HTTPS. The one prerequisite is a domain whose DNS is managed by Cloudflare. I have never paid Cloudflare a cent for this setup.

Adding Basic Authentication

Cloudflare Tunnel gives you a public URL with TLS. It does NOT give you authentication. Anyone who finds or guesses your subdomain can still hit your API. You need to add a gate.

Cloudflare Access is the cleanest solution. It sits in front of your tunnel and enforces identity checks before traffic reaches your server. Here is the setup:

In the Cloudflare Zero Trust dashboard, go to Access > Applications
Click Add an application, choose Self-hosted
Application name: "Local LLM"
Session duration: 24 hours
Assign a domain: llm.yourdomain.com
Under Policies, create a policy named "Allow Me"
Action: Allow
Include rule: Emails — add your email address
Save and deploy

Now when you visit https://llm.yourdomain.com, Cloudflare shows a login page before forwarding any traffic to your server. If someone finds your URL, they hit a wall. They never touch your vLLM instance.

Alternative for non-Cloudflare users: put an nginx reverse proxy in front of vLLM and use HTTP basic auth:

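```nginx
server {
    # Listen on loopback only; cloudflared connects locally, so the LAN cannot bypass auth.
    listen 127.0.0.1:8080;
    location / {
        auth_basic "LLM API";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        # Stream tokens to the client as vLLM generates them instead of buffering.
        proxy_buffering off;
    }
}
```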

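Create the password file with sudo htpasswd -c /etc/nginx/.htpasswd yourusername (on Ubuntu, htpasswd comes from the apache2-utils package). Then point Cloudflare Tunnel at http://localhost:8080 instead of port 8000. Also start vLLM with --host 127.0.0.1, so nothing on your LAN can reach port 8000 directly and skip the password. This gives you two layers: Cloudflare's edge security and nginx-level basic auth. I ran this for two weeks before switching to Cloudflare Access and found it perfectly adequate.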
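HTTP basic auth is simple enough to show in a few lines. The client sends an `Authorization: Basic` header whose value is the base64 encoding of `username:password` (RFC 7617). The sketch below builds that header and checks it the way a server would. The credentials are hypothetical. The plaintext comparison is a simplification, since nginx checks against the hashed entry in .htpasswd. Base64 is an encoding, not encryption, which is why basic auth is only safe over HTTPS, as it is here behind the tunnel.

```python
import base64
import hmac


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def check_basic_auth(header: str, username: str, password: str) -> bool:
    """Server-side check (illustrative: real servers compare against a stored hash)."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # binascii.Error is a ValueError
        return False
    user, sep, pw = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparisons avoid leaking how many characters matched.
    user_ok = hmac.compare_digest(user.encode(), username.encode())
    pass_ok = hmac.compare_digest(pw.encode(), password.encode())
    return user_ok and pass_ok


print(basic_auth_header("yourusername", "correct-horse"))  # hypothetical credentials
```

One practical catch: basic auth occupies the same Authorization header that OpenAI-compatible clients use for a Bearer API key, so a client cannot send both at once.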

Rate Limiting: Why You Need It

Authentication stops casual attackers. It does not stop someone who steals your credentials, or a bug in your client that goes into a request loop, or your own script that forgets to sleep between API calls.

I learned this the hard way. I wrote a Python script to batch-process notes and forgot to add a time.sleep(). It sent 400 requests in 90 seconds. My GPU thermal-throttled. It took me 20 minutes to realize I was DDoSing myself.
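On the client side, the fix is cheap: space requests out and back off when the server answers 429 Too Many Requests. Below is a minimal sketch of that kind of loop.

- `send` is a hypothetical callable standing in for whatever posts one item to the API and returns the HTTP status code.
- The 6-second spacing (10 requests per minute) is an assumed default.
- `sleep` and `clock` are injectable, so the timing logic can be tested without actually waiting.

A polite client is only half the fix, though, because the next runaway loop might not be yours.

```python
import time
from typing import Callable, Iterable, List


def run_batch(
    items: Iterable[str],
    send: Callable[[str], int],
    min_interval: float = 6.0,  # seconds between requests, i.e. 10 per minute
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[int]:
    """Send items one at a time, never faster than min_interval, backing off on HTTP 429."""
    statuses = []
    last_sent = None
    for item in items:
        for attempt in range(max_retries + 1):
            if last_sent is not None:
                wait = min_interval - (clock() - last_sent)
                if wait > 0:
                    sleep(wait)  # enforce the minimum spacing between requests
            last_sent = clock()
            status = send(item)
            if status != 429 or attempt == max_retries:
                break
            sleep(min_interval * 2 ** attempt)  # rate limited: wait longer on each retry
        statuses.append(status)
    return statuses
```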

Rate limiting prevents this. In the Cloudflare dashboard, go to Security > WAF > Rate limiting rules and create a rule for your hostname. I configured:

Maximum 10 requests per minute from a single IP
Maximum 100 requests per hour from a single IP
Burst allowance: 3 requests in 10 seconds
Action: block for 10 minutes, then auto-release

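These thresholds are conservative for personal use. I have never hit them during normal phone usage, but they caught my runaway script within 30 seconds. I also added a custom response body for blocked requests: "Rate limit exceeded. If this is your server, check your client code." One caveat: the rate limiting options you get depend on your Cloudflare plan, and the Free plan allows only a single rule with a short counting period. Check your plan's limits. Reproducing all four settings exactly may require a paid plan or a second limiter in front of vLLM, such as nginx's limit_req.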
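Before typing thresholds like these into a dashboard, it helps to see how they interact. The sketch below is a toy sliding-window model, per client IP, that enforces the three limits above plus the 10-minute block. It is only a way to reason about the numbers. It is not how Cloudflare implements rate limiting, and nothing in it runs on the server. The IP addresses come from the reserved documentation ranges.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

# (max requests, window in seconds), matching the rule above.
LIMITS: List[Tuple[int, int]] = [(3, 10), (10, 60), (100, 3600)]
BLOCK_SECONDS = 600  # block for 10 minutes, then auto-release


class SlidingWindowLimiter:
    """Toy per-IP limiter: a request is refused if any window is already full."""

    def __init__(self, limits: List[Tuple[int, int]] = LIMITS, block_seconds: float = BLOCK_SECONDS):
        self.limits = limits
        self.block_seconds = block_seconds
        self.history: Dict[str, Deque[float]] = defaultdict(deque)  # ip -> times of allowed requests
        self.blocked_until: Dict[str, float] = {}  # ip -> time the block expires

    def allow(self, ip: str, now: float) -> bool:
        """Return True if a request from `ip` at time `now` (in seconds) is allowed."""
        if now < self.blocked_until.get(ip, float("-inf")):
            return False  # still serving a block
        history = self.history[ip]
        longest = max(window for _, window in self.limits)
        while history and now - history[0] >= longest:
            history.popleft()  # older than every window, so it can never count again
        for max_requests, window in self.limits:
            recent = sum(1 for t in history if now - t < window)
            if recent >= max_requests:
                self.blocked_until[ip] = now + self.block_seconds
                return False
        history.append(now)
        return True


lim = SlidingWindowLimiter()
print([lim.allow("203.0.113.7", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
```

In this model, the fourth request within three seconds trips the burst rule, and that IP stays blocked until t = 603. Requests spaced four seconds apart pass the burst rule but hit the per-minute cap on the eleventh request.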

Monitoring and Alerting Setup

Security is not a one-time configuration. It is a process of noticing when things change. I set up three monitoring layers.

Layer 1: Cloudflare Analytics

The Cloudflare dashboard shows request volume, error rates, and top countries of origin. I check it once a week for traffic from countries I have never visited, sudden spikes in 4xx errors, or request volume above my normal pattern. In four months, I have seen exactly one anomaly: a burst of 12 requests from Germany. It was me, testing from a VPN node I forgot I had connected.

Layer 2: Server Resource Alerts

I installed netdata for lightweight monitoring:

```bash
wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh
sh /tmp/netdata-kickstart.sh --stable-channel
```

I configured alerts for GPU temperature above 82 °C, GPU utilization above 95% for more than 5 minutes, and sustained inbound traffic above 10 MB/s. Netdata sends these to a Discord webhook. I have received exactly two alerts, and both were my own batch jobs running hot. Better a false positive than a missed breach.
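The "above 95% for more than 5 minutes" condition is what separates a long generation from someone hammering the API. The sketch below shows the logic of such a sustained-threshold alert over periodic (timestamp, utilization) samples. It illustrates the idea only and is neither Netdata's health-configuration syntax nor its internal algorithm. In Netdata itself, you configure the equivalent as a health alert on the GPU utilization chart.

```python
from typing import Iterable, Optional, Tuple


def sustained_breach(
    samples: Iterable[Tuple[float, float]],  # (timestamp in seconds, GPU utilization %)
    threshold: float = 95.0,
    duration: float = 300.0,  # 5 minutes
) -> Optional[float]:
    """Return the timestamp at which utilization has stayed above `threshold`
    for at least `duration` seconds, or None if that never happens."""
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= duration:
                return ts
        else:
            breach_start = None  # any dip below the threshold resets the timer
    return None
```

A batch job that pegs the GPU for two minutes never fires this alert, while ten minutes of continuous load does. Because any dip resets the timer, this check complements the traffic alert rather than replacing it.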

Layer 3: Log Review

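Once a week, I check the tunnel's own logs. Because cloudflared runs as a systemd service, its output goes to the journal rather than to a file, unless you set logfile: in config.yml:

```bash
sudo journalctl -u cloudflared --since "7 days ago" | grep -iE "error|unauthorized|rate limit" | tail -20
```

Requests that Cloudflare Access or a rate limiting rule blocks never reach the server. Those show up under security events in the Cloudflare dashboard, not in this log.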

It takes 10 seconds. In four months, the only errors have been my own mistyped API keys. The log file is small because there is nothing to log. That is the goal.

Testing From Outside Your Network

Before you rely on this setup, test it properly. Here is my checklist:

Disconnect from WiFi. Use your phone's cellular connection. Visit https://llm.yourdomain.com. You should see the Cloudflare Access login page, not your vLLM server directly.
Test the API endpoint. From a terminal app on your phone (or a laptop tethered to your phone's cellular connection), run:
```
curl -X POST https://llm.yourdomain.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model": "Qwen3-7B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```
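You should get a JSON response. A 401 means the request reached vLLM but the API key is wrong (this assumes you started vLLM with --api-key). A 302 redirect means Cloudflare Access is intercepting the request. To allow programmatic access, create an Access service token and add a policy with the Service Auth action for it. Then send the token's CF-Access-Client-Id and CF-Access-Client-Secret headers with every API call.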
Verify no direct access. Try visiting http://your-public-ip:8000 from your phone. It should time out. If it responds, your port forwarding rule is still active. Go delete it.
Test rate limiting. Run a loop of 20 requests as fast as possible. After the burst threshold, you should get 429 responses. Wait 10 minutes and confirm normal access resumes.
Check from a different country. Use a VPN or ask a friend abroad to visit your URL. They should hit the auth wall just like you do.

I did all of these tests on day one. The direct-access test failed at first because I had forgotten that my ISP's modem also had a forwarding rule. Since I found and removed it, a scan of my public IP shows no open ports.
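The API checks in the list above can be scripted with the Python standard library. The sketch below sends the same chat request as the curl example and fires a quick burst for the rate-limit test. It assumes:

- You have created a Cloudflare Access service token. Cloudflare expects its credentials in the `CF-Access-Client-Id` and `CF-Access-Client-Secret` headers.
- The URL and model name are placeholders.

Redirects are deliberately not followed. Otherwise, an Access login redirect would silently turn into a 200 from the login page.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, List, Optional

# Placeholders: substitute your hostname and the model name vLLM is serving.
URL = "https://llm.yourdomain.com/v1/chat/completions"
MODEL = "Qwen3-7B-Instruct"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses (e.g. an Access login redirect) instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


OPENER = urllib.request.build_opener(NoRedirect)


def build_request(api_key: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Same chat request as the curl example, plus Cloudflare Access service-token headers."""
    body = {"model": MODEL, "messages": [{"role": "user", "content": "Hello"}]}
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # only checked if vLLM runs with --api-key
        "CF-Access-Client-Id": client_id,
        "CF-Access-Client-Secret": client_secret,
    }
    return urllib.request.Request(URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST")


def send(req: urllib.request.Request) -> int:
    """Return the HTTP status code; 3xx/4xx/5xx come back as numbers, not exceptions."""
    try:
        with OPENER.open(req, timeout=60) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def burst(send_once: Callable[[], int], n: int = 20) -> List[int]:
    """Fire n requests back to back and collect their status codes."""
    return [send_once() for _ in range(n)]


def first_429(statuses: List[int]) -> Optional[int]:
    """1-based position of the first rate-limited response, or None if there was none."""
    for position, status in enumerate(statuses, start=1):
        if status == 429:
            return position
    return None
```

Usage:

1. Build the request once with your API key and service-token credentials.
2. Call `send(req)` for the single check. Expect 200; a 302 there means the service token is not being accepted.
3. Call `first_429(burst(lambda: send(req)))` to see where rate limiting kicks in.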

What I Check Before Going to Bed

Paranoia sounds exhausting, but it becomes routine. My nightly checklist takes 45 seconds:

Glance at Netdata. Is GPU utilization at 0%? Good. If it is above 10%, something is running that I forgot about.
Check Cloudflare Analytics for the last 4 hours. Any requests I did not make? There should not be. I am the only user.
Verify the tunnel is up. sudo systemctl is-active cloudflared should print "active."
Confirm no unexpected login emails. With Cloudflare Access's one-time PIN login, a code is emailed to me every time someone tries to sign in with my address. If I see a code I did not request, I know something is wrong.

In four months, I have never found a problem during this check. The value is not in catching breaches. The value is in knowing, with confidence, that there is nothing to catch.

What I worried about: forgetting to check one night and missing an intrusion. What actually happened: the checks became automatic. It is like locking your front door. You do not debate it every night. You just do it.
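Two of these checks, tunnel status and GPU utilization, are easy to script. The sketch below runs `systemctl is-active cloudflared` and `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`. It assumes an NVIDIA GPU with the driver tools installed. It is an optional alternative to glancing at Netdata, not part of the setup described above. The 10% idle threshold mirrors the checklist. The command runner is injectable, so the parsing can be tested without either tool present. Cloudflare Analytics and login emails still need a human glance.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout; a non-zero exit code is not treated as an error."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def tunnel_active(runner: Runner = run) -> bool:
    # `systemctl is-active` prints "active" when the unit is running.
    return runner(["systemctl", "is-active", "cloudflared"]).strip() == "active"


def max_gpu_utilization(runner: Runner = run) -> int:
    # With noheader,nounits, nvidia-smi prints one bare percentage per GPU, e.g. "0".
    out = runner(["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"])
    return max(int(value) for value in out.split())


def bedtime_report(runner: Runner = run, idle_threshold: int = 10) -> List[str]:
    """Return a list of problems; an empty list means it is safe to go to bed."""
    problems = []
    if not tunnel_active(runner):
        problems.append("cloudflared is not active")
    utilization = max_gpu_utilization(runner)
    if utilization > idle_threshold:
        problems.append(f"GPU utilization is {utilization}%, something is still running")
    return problems
```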

The Bottom Line

Exposing a local LLM to the internet is a risk calculation. Port forwarding is fast, free, and dangerous. Cloudflare Tunnel is slightly more work, still free, and dramatically safer. The gap in security is not incremental; it is categorical. An open port on your home IP is discoverable by any scanner. A tunnel leaves no open port to find.

Add authentication so guessing your URL is not enough. Add rate limiting so your own mistakes do not hurt you. Add monitoring so you know when the world changes. Then sleep soundly.

My total cost: $0 for Cloudflare (not counting the domain), $0 for software, about 90 minutes of initial configuration, and 45 seconds per night of maintenance. My benefit: unlimited access to my own model from anywhere, with no API bills and no 3 AM anxiety about open ports.

If you are currently port forwarding, close it today. Install Cloudflare Tunnel this weekend. The time you spend now could save you weeks of recovery later.