I've seen this play out more times than I'd like. Someone sets up Ollama on their VPS or home server. They run ufw status and see their firewall is active. They feel good about it. They move on.
Three weeks later they get an API bill that makes no sense. Or they find out their model endpoint has been running inference for strangers on the internet — burning their hardware cycles, racking up bandwidth, and potentially getting their VPS account flagged for abuse.
The firewall didn't fail. They just didn't know how Docker actually works.
Infrastructure is the boring part until it's the expensive part.
Why UFW isn't protecting you
This is the one that surprises people the most. You set UFW to deny incoming. You check ufw status and port 11434 shows as blocked. You feel safe.
But Docker doesn't route traffic through the INPUT chain where UFW lives. It inserts its own rules into the FORWARD chain — before UFW even gets a look at the packet.
Think of it like this: UFW is a security guard at the front door. Docker built a side entrance that bypasses the lobby entirely.
So when your compose file has:
ports:
  - "11434:11434"
You're binding Ollama to 0.0.0.0. Every interface. Including your public IP. UFW is sitting there blocking the INPUT chain while Docker has already let the traffic through the FORWARD chain.
The fix — one line
# Before — exposed to internet:
ports:
  - "11434:11434"

# After — localhost only:
ports:
  - "127.0.0.1:11434:11434"
That's it. Binds to localhost only. External traffic never reaches the container.
Important: don't confuse this with the OLLAMA_HOST environment variable. Even if you set OLLAMA_HOST=127.0.0.1 inside the container, a standard ports: "11434:11434" mapping will still punch a hole through your firewall. The fix must happen in the ports block — not the environment block.
If you actually need Ollama accessible from other machines — don't expose the port directly. Put Nginx or Traefik in front with authentication. That also handles SSL.
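As a rough sketch of what that looks like with Nginx — the domain, certificate paths, and htpasswd file below are placeholders, and the timeout may need tuning for long generations:

```nginx
# Hypothetical server block: TLS terminates here, basic auth gates access,
# and traffic is proxied to an Ollama bound to loopback only.
server {
    listen 443 ssl;
    server_name ollama.example.com;              # placeholder domain

    ssl_certificate     /etc/letsencrypt/live/ollama.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.example.com/privkey.pem;

    location / {
        auth_basic           "Ollama";
        auth_basic_user_file /etc/nginx/.htpasswd;  # create with htpasswd
        proxy_pass           http://127.0.0.1:11434;
        proxy_read_timeout   300s;                  # generations can be slow
    }
}
```

With this in place, Ollama itself stays on 127.0.0.1 and only authenticated HTTPS traffic ever reaches it.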
Red flag: if you see network_mode: host in a tutorial — run. It completely eliminates Docker's network sandbox and exposes every port directly on the host interface, bypassing every security layer we just discussed.
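If you want a quick self-check before trusting any of this, a few lines of grep will flag compose port mappings that don't pin a bind address. This is a crude heuristic, not a real YAML parser, and the sample file below is a stand-in for your own:

```shell
# Write a sample compose file to scan (substitute your real one).
cat > /tmp/demo-compose.yml <<'EOF'
services:
  ollama:
    ports:
      - "11434:11434"
EOF

# Flag "HOST:CONTAINER" mappings with no bind address: they default to 0.0.0.0.
grep -nE '^[[:space:]]*-[[:space:]]*"?[0-9]+:[0-9]+"?[[:space:]]*$' /tmp/demo-compose.yml \
  && echo "WARNING: port bound to all interfaces"
```

A `127.0.0.1:11434:11434` mapping has three parts and won't match, so only the dangerous two-part form gets flagged.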
The CORS wildcard that opens a second door
To get their web UI working, a lot of people set this in their compose file:
environment:
  - OLLAMA_ORIGINS=*
That wildcard means any website you visit — including a malicious one — can use your browser to fire requests at your Ollama instance. Set specific origins instead:
environment:
  - OLLAMA_ORIGINS=https://your-webui-domain.com
Your compose file might be leaking API keys
If you're connecting Ollama to any external service — OpenRouter, a custom API, anything — check your compose file for this pattern:
environment:
  - OPENROUTER_KEY=sk-abc123
That key is in plain text. If your repo is public, or if you've ever pushed it anywhere, that key is exposed — rotate it, because deleting it from the repo doesn't scrub the git history. Then move it to a .env file:
environment:
  - OPENROUTER_KEY=${OPENROUTER_KEY}
And make sure .env itself never gets committed:

echo ".env" >> .gitignore
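To catch this pattern before it ships, a rough grep heuristic helps — the sample file and fake key below stand in for your real compose file, and real secret prefixes vary by provider:

```shell
# Sample file with a hardcoded key (the key is fake).
cat > /tmp/demo-env.yml <<'EOF'
environment:
  - OPENROUTER_KEY=sk-abc123
EOF

# Flag values that look like pasted API keys rather than ${VAR} references.
grep -nE '=(sk|ghp|xox)[A-Za-z0-9_-]+' /tmp/demo-env.yml \
  && echo "WARNING: hardcoded secret"
```

A `${OPENROUTER_KEY}` reference won't match, so properly externalized keys pass clean.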
Your cron jobs will collide with model pulls
If you have automated model pulls running on a schedule — and you should — they need to coexist with everything else on that server. Backup jobs. Log rotation. System updates. Three jobs firing at the same minute causes a load spike. Your model pull hangs. No error. No alert. You just come back to a half-pulled model and a confused container.
Visualise your full cron schedule and check for overlaps before you add model management tasks on top of it.
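One low-tech way to avoid collisions is to stagger start times in the crontab itself. The entries below are illustrative only — times, paths, and the model tag are placeholders, not a recommended schedule:

```
# Hypothetical /etc/crontab entries: heavy jobs spaced 30+ minutes apart.
15 3 * * 0   root  docker exec ollama ollama pull llama3:8b   # weekly model refresh
45 3 * * *   root  /usr/local/bin/backup.sh                   # daily backup
30 4 * * *   root  apt-get update -qq                         # system updates
```

The point is the spacing: no two disk- or network-heavy jobs share a start minute.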
That SSL cert on your Ollama frontend will expire
If you're running Open WebUI or any other Ollama frontend behind Nginx or Traefik — that TLS certificate has an expiry date. It will expire. Usually at an inconvenient time. Set up certificate monitoring across all your domains now. Not when the browser throws a warning.
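Checking an expiry date takes one openssl command. To keep the sketch self-contained it generates a throwaway self-signed cert to inspect — in practice you'd point -in at your real certificate (for Let's Encrypt, under /etc/letsencrypt/live/):

```shell
# Generate a disposable self-signed cert just to have something to inspect.
openssl req -x509 -newkey rsa:2048 -keyout /tmp/demo.key -out /tmp/demo.crt \
  -days 30 -nodes -subj "/CN=demo.local" 2>/dev/null

# Print the expiry date; wire this into cron plus an alert for real monitoring.
openssl x509 -noout -enddate -in /tmp/demo.crt
```

To check a live endpoint instead: `echo | openssl s_client -connect your-domain:443 2>/dev/null | openssl x509 -noout -enddate`.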
Set resource limits or Ollama will eat your server
A large model pull, a stuck inference job, or a runaway embedding task will pin your CPU at 100% and take everything else on the host down with it.
deploy:
  resources:
    limits:
      cpus: '2.0'
      memory: 4G
Adjust based on your hardware. The point is to set a ceiling so one stuck job can't take the whole machine down.
Set log limits or your disk will disappear
Ollama is surprisingly chatty. Running 24/7, the logs accumulate fast. Left unchecked they'll fill your disk over days or weeks until something breaks in a confusing way.
logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"
And run this occasionally to clean up stopped containers, dangling images, and unused networks left behind by crashes:
docker system prune
Safe to run — it removes stopped containers, dangling images, and unused networks, but won't touch your ~/.ollama directory or named Docker volumes (just don't add the --volumes flag, which would delete unused volumes too). Your 50GB of models isn't going anywhere.
The full checklist
# 1. Bind Ollama to localhost only
ports: "127.0.0.1:11434:11434"
# 2. Set specific CORS origins
OLLAMA_ORIGINS=https://your-webui-domain.com
# 3. Move API keys to .env
OPENROUTER_KEY=${OPENROUTER_KEY}
# 4. Set resource limits
deploy.resources.limits: cpus 2.0, memory 4G
# 5. Set log rotation
logging.driver: json-file, max-size 10m, max-file 3
# 6. Check for port exposure
curl http://YOUR_SERVER_IP:11434 (from mobile data)
The Docker Auditor catches all of this automatically. Paste your compose file — exposed ports, hardcoded secrets, missing limits, missing healthchecks — flagged in one pass.
The model layer gets all the attention. The infrastructure layer underneath is where things quietly go wrong. Go run that curl command. Off Wi-Fi. Right now.