The Uptime Engineer
👋 Hi, I am Yoshik Karnawat
Today you’ll learn how on-demand TLS automates certificate management for thousands of domains
Facts
Caddy pioneered on-demand TLS in 2015
On-demand TLS saves SaaS companies tens of thousands of dollars annually
A single on-demand TLS setup can handle millions of domains
Vercel, Netlify, and Fly.io all use on-demand certificate provisioning under the hood
Today, I run all my websites on Caddy.
Not nginx. Not Apache. Not Traefik.
And the reason is one feature I learned about while building my SaaS in 2019.
If you're building:
A SaaS platform where customers get customer.yourapp.com
A white-label app where clients bring their own domains
Dev environments that spin up pr-432.staging.example.com for every pull request
You need certificates for domains you don't know exist yet.
Most teams pre-generate certificates.
Some run cron jobs polling for new domains.
Others build custom automation around Let's Encrypt APIs.
Caddy solves this with on-demand TLS: a feature that provisions certificates in real time, the moment the first HTTPS request arrives.
No pre-configuration. No DNS polling. No manual certificate management.
Let me show you how it works.
TLS in 30 seconds
Before we go deeper, here's what you need to know about TLS.
TLS = the lock icon in your browser
When you see https://, TLS is running.
It encrypts data between your browser and the server.
Without TLS, traffic is plaintext.
Anyone on the network can read passwords, API keys, session tokens.
How it works:
Browser requests https://example.com
Server sends its TLS certificate (proves identity)
Browser checks if the certificate is signed by a trusted authority
If valid, encryption starts
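Here's a minimal sketch of those four steps from the client side, using Python's standard ssl module. The helper names (make_verifying_context, fetch_peer_certificate) are made up for illustration; the default context is what does the "signed by a trusted authority" check.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    """Return a client context that checks the CA chain and the hostname."""
    # create_default_context() loads the system's trusted CAs and enables
    # both certificate verification and hostname checking.
    return ssl.create_default_context()


def fetch_peer_certificate(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Complete a TLS handshake with host and return its parsed certificate.

    Raises ssl.SSLCertVerificationError if the certificate is untrusted
    or does not match the hostname.
    """
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname sends SNI, which tells the server which
        # certificate the client is asking for.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

Calling fetch_peer_certificate("example.com") needs network access. If the handshake fails verification, no application data is ever sent.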
The certificate problem:
Certificates must be:
Domain-specific → A cert for app.example.com won't work for tenant1.example.com
Issued by a trusted CA → Self-signed certs trigger browser warnings
Renewed before expiration → Let's Encrypt certs expire every 90 days
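To put numbers on "renewed before expiration", here's a small sketch of the date arithmetic. It assumes a policy of renewing once one third of the certificate's lifetime remains. That ratio and the function names are assumptions for illustration, not something your CA mandates.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime,
                 remaining_ratio: float = 1 / 3) -> datetime:
    """Return the moment renewal should start.

    remaining_ratio is the fraction of the lifetime that should still be
    left when renewal begins (an assumed policy, not a fixed rule).
    """
    lifetime = not_after - not_before
    return not_after - lifetime * remaining_ratio


def needs_renewal(now: datetime, not_before: datetime, not_after: datetime,
                  remaining_ratio: float = 1 / 3) -> bool:
    """True once now has reached the renewal point."""
    return now >= renewal_time(not_before, not_after, remaining_ratio)


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
renew_at = renewal_time(issued, expires)  # about 60 days after issuance
```

With a 90-day certificate, that leaves roughly a 30-day window to retry if issuance fails.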
Traditional approach: You manually request certificates from a CA, configure your web server, set up renewal automation.
Caddy's approach: It does all of this automatically.
How on-demand TLS flips the model
Traditional flow:
Admin creates domain → Request certificate → Configure server → Deploy
On-demand TLS flow:
User visits https://newdomain.example.com → Caddy provisions certificate → Connection completes
Total time: 2-5 seconds on first request. Then milliseconds forever.
Here's what happens under the hood.
Step 1: Client starts TLS handshake
Browser sends ClientHello to newdomain.example.com.
Caddy receives the request but has no certificate for this domain yet.
In a normal web server, this would fail.
Caddy does something different.
Step 2: Caddy pauses the connection
Instead of failing, Caddy holds the connection open and triggers certificate provisioning.
The browser waits.
Caddy contacts Let's Encrypt.
Step 3: ACME challenge begins
Let's Encrypt uses the ACME protocol (Automatic Certificate Management Environment) to verify you control the domain.
The challenge works like this:
Let's Encrypt says: "Serve this random token at http://newdomain.example.com/.well-known/acme-challenge/[token]"
Caddy automatically serves the token.
Let's Encrypt fetches it.
If successful, Let's Encrypt issues a signed certificate.
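To show what "serving the token" means, here's a rough sketch of the HTTP-01 responder logic. In the ACME spec (RFC 8555), the body served is not the bare token but the token joined to a thumbprint of the ACME account key. The function names and the pending dictionary are hypothetical, and this is not how Caddy implements it internally.

```python
from typing import Dict, Tuple

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def key_authorization(token: str, account_thumbprint: str) -> str:
    """Build the HTTP-01 response body: token + "." + account key thumbprint."""
    return f"{token}.{account_thumbprint}"


def handle_challenge_request(path: str, pending: Dict[str, str]) -> Tuple[int, str]:
    """Answer a validation request from the CA.

    pending maps each outstanding token to its key authorization.
    Returns (HTTP status, response body).
    """
    if not path.startswith(CHALLENGE_PREFIX):
        return 404, ""
    token = path[len(CHALLENGE_PREFIX):]
    body = pending.get(token)
    if body is None:
        # Unknown token: never answer challenges we did not start.
        return 404, ""
    return 200, body
```

Only tokens the server itself requested get a response, which is what ties the challenge to the party that asked for the certificate.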
Step 4: Certificate stored and served
Caddy stores the certificate (filesystem, Redis, or Consul; your choice).
The TLS handshake resumes with the newly issued certificate.
Connection established. HTTPS secured.
Step 5: Subsequent requests are instant
The certificate is cached.
Future requests to newdomain.example.com use the stored certificate. No provisioning delay.
First request: 2-5 seconds.
Every request after: milliseconds.
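The five steps boil down to a get-or-issue cache. Here's a simplified sketch of that idea. OnDemandCertCache and the issue callback are hypothetical stand-ins for the real ACME flow, and the single lock is a shortcut: it makes every hostname wait while one certificate is being issued.

```python
import threading
from typing import Callable, Dict


class OnDemandCertCache:
    """Issue a certificate on the first handshake, reuse it afterwards."""

    def __init__(self, issue: Callable[[str], str]) -> None:
        self._issue = issue  # hypothetical stand-in for the ACME exchange
        self._certs: Dict[str, str] = {}
        self._lock = threading.Lock()

    def get_certificate(self, hostname: str) -> str:
        with self._lock:
            cert = self._certs.get(hostname)
            if cert is None:
                # First handshake for this name: the client waits here
                # while the certificate is obtained.
                cert = self._issue(hostname)
                self._certs[hostname] = cert
            return cert


issued_for = []


def fake_issue(hostname: str) -> str:
    """Pretend CA call that records which names were issued."""
    issued_for.append(hostname)
    return "cert-for-" + hostname


cache = OnDemandCertCache(fake_issue)
first = cache.get_certificate("newdomain.example.com")   # slow path
second = cache.get_certificate("newdomain.example.com")  # cache hit
```

The second call never reaches the issuer, which is why only the first visitor pays the latency.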
Why this matters
1. Multi-tenant SaaS platforms
You're building a platform where each customer gets customer.yourapp.com.
With traditional TLS:
You need to know every subdomain in advance
You must provision certificates before customers can use their subdomain
Certificate storage grows linearly with customer count
With on-demand TLS:
Customer signs up → Gets subdomain instantly
First HTTPS visit triggers certificate provisioning
Zero manual intervention
Real-world use case: Vercel, Netlify, and Fly.io all use on-demand certificate provisioning for custom domains.
2. Dynamic DNS / Wildcard limitations
Wildcard certificates (*.example.com) don't cover multi-level subdomains.
A wildcard cert for *.example.com covers app.example.com but not tenant.app.example.com.
On-demand TLS provisions exact-match certificates for any subdomain depth automatically.
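The one-label rule is easy to get wrong, so here's a short sketch of the matching logic. The function name wildcard_covers is mine, and it covers only the leading "*." form discussed above.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name covers hostname.

    A leading "*." stands for exactly one DNS label, so it never spans dots.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname
    suffix = pattern[1:]  # e.g. ".example.com"
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    # The wildcard must match one non-empty label with no dots in it.
    return label != "" and "." not in label
```

So wildcard_covers("*.example.com", "app.example.com") is True, while the same pattern against "tenant.app.example.com" or the bare "example.com" is False.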
3. White-label applications
Customers bring their own domains (customdomain.com pointing to your infrastructure).
You can't pre-generate certificates because you don't control DNS.
On-demand TLS provisions certificates as soon as DNS points to your server and the first request arrives.
4. Development and staging environments
Developers spin up ephemeral environments (pr-123.staging.example.com).
Pre-configuring certificates for every PR branch is infeasible.
On-demand TLS handles this automatically: every new branch gets HTTPS on first access.
Security: Why this isn't a free-for-all
Allowing any domain to request a certificate opens attack vectors.
Attack scenario: Certificate exhaustion
An attacker sends requests for millions of random subdomains.
Your server requests certificates until you hit Let's Encrypt's rate limits (50 certificates per registered domain per week).
Caddy's protection mechanisms:
1. Ask endpoint validation
Before provisioning, Caddy calls a custom HTTP endpoint:
GET /check-domain?domain=newdomain.example.com
Your backend responds:
200 OK → Provision certificate
4xx/5xx → Deny request
This lets you enforce:
Domain ownership validation
Customer subscription checks
Rate limiting per tenant
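Here's one way those three checks could sit behind the Ask endpoint, sketched as plain Python so it isn't tied to a web framework. Everything here is hypothetical: the AskPolicy class, the in-memory dictionaries standing in for your database, and the choice of 429 for the rate-limit case (any 4xx denies the request).

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set


class AskPolicy:
    """Decide whether a hostname may get a certificate."""

    def __init__(self, domain_owner: Dict[str, str], active_tenants: Set[str],
                 max_per_window: int = 5, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._domain_owner = domain_owner      # domain -> tenant id
        self._active_tenants = active_tenants  # tenants with a valid plan
        self._max = max_per_window
        self._window = window_seconds
        self._clock = clock
        self._approved: Dict[str, Deque[float]] = defaultdict(deque)

    def status_for(self, domain: str) -> int:
        """Return the HTTP status the Ask endpoint should send."""
        tenant = self._domain_owner.get(domain)
        if tenant is None:
            return 403  # nobody has registered this domain
        if tenant not in self._active_tenants:
            return 403  # subscription is not active
        now = self._clock()
        recent = self._approved[tenant]
        # Drop approvals that have aged out of the sliding window.
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        if len(recent) >= self._max:
            return 429  # tenant used up its issuance budget
        recent.append(now)
        return 200
```

The clock is injectable so the window can be tested without sleeping. In production the counters would need to live somewhere shared, not in process memory.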
2. Rate limiting
Caddy tracks certificate requests per domain/IP and enforces limits.
3. Allowlist/denylist
Restrict on-demand TLS to specific domain patterns.
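If you enforce the pattern check in your own Ask endpoint, a suffix allowlist plus an explicit denylist is often enough. A minimal sketch, with a made-up helper name and example domains:

```python
from typing import Iterable


def is_allowed(hostname: str, allowed_suffixes: Iterable[str],
               denied_hosts: Iterable[str] = ()) -> bool:
    """Allow hostnames under an approved parent domain unless denied."""
    host = hostname.lower().rstrip(".")
    if host in {d.lower() for d in denied_hosts}:
        return False
    # Require a dot before the suffix so "evilstaging.example.com"
    # does not pass as a child of "staging.example.com".
    return any(host.endswith("." + s.lower()) for s in allowed_suffixes)
```

The denylist is checked first, so a blocked name stays blocked even if it sits under an allowed parent.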
Configuration: How to enable it
Basic setup (Caddyfile):
*.example.com {
    tls {
        on_demand
    }
    reverse_proxy backend:8080
}
This enables on-demand TLS for all subdomains under example.com.
Production-ready (with Ask endpoint):
{
    on_demand_tls {
        ask https://api.yourapp.com/custom-domain
        interval 5m
    }
}
*.example.com {
    tls {
        on_demand
    }
    reverse_proxy backend:8080
}
Caddy appends the requested hostname to the ask URL as a domain query parameter.
Ask endpoint logic (example in Flask):
from flask import Flask, request

app = Flask(__name__)


@app.route('/tls/validate', methods=['GET'])
def validate_domain():
    # Caddy passes the requested hostname as the "domain" query parameter
    domain = request.args.get('domain')
    # Check if domain belongs to active customer
    if is_valid_customer_domain(domain):
        return '', 200  # OK - provision certificate
    return '', 403  # Forbidden - deny request


def is_valid_customer_domain(domain):
    # Your validation logic here
    # Example: Check database for active customer with this domain
    # Placeholder logic
    valid_domains = ['customer1.example.com', 'customer2.example.com']
    return domain in valid_domains


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
Storage backend (for multi-server deployments)
On-demand TLS certificates must be shared across servers.
Caddy supports (Redis, Consul, and S3 come from storage plugins rather than the core build):
Filesystem (default, single-server)
Redis (distributed cache)
Consul (distributed KV store)
S3 (shared storage)
Example with Redis:
{
    "storage": {
        "module": "redis",
        "address": "redis:6379",
        "key_prefix": "caddy-tls"
    }
}
Common mistakes to avoid
Not implementing an Ask endpoint
Attackers exhaust your certificate quota.
Fix: Always validate domain ownership.
Forgetting rate limits
Legitimate requests get blocked after hitting Let's Encrypt limits.
Fix: Monitor issuance rates and implement tenant-level limits.
Using filesystem storage in multi-server setups
Each server provisions its own certificate, wasting API calls.
Fix: Use Redis/Consul/S3 for shared storage.
Not handling first-request latency
Users see slow initial page loads.
Fix: Pre-provision certificates for known domains.
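One simple way to pre-provision is to make the first request yourself. Here's a sketch of a warm-up script, assuming DNS for each domain already points at your server and the domain passes your Ask check. The function names are mine, and the generous timeout is a guess to leave room for issuance.

```python
import socket
import ssl
from typing import Callable, Dict, Iterable


def tls_handshake(host: str, port: int = 443, timeout: float = 30.0) -> None:
    """Open and close one verified TLS connection to host."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host):
            pass  # completing the handshake is all we need


def warm_up(domains: Iterable[str],
            handshake: Callable[[str], None] = tls_handshake) -> Dict[str, bool]:
    """Trigger issuance for each known domain and report which succeeded."""
    results: Dict[str, bool] = {}
    for domain in domains:
        try:
            handshake(domain)
            results[domain] = True
        except OSError:
            # ssl.SSLError, DNS failures, and timeouts are all OSError subclasses.
            results[domain] = False
    return results
```

Run it right after a customer adds a domain, so the 2-5 second delay lands on your script instead of on their first visitor.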
