The Uptime Engineer

👋 Hi, I am Yoshik Karnawat

Today you’ll learn how on-demand TLS automates certificate management for thousands of domains

Facts

  • Caddy pioneered on-demand TLS in 2015

  • On-demand TLS saves SaaS companies tens of thousands of dollars annually

  • A single on-demand TLS setup can handle millions of domains

  • Vercel, Netlify, and Fly.io all use on-demand certificate provisioning under the hood

Today, I run all my websites on Caddy.

Not nginx. Not Apache. Not Traefik.

The reason is a single feature I discovered while building my SaaS in 2019.

If you're building:

  • A SaaS platform where customers get customer.yourapp.com

  • A white-label app where clients bring their own domains

  • Dev environments that spin up pr-432.staging.example.com for every pull request

You need certificates for domains you don't know exist yet.

Most teams pre-generate certificates.
Some run cron jobs polling for new domains.
Others build custom automation around Let's Encrypt APIs.

Caddy solves this with on-demand TLS, a feature that provisions certificates in real time, the moment the first HTTPS request arrives.

No pre-configuration. No DNS polling. No manual certificate management.

Let me show you how it works.

TLS in 30 seconds

Before we go deeper, here's what you need to know about TLS.​

TLS = the lock icon in your browser

When you see https://, TLS is running.​
It encrypts data between your browser and the server.​

Without TLS, traffic is plaintext.​
Anyone on the network can read passwords, API keys, session tokens.​

How it works:

  1. Browser requests https://example.com

  2. Server sends its TLS certificate (proves identity)

  3. Browser checks if the certificate is signed by a trusted authority

  4. If valid, encryption starts
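To watch steps 2 and 3 from the client side, here's a minimal sketch using Python's standard ssl module. The helper names make_client_context and fetch_certificate are mine, purely for illustration. The default context loads the system's trusted CAs and enables hostname checking, which is roughly the check a browser performs.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # Default client settings: verify the certificate chain against the
    # system trust store and require the certificate to match the hostname.
    return ssl.create_default_context()


def fetch_certificate(hostname: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Complete a TLS handshake and return the server's parsed certificate."""
    context = make_client_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        # server_hostname sends SNI, so the server knows which certificate to present
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()
```

Calling fetch_certificate("example.com") needs network access. If the certificate is untrusted or issued for a different name, the handshake raises ssl.SSLCertVerificationError instead of returning.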

The certificate problem:

Certificates must be:​

  • Domain-specific → A cert for app.example.com won't work for tenant1.example.com

  • Issued by a trusted CA → Self-signed certs trigger browser warnings​

  • Renewed before expiration → Let's Encrypt certs expire after 90 days

Traditional approach: You manually request certificates from a CA, configure your web server, and set up renewal automation.

Caddy's approach: It does all of this automatically.
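What does "renewal automation" boil down to? Roughly a check like the one below. This is a sketch, not Caddy's code: the function name needs_renewal and the one-third threshold are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_before: datetime, not_after: datetime, now: datetime,
                  remaining_ratio: float = 1 / 3) -> bool:
    """Return True once the remaining validity drops to the given share of the lifetime."""
    lifetime = not_after - not_before
    remaining = not_after - now
    return remaining <= lifetime * remaining_ratio


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)  # a 90-day certificate

print(needs_renewal(issued, expires, issued + timedelta(days=30)))  # 60 days left
print(needs_renewal(issued, expires, issued + timedelta(days=61)))  # 29 days left
```

With a 90-day certificate and a one-third threshold, the check flips to True once about 30 days remain, which leaves plenty of room for retries before expiry.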

How on-demand TLS flips the model

Traditional flow:​

Admin creates domain → Request certificate → Configure server → Deploy 

On-demand TLS flow:​

User visits https://newdomain.example.com → Caddy provisions certificate → Connection completes 

Total time: 2-5 seconds on first request. Then milliseconds forever.

Here's what happens under the hood.​

Step 1: Client starts TLS handshake

Browser sends ClientHello to newdomain.example.com.​

Caddy receives the request but has no certificate for this domain yet.​

In a normal web server, this would fail.

Caddy does something different.​

Step 2: Caddy pauses the connection

Instead of failing, Caddy holds the connection open and triggers certificate provisioning.​

The browser waits.​

Caddy contacts Let's Encrypt.​

Step 3: ACME challenge begins

Let's Encrypt uses the ACME protocol (Automatic Certificate Management Environment) to verify you control the domain.​

The challenge works like this:​

Let's Encrypt says: "Serve this random token at http://newdomain.example.com/.well-known/acme-challenge/[token]"

Caddy automatically serves the token.​

Let's Encrypt fetches it.​

If successful, Let's Encrypt issues a signed certificate.​
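The HTTP challenge is simple enough to sketch. Under the ACME spec (RFC 8555), the body served at the challenge URL is the token, a dot, and a thumbprint of the account key. The class below is a toy responder, not Caddy's implementation: ChallengeResponder and the placeholder token and thumbprint values are assumptions for illustration.

```python
from typing import Set, Tuple

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def key_authorization(token: str, account_thumbprint: str) -> str:
    # RFC 8555: key authorization = token || "." || base64url(account key thumbprint)
    return f"{token}.{account_thumbprint}"


class ChallengeResponder:
    """Answers HTTP-01 challenge requests for tokens the CA has handed out."""

    def __init__(self, account_thumbprint: str) -> None:
        self._thumbprint = account_thumbprint
        self._pending: Set[str] = set()

    def expect(self, token: str) -> None:
        # Called when the CA issues a challenge token for a pending order
        self._pending.add(token)

    def handle(self, path: str) -> Tuple[int, str]:
        # Only answer for challenge paths whose token we were actually given
        if not path.startswith(CHALLENGE_PREFIX):
            return 404, ""
        token = path[len(CHALLENGE_PREFIX):]
        if token not in self._pending:
            return 404, ""
        return 200, key_authorization(token, self._thumbprint)


responder = ChallengeResponder(account_thumbprint="placeholder-thumbprint")
responder.expect("placeholder-token")
print(responder.handle("/.well-known/acme-challenge/placeholder-token"))
```

Because the thumbprint ties the response to your ACME account, nobody else can answer the challenge on your behalf just by learning the token.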

Step 4: Certificate stored and served

Caddy stores the certificate in the backend of your choice: filesystem, Redis, or Consul.

The TLS handshake resumes with the newly issued certificate.​

Connection established. HTTPS secured.

Step 5: Subsequent requests are instant

The certificate is cached.​

Future requests to newdomain.example.com use the stored certificate. No provisioning delay.​

First request: 2-5 seconds.
Every request after: milliseconds.
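The five steps collapse into a small piece of logic: look up the certificate, and only on a miss authorize, issue, and cache. Here's a toy model of that flow. It is not Caddy's code; OnDemandCertCache, fake_issue, and the string "certificates" are stand-ins I made up to show the shape.

```python
from typing import Callable, Dict, List


class OnDemandCertCache:
    """Toy model of the on-demand flow: look up, else authorize, issue, and cache."""

    def __init__(self, is_allowed: Callable[[str], bool], issue: Callable[[str], str]) -> None:
        self._is_allowed = is_allowed  # stand-in for a domain validation check
        self._issue = issue            # stand-in for the slow ACME round trip
        self._cache: Dict[str, str] = {}

    def get_certificate(self, server_name: str) -> str:
        cert = self._cache.get(server_name)
        if cert is not None:
            return cert  # steady state: served from cache
        if not self._is_allowed(server_name):
            raise PermissionError(f"on-demand TLS not permitted for {server_name}")
        cert = self._issue(server_name)  # first request pays the provisioning cost
        self._cache[server_name] = cert
        return cert


issued: List[str] = []


def fake_issue(name: str) -> str:
    issued.append(name)
    return f"certificate for {name}"


cache = OnDemandCertCache(
    is_allowed=lambda name: name.endswith(".example.com"),
    issue=fake_issue,
)
cache.get_certificate("newdomain.example.com")  # triggers issuance
cache.get_certificate("newdomain.example.com")  # cache hit
print(issued)  # ['newdomain.example.com']
```

Two requests, one issuance. That is the whole trick behind "2-5 seconds once, milliseconds forever".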

Why this matters

1. Multi-tenant SaaS platforms

You're building a platform where each customer gets customer.yourapp.com.​

With traditional TLS:​

  • You need to know every subdomain in advance​

  • You must provision certificates before customers can use their subdomain​

  • Certificate storage grows linearly with customer count​

With on-demand TLS:​

  • Customer signs up → Gets subdomain instantly​

  • First HTTPS visit triggers certificate provisioning​

  • Zero manual intervention​

Real-world use case: Vercel, Netlify, and Fly.io all use on-demand certificate provisioning for custom domains.​

2. Dynamic DNS / Wildcard limitations

Wildcard certificates (*.example.com) don't cover multi-level subdomains.​

A wildcard cert for *.example.com covers app.example.com but not tenant.app.example.com.​

On-demand TLS provisions exact-match certificates for any subdomain depth automatically.​
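The wildcard rule is easy to get wrong, so here's a small sketch of it. The function wildcard_covers is a simplified illustration I wrote for this post (it only handles a wildcard in the leftmost label), not a full certificate name matcher.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name (optionally '*.suffix') covers hostname."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname  # exact-match certificate
    suffix = pattern[1:]  # ".example.com"
    if not hostname.endswith(suffix):
        return False
    leftmost = hostname[: -len(suffix)]
    # The wildcard stands for exactly one label: non-empty, no dots
    return leftmost != "" and "." not in leftmost


print(wildcard_covers("*.example.com", "app.example.com"))         # True
print(wildcard_covers("*.example.com", "tenant.app.example.com"))  # False
print(wildcard_covers("*.example.com", "example.com"))             # False
```

The third line is the one people forget: the wildcard does not cover the bare domain either.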

3. White-label applications

Customers bring their own domains (customdomain.com pointing to your infrastructure).​

You can't pre-generate certificates because you don't control DNS.​

On-demand TLS provisions certificates as soon as DNS points to your server and the first request arrives.​

4. Development and staging environments

Developers spin up ephemeral environments (pr-123.staging.example.com).​

Pre-configuring certificates for every PR branch is infeasible.​

On-demand TLS handles this automatically: every new branch gets HTTPS on first access.

Security: Why this isn't a free-for-all

Allowing any domain to request a certificate opens attack vectors.​

Attack scenario: Certificate exhaustion

An attacker sends requests for millions of random subdomains.​

Your server keeps requesting certificates until it hits Let's Encrypt's rate limits (50 certificates per registered domain per week). After that, legitimate customers can't get certificates either.

Caddy's protection mechanisms:

1. Ask endpoint validation

Before provisioning, Caddy calls a custom HTTP endpoint:​

GET /check-domain?domain=newdomain.example.com 

Your backend responds:​

  • 200 OK → Provision certificate​

  • 4xx/5xx → Deny request​

This lets you enforce:​

  • Domain ownership validation​

  • Customer subscription checks​

  • Rate limiting per tenant​
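From the caller's side, the ask check is just "append the domain to a URL, look at the status code". Here's a sketch of that contract using only the standard library. The helper names build_ask_url and is_allowed are mine, and treating any 2xx as approval is an assumption (the list above only names 200 OK).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_ask_url(ask_endpoint: str, domain: str) -> str:
    """Append the domain query parameter, keeping any existing query string."""
    parts = urlsplit(ask_endpoint)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("domain", domain))
    return urlunsplit(parts._replace(query=urlencode(query)))


def is_allowed(status_code: int) -> bool:
    # Assumption: any 2xx means "go ahead", anything else means "deny"
    return 200 <= status_code < 300


print(build_ask_url("https://api.yourapp.com/check-domain", "newdomain.example.com"))
print(is_allowed(200), is_allowed(403), is_allowed(500))
```

Keep the endpoint fast and reachable only from your own infrastructure: it sits in the path of every first-time handshake.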

2. Rate limiting

Caddy tracks certificate requests per domain/IP and enforces limits.​
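If you also want limits of your own (say, per tenant inside your validation endpoint), a sliding-window counter is usually enough. This is a sketch under my own assumptions: IssuanceLimiter is a hypothetical helper, not a Caddy API, and the clock is passed in explicitly to keep it easy to test.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class IssuanceLimiter:
    """Allow at most max_per_window certificate requests per key in a sliding window."""

    def __init__(self, max_per_window: int, window_seconds: float) -> None:
        self._max = max_per_window
        self._window = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        timestamps = self._events[key]
        # Drop requests that have aged out of the window
        while timestamps and now - timestamps[0] >= self._window:
            timestamps.popleft()
        if len(timestamps) >= self._max:
            return False
        timestamps.append(now)
        return True


limiter = IssuanceLimiter(max_per_window=2, window_seconds=60)
print(limiter.allow("tenant-a", now=0))   # True
print(limiter.allow("tenant-a", now=1))   # True
print(limiter.allow("tenant-a", now=2))   # False: tenant-a used its budget
print(limiter.allow("tenant-b", now=2))   # True: other tenants are unaffected
```

Keying by tenant means one noisy customer can't burn the issuance budget for everyone else.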

3. Allowlist/denylist

Restrict on-demand TLS to specific domain patterns.​
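One way to express such a policy in your own validation code is a suffix allowlist plus an explicit denylist. The function below is a hypothetical helper for illustration, not something Caddy ships.

```python
from typing import Iterable


def on_demand_permitted(hostname: str,
                        allowed_suffixes: Iterable[str],
                        denied_hosts: Iterable[str]) -> bool:
    """Permit hostnames under an allowed parent domain unless explicitly denied."""
    host = hostname.lower().rstrip(".")
    if host in set(denied_hosts):
        return False
    # Require a dot before the suffix so "notexample.com" can't match "example.com"
    return any(host.endswith("." + suffix) for suffix in allowed_suffixes)


allowed = ["example.com"]
denied = ["admin.example.com"]
print(on_demand_permitted("customer1.example.com", allowed, denied))  # True
print(on_demand_permitted("admin.example.com", allowed, denied))      # False
print(on_demand_permitted("example.com.evil.test", allowed, denied))  # False
```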

Configuration: How to enable it

Basic setup (Caddyfile):

*.example.com {
    tls {
        on_demand
    }
    reverse_proxy backend:8080
}

This enables on-demand TLS for all subdomains under example.com.​

Production-ready (with Ask endpoint):

{
    on_demand_tls {
        ask https://api.yourapp.com/tls/validate
        interval 5m
    }
}

*.example.com {
    tls {
        on_demand
    }
    reverse_proxy backend:8080
}

Ask endpoint logic (example in Flask):

from flask import Flask, request

app = Flask(__name__)

@app.route('/tls/validate', methods=['GET'])
def validate_domain():
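    # Caddy passes the requested hostname in the "domain" query parameter
    domain = request.args.get('domain')

    # Reject requests that carry no domain at all
    if not domain:
        return '', 400

    # Check if domain belongs to active customer
    if is_valid_customer_domain(domain):
        return '', 200  # OK - provision certificate

    return '', 403  # Forbidden - deny request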


def is_valid_customer_domain(domain):
    # Your validation logic here
    # Example: Check database for active customer with this domain
    
    # Placeholder logic
    valid_domains = ['customer1.example.com', 'customer2.example.com']
    return domain in valid_domains


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

Storage backend (for multi-server deployments)

On-demand TLS certificates must be shared across servers.​

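Caddy supports several storage backends. Only the filesystem is built in; the others come from community storage plugins: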

  • Filesystem (default, single-server)​

  • Redis (distributed cache)​

  • Consul (distributed KV store)​

  • S3 (shared storage)​

Example with Redis (JSON config; the exact field names depend on the Redis storage plugin you use):

{
    "storage": {
        "module": "redis",
        "address": "redis:6379",
        "key_prefix": "caddy-tls"
    }
}
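Why does shared storage matter so much? A tiny simulation makes it concrete. Everything here is a toy: Server, count_issuances, and the plain dict standing in for Redis are assumptions for illustration, and a real shared backend also needs locking so two servers don't issue at the same moment.

```python
from typing import Dict, List, Tuple


class Server:
    """A server that issues a certificate whenever its storage has none."""

    def __init__(self, name: str, storage: Dict[str, str],
                 issuance_log: List[Tuple[str, str]]) -> None:
        self.name = name
        self.storage = storage
        self.issuance_log = issuance_log

    def serve(self, hostname: str) -> str:
        if hostname not in self.storage:
            # Storage miss: this is where a real server would call the CA
            self.issuance_log.append((self.name, hostname))
            self.storage[hostname] = f"cert-for-{hostname}"
        return self.storage[hostname]


def count_issuances(shared: bool) -> int:
    log: List[Tuple[str, str]] = []
    shared_store: Dict[str, str] = {}
    a = Server("a", shared_store if shared else {}, log)
    b = Server("b", shared_store if shared else {}, log)
    a.serve("tenant.example.com")
    b.serve("tenant.example.com")
    return len(log)


print(count_issuances(shared=False))  # 2: each server issued its own certificate
print(count_issuances(shared=True))   # 1: the second server reused the first one's
```

Multiply that by every server behind your load balancer and every tenant, and local storage eats into your rate limits fast.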

Common mistakes to avoid

  1. Not implementing an Ask endpoint
    Attackers exhaust your certificate quota.​
    Fix: Always validate domain ownership.​

  2. Forgetting rate limits
    Legitimate requests blocked after hitting Let's Encrypt limits.​
    Fix: Monitor issuance rates and implement tenant-level limits.​

  3. Using filesystem storage in multi-server setups
    Each server provisions its own certificate, which wastes API calls and rate-limit budget.
    Fix: Use Redis/Consul/S3 for shared storage.​

  4. Not handling first-request latency
    Users see slow initial page loads.​
    Fix: Pre-provision certificates for known domains.
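One low-tech way to pre-provision is to make the first request yourself: when a customer adds a domain, run a TLS handshake against it so the certificate exists before any real user shows up. Here's a sketch; warm_up and tls_handshake are helper names I made up, and the generous timeout is a guess to leave room for issuance.

```python
import socket
import ssl
from typing import Callable, Dict, Iterable


def tls_handshake(hostname: str, port: int = 443, timeout: float = 30.0) -> None:
    """Open a verified TLS connection and close it again."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname):
            pass  # handshake done; the server now has a certificate for hostname


def warm_up(domains: Iterable[str],
            handshake: Callable[[str], None] = tls_handshake) -> Dict[str, bool]:
    """Trigger a handshake per domain and report which ones succeeded."""
    results: Dict[str, bool] = {}
    for domain in domains:
        try:
            handshake(domain)
            results[domain] = True
        except OSError:
            # DNS failures, timeouts, and TLS errors are all OSError subclasses
            results[domain] = False
    return results
```

Call warm_up(["customer1.example.com"]) from your signup or domain-verification job, and alert on any False so you find DNS mistakes before your customer does.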
