TCP Handshake - Deep Dive

Every time your service opens a new connection to a database, an API, or another microservice, three packets go back and forth before any real data moves. Your code never sees this. The OS handles it entirely.

This is the TCP three-way handshake. TCP was built for unreliable networks where packets get dropped, arrive out of order, or don't arrive at all. Before any data moves, both machines need to agree they can reach each other and agree on sequence numbers to track every byte of the conversation. The handshake solves both.

Without it, you have no way to detect a dropped packet, reassemble out-of-order data, or know if the other side is even listening. Every reliability guarantee TCP provides starts here.

1. The Handshake
- Why TCP needs this at all
- SYN, SYN-ACK, ACK, what's inside each packet
- Sequence numbers and why they exist
- TCP states from LISTEN to ESTABLISHED

2. What Surrounds It
- How ports tell the OS which process gets the connection
- What the kernel does that your code never sees
- The two queues that fill up under load
- TCP options both sides agree on before data moves
- Where TLS fits in the sequence


3. How It Breaks
- SYN flood and why it works
- Connection refused vs timed out
- Half-open connections
- TIME_WAIT and when it becomes a problem


4. Hands-On Trial
- tcpdump: reading the three packets live
- ss: reading connection states in production
- curl timing: isolating where the slowness actually is
- The two kernel params to check first


5. Real-World Scenarios
- What Nginx actually protects your app from
- Why connection pooling exists

What the Handshake Is

The client sends a SYN packet. The server replies with SYN-ACK. The client sends ACK. Connection established. That's the summary every tutorial gives you. Here's what's actually inside those packets.

SYN

SYN stands for synchronize. When your client sends a SYN, it picks a random starting sequence number, say 1000, and sends it to the server. The value is random, but it anchors everything that follows: every byte your client sends after this is numbered relative to it. The SYN itself uses up one sequence number, so the first data byte is 1001. If the client sends 500 bytes, the server knows to expect bytes 1001 through 1500. If byte 1300 never arrives, the server knows exactly where the gap is and acknowledges only up to it until the client retransmits. That's how TCP tracks data reliably across an unreliable network.

SYN-ACK

The server receives the SYN and responds with two things in one packet. First, it acknowledges the client's sequence number by sending ACK 1001, meaning "I got your 1000, send me 1001 next." Second, it picks its own random sequence number, say 5000, and sends that as its SYN. Now the client knows what to expect from the server's side. Both directions of the connection are being set up simultaneously in this one packet.

ACK

The client acknowledges the server's sequence number by sending ACK 5001. No data yet. Just confirmation. At this point both sides have exchanged sequence numbers, both sides have confirmed receipt, and the connection is in ESTABLISHED state on both ends.
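
The numbers in the three steps above can be traced in a few lines of Python. This is only a bookkeeping sketch: the function names and the dictionary layout are made up for illustration, the ISNs 1000 and 5000 are the example values from the text, and it ignores that real sequence numbers wrap around at 2**32.

```python
def handshake(client_isn, server_isn):
    """Return the seq/ack fields of the three handshake segments."""
    syn = {"flags": "SYN", "seq": client_isn, "ack": None}
    # The SYN consumes one sequence number, so the server acks ISN + 1.
    syn_ack = {"flags": "SYN-ACK", "seq": server_isn, "ack": client_isn + 1}
    # The final ACK acknowledges the server's SYN the same way.
    ack = {"flags": "ACK", "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, syn_ack, ack]


def byte_range(isn, length):
    """First and last sequence number of `length` payload bytes sent right after the handshake."""
    first = isn + 1
    return first, first + length - 1


for segment in handshake(1000, 5000):
    print(segment)
print(byte_range(1000, 500))
```

With the example values, the SYN-ACK carries ack 1001, the final ACK carries ack 5001, and 500 bytes of payload occupy sequence numbers 1001 through 1500.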

TCP States

Your OS tracks every connection through a state machine. Knowing these states matters when you're debugging. The server starts in LISTEN, waiting for incoming connections on a port. When the client sends SYN, the client moves to SYN_SENT. When the server receives it and sends SYN-ACK, the server moves to SYN_RECEIVED. When the SYN-ACK reaches the client, the client sends the final ACK and moves to ESTABLISHED. When that ACK lands, the server moves to ESTABLISHED too. This is when your application code finally gets involved. Everything before ESTABLISHED happened entirely in the kernel.
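
If it helps to see the transitions as data, here is a small table-driven sketch of the setup half of the state machine. The event names are invented for this example, and it leaves out teardown states, simultaneous open, and the fact that a real listening socket stays in LISTEN while each new connection gets its own entry. In the standard state machine the client reaches ESTABLISHED as soon as the SYN-ACK arrives, one step before the server.

```python
# (state, event) -> next state, for connection setup only.
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "send_syn"): "SYN_SENT",
    ("LISTEN", "recv_syn"): "SYN_RECEIVED",        # server replies with SYN-ACK
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",   # client replies with ACK
    ("SYN_RECEIVED", "recv_ack"): "ESTABLISHED",
}


def step(state, event):
    """Apply one event; reject anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state!r}") from None


client = step("CLOSED", "send_syn")
server = step("CLOSED", "passive_open")
server = step(server, "recv_syn")
client = step(client, "recv_syn_ack")
server = step(server, "recv_ack")
print(client, server)
```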

Sequence Numbers

One thing worth being clear on: these starting sequence numbers are random on purpose. If they were predictable, an attacker could inject packets into your TCP stream by guessing what sequence number comes next. Randomizing them makes that attack impractical. The starting value is called the ISN, Initial Sequence Number, and the OS picks it, not your application.

Around the Handshake

Ports and Processes

When a SYN packet arrives, the OS needs to know which process should handle it. That's what ports do. The server binds to a well-known port, 443 for HTTPS, 5432 for Postgres, 6379 for Redis. The client picks an ephemeral port, a temporary high-numbered port usually in the range 32768–60999 on Linux. The four-tuple of source IP, source port, destination IP, destination port uniquely identifies every connection on the machine. Two connections to the same server on the same port are different connections because the client's ephemeral port differs.
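
You can watch the four-tuple do its job on your own machine. The sketch below is a loopback experiment, not production code: it starts a throwaway listener on a port the OS picks, opens two client connections to it, and prints each connection's four-tuple. The variable names are arbitrary.

```python
import socket

# Listener on loopback; port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(8)
server_addr = server.getsockname()

# Two clients connect to the same server address.
clients = [socket.create_connection(server_addr) for _ in range(2)]

# Four-tuple seen from each client: (source IP, source port, dest IP, dest port).
four_tuples = [c.getsockname() + c.getpeername() for c in clients]
for t in four_tuples:
    print(t)

for c in clients:
    c.close()
server.close()
```

The destination half of both tuples is identical. Only the ephemeral source port differs, and that is enough for the kernel to keep the two connections apart.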

The Kernel's Role

Your application never runs the handshake. You call accept() in your code and get back an already-established connection. The kernel's TCP stack handled the SYN, sent the SYN-ACK, waited for the ACK, and only then handed the connection to your process. This matters because the handshake can fail, queue up, or get dropped before your application sees anything. When connections are slow or failing, the problem is often not in your code.
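
A quick loopback experiment makes this visible. In the sketch below the server calls listen() but deliberately delays accept(). The client's connect still succeeds, because the kernel completes the handshake on its own and parks the connection until the application asks for it. Names and the 2-second timeout are arbitrary choices for the demo.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

# accept() has NOT been called yet, but connect still returns:
# the kernel finishes the handshake and queues the connection.
client = socket.create_connection(server.getsockname(), timeout=2)
connected_before_accept = client.getpeername() == server.getsockname()

# Only now does the application pick up the already-established connection.
conn, peer = server.accept()
same_peer = peer == client.getsockname()
print(connected_before_accept, same_peer)

for s in (conn, client, server):
    s.close()
```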

The Backlog Queues

The kernel maintains two queues per listening socket. The SYN queue holds half-open connections: SYN received, SYN-ACK sent, waiting for the final ACK. The accept queue holds fully established connections waiting for your application to call accept(). Both queues have size limits. If your application is slow to call accept() and the accept queue fills up, new connections get dropped even though the network is fine. Two kernel parameters control this: net.core.somaxconn caps the accept queue (the backlog your application passes to listen() is silently capped at this value, so the smaller of the two wins), and net.ipv4.tcp_max_syn_backlog caps the SYN queue. Under traffic spikes these are the first things to check.
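
The accept queue limit is the smaller of two numbers: what the application asked for in listen() and what the kernel allows. A minimal sketch of that rule follows. The helper names are made up, and read_somaxconn() assumes a Linux /proc filesystem, returning a fallback everywhere else.

```python
from pathlib import Path


def effective_accept_backlog(requested, somaxconn):
    """Linux caps the backlog passed to listen() at net.core.somaxconn."""
    return min(requested, somaxconn)


def read_somaxconn(default=None):
    """Read the live value on Linux; return `default` where /proc is unavailable."""
    try:
        return int(Path("/proc/sys/net/core/somaxconn").read_text())
    except (OSError, ValueError):
        return default


print(effective_accept_backlog(1024, 128))   # application asks for more than the kernel allows
print(effective_accept_backlog(64, 4096))    # application asks for less than the kernel allows
print(read_somaxconn())
```

The second case is the one that surprises people: raising the sysctl does nothing if the application itself requests a small backlog.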

TCP Options

During the handshake, both sides negotiate capabilities. With MSS, Maximum Segment Size, each side announces the largest TCP payload it is willing to receive in one segment, which helps avoid IP fragmentation. Window scaling allows receive windows larger than 65,535 bytes, the limit of the 16-bit window field, which matters on high-latency high-bandwidth links. SACK, Selective Acknowledgment, lets the receiver tell the sender exactly which segments arrived, so only the missing ones get retransmitted instead of everything after the gap. These options are carried in the SYN and SYN-ACK packets, and window scaling and SACK are used only if both sides offer them. By the time the connection is ESTABLISHED, both sides know exactly how they'll talk.
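
Some quick arithmetic shows why these options matter. The sketch below assumes IPv4 and TCP headers of 20 bytes each with no options, and the 1 Gbit/s link with a 100 ms round trip is an invented example. The function names are illustrative only.

```python
def mss_for_mtu(mtu, ip_header=20, tcp_header=20):
    """Largest TCP payload that fits in one packet of size `mtu` (headers without options)."""
    return mtu - ip_header - tcp_header


def max_window(scale_shift):
    """Largest receive window when the 16-bit window field is shifted left by scale_shift (0..14)."""
    if not 0 <= scale_shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    return 65535 << scale_shift


def bandwidth_delay_product(bits_per_second, rtt_ms):
    """Bytes that must be in flight to keep the link full."""
    return bits_per_second * rtt_ms // 8000


print(mss_for_mtu(1500))                            # standard Ethernet MTU
print(max_window(0), max_window(7))                 # without and with scaling
print(bandwidth_delay_product(1_000_000_000, 100))  # 1 Gbit/s, 100 ms RTT
```

Without scaling the window tops out at 65,535 bytes, while the example link needs about 12.5 MB in flight to stay busy. That gap is what window scaling closes.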

Where TLS Fits

TLS does not replace the TCP handshake. It happens after it. First the three-way TCP handshake completes. Then the TLS handshake starts, certificate exchange, cipher negotiation, key agreement. Only after both handshakes complete does your application data move. This is why TLS adds latency. On a high-latency connection you're paying for multiple round trips before a single byte of your payload moves. TLS 1.3 reduced this by cutting the TLS handshake from two round trips to one, but the TCP handshake still comes first, always.
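
To put numbers on that, here is a rough model of handshake cost before the first request can leave the client. It counts one round trip for TCP plus the TLS round trips named above, assumes full handshakes with no session resumption, and ignores DNS and server processing time. The 80 ms round trip is an arbitrary example.

```python
TLS_ROUND_TRIPS = {"1.2": 2, "1.3": 1}  # full handshakes, no session resumption


def handshake_cost_ms(rtt_ms, tls_version=None):
    """Milliseconds spent on handshakes before the first request can be sent."""
    tcp_round_trips = 1
    tls_round_trips = TLS_ROUND_TRIPS[tls_version] if tls_version else 0
    return rtt_ms * (tcp_round_trips + tls_round_trips)


for version in (None, "1.2", "1.3"):
    print(version, handshake_cost_ms(80, version))
```

At 80 ms per round trip that works out to 80 ms for plain TCP, 240 ms with TLS 1.2, and 160 ms with TLS 1.3.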

How It Breaks

SYN Flood

A SYN flood is the simplest DoS attack against TCP. The attacker sends thousands of SYN packets with spoofed source IPs. Your server sends SYN-ACK to each one and moves those connections into the SYN queue, waiting for an ACK that never comes. The SYN queue fills up. Legitimate connections start getting dropped. Your application sees nothing wrong, the kernel is the one drowning.

The fix is SYN cookies. With SYN cookies, the server doesn't have to allocate state for a half-open connection. Instead it encodes the connection information into the sequence number it sends back. If a legitimate ACK arrives, the server reconstructs the state from that number. If no ACK arrives, nothing was wasted. Enable it with net.ipv4.tcp_syncookies=1. On most Linux distributions it's on by default, and with that setting the kernel falls back to cookies only once the SYN queue overflows.
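
The idea is easier to see in a toy version. The sketch below is NOT the algorithm Linux uses. It is a simplified illustration in which the server derives its sequence number from a keyed hash of the connection details, so it can verify the final ACK without having stored anything. The secret, the counter, and the function names are all hypothetical, and real implementations also fold in details such as the MSS and a timestamp.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical; a real stack uses random secrets


def make_cookie(four_tuple, client_isn, counter):
    """Derive a 32-bit server ISN from the connection identity instead of storing state."""
    msg = repr((four_tuple, client_isn, counter)).encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(four_tuple, client_isn, counter, ack_number):
    """The final ACK must acknowledge cookie + 1 (mod 2**32)."""
    expected = (make_cookie(four_tuple, client_isn, counter) + 1) % 2**32
    return ack_number == expected


conn = ("203.0.113.7", 40000, "198.51.100.1", 443)
cookie = make_cookie(conn, 1000, counter=42)
good = ack_is_valid(conn, 1000, 42, (cookie + 1) % 2**32)
forged = ack_is_valid(conn, 1000, 42, (cookie + 2) % 2**32)
print(good, forged)
```

A client that really received the SYN-ACK can echo the right number back. A spoofed source never sees the SYN-ACK, so it cannot.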

Connection Refused vs Timed Out

These two errors tell you completely different things and engineers mix them up constantly.

Connection refused means the TCP RST packet came back immediately. The host is reachable, the packet got there, but nothing is listening on that port. The process is down, or it's bound to the wrong port, or a firewall sent the RST on its behalf.

Connection timed out means no response came at all. The SYN left your machine and disappeared. Either the host is unreachable, a firewall is silently dropping packets, or a network route is broken. The key word is silently. A firewall that sends RST gives you refused. A firewall that drops without responding gives you timeout. Timeout is always harder to debug because you're waiting for the OS to give up. On Linux with default settings (net.ipv4.tcp_syn_retries=6) that takes roughly two minutes, unless your client sets a shorter connect timeout.
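
Here is a small probe that separates the two cases in code. It is a sketch using Python's standard socket module. The function name and the 3-second timeout are my own choices, and other failures, such as "no route to host", are deliberately left to raise. The demo part uses loopback, where a port with no listener answers with RST.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connect attempt as 'open', 'refused', or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # an RST came back: host reachable, nothing listening
    except socket.timeout:
        return "timeout"   # no answer at all within `timeout` seconds


# A listener we control, so one port is known to be open.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
open_port = listener.getsockname()[1]

# Bind and close again to get a port that is very likely unused.
tmp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tmp.bind(("127.0.0.1", 0))
closed_port = tmp.getsockname()[1]
tmp.close()

open_result = probe("127.0.0.1", open_port)
closed_result = probe("127.0.0.1", closed_port)
print(open_result, closed_result)
listener.close()
```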

Half-Open Connections

A half-open connection is when one side thinks the connection is established and the other doesn't. This happens when one side crashes and restarts without a clean shutdown. The surviving side still has the connection in ESTABLISHED state. When it tries to send data, the restarted side has no record of this connection and sends RST. TCP keepalive exists to detect this when nothing is being sent: the OS sends small probe packets on idle connections and tears down the connection if no response comes. Keepalive is off unless the application enables it on the socket, and the Linux default waits two hours of idle time before the first probe (net.ipv4.tcp_keepalive_time=7200). Without keepalive, an idle half-open connection can sit indefinitely, consuming resources.
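
Turning keepalive on is a per-socket decision. A minimal sketch with Python's socket module is below. The idle, interval, and probe values are arbitrary examples rather than recommendations, and the three tuning constants are not available on every platform, hence the hasattr guard.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive and, where supported, tune its timing."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)       # idle seconds before first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)      # unanswered probes before giving up


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print(keepalive_on)
sock.close()
```

With these example values a dead peer is noticed after about 60 + 10 * 5 = 110 seconds of silence instead of never.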

TIME_WAIT

After a connection closes, the side that initiated the close enters TIME_WAIT state. The TCP specification defines its length as 2 * MSL, Maximum Segment Lifetime. Linux does not derive it from a tunable MSL and uses a fixed 60 seconds instead. This exists so any delayed packets from the old connection can expire before a new connection uses the same four-tuple. The problem appears in high-throughput services that open and close many short-lived connections to the same destination. You can exhaust your ephemeral port range because a port in TIME_WAIT cannot normally be reused for a new connection to that same destination IP and port. The fix is connection pooling: keep connections alive and reuse them instead of opening new ones for every request.
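
A back-of-the-envelope calculation shows how quickly this bites. The sketch assumes the default Linux ephemeral range of 32768 to 60999, a 60-second TIME_WAIT (a parameter here, so you can plug in another value), a client that always closes first, and no port reuse tuning. The function names are illustrative.

```python
def ephemeral_ports(low=32768, high=60999):
    """Number of ports in an inclusive ephemeral range (Linux default shown)."""
    return high - low + 1


def max_new_connections_per_second(ports, time_wait_seconds=60):
    """Sustained rate of short-lived connections to ONE destination IP:port before ports run out."""
    return ports / time_wait_seconds


ports = ephemeral_ports()
rate = max_new_connections_per_second(ports)
print(ports, round(rate))
```

That is 28,232 usable ports, or roughly 470 new connections per second to a single destination before the client has nothing left to bind.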

Observing It Live

tcpdump

This is the closest you'll get to watching the handshake happen in real time. Run this on your server:

tcpdump -i eth0 'tcp[tcpflags] & (tcp-syn|tcp-ack) != 0'

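Every line you see is a packet. The filter matches any segment with SYN or ACK set, which is nearly all TCP traffic, so on a busy host add a port to the filter (for example, prefix it with tcp port 443 and) to keep the output readable. The flags tell you where in the handshake you are. tcpdump prints them in brackets: [S] means SYN, [S.] means SYN-ACK, and [.] alone means ACK. A complete handshake looks like three lines, [S], then [S.], then [.], appearing in under a millisecond on a local network. If you see S arriving repeatedly from the same source and the server never answers with S., the server's kernel is dropping the SYN, typically because a queue is full or a local firewall rule discards it. If you capture on the client and see S go out with no reply at all, something upstream is dropping packets.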
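
If you save the capture to a file, a few lines of Python can label each packet for you. The parser below looks only at the bracketed flags field. The sample lines are hand-written and simplified to resemble tcpdump's usual layout, with addresses from documentation ranges, so treat them as illustrative rather than real capture output.

```python
import re

FLAG_NAMES = {"S": "SYN", "S.": "SYN-ACK", ".": "ACK"}


def classify(line):
    """Map a tcpdump output line to a handshake step, or None if it is something else."""
    match = re.search(r"Flags \[([^\]]+)\]", line)
    return FLAG_NAMES.get(match.group(1)) if match else None


# Hypothetical, simplified capture lines.
sample = [
    "12:00:00.000001 IP 192.0.2.10.51514 > 192.0.2.20.443: Flags [S], seq 1000, length 0",
    "12:00:00.000201 IP 192.0.2.20.443 > 192.0.2.10.51514: Flags [S.], seq 5000, ack 1001, length 0",
    "12:00:00.000301 IP 192.0.2.10.51514 > 192.0.2.20.443: Flags [.], ack 5001, length 0",
    "12:00:00.000401 IP 192.0.2.10.51514 > 192.0.2.20.443: Flags [P.], ack 5001, length 500",
]
steps = [classify(line) for line in sample]
print(steps)
```

The fourth line carries data (push plus ACK), so it is not part of the handshake and comes back as None.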

ss and netstat

ss is the modern replacement for netstat. To see connection states:

ss -tan

The state column tells you everything. LISTEN means the socket is waiting. SYN-RECV means handshakes in progress. If this number is unusually high, you're under a SYN flood or your application is too slow accepting connections. ESTAB, which is how ss abbreviates ESTABLISHED, means live connections. TIME-WAIT means recently closed connections still expiring. A healthy server under normal load should have very few SYN-RECV entries. If you see hundreds, something is wrong before your application even runs.
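
Counting states by eye stops working at a few hundred connections. Here is a sketch that tallies the first column of ss -tan output. The sample text is hand-written in the layout ss prints (note that ss abbreviates ESTABLISHED as ESTAB), so real output will differ in addresses and column widths.

```python
from collections import Counter


def count_states(ss_output):
    """Count the State column of `ss -tan` output, skipping the header row."""
    lines = ss_output.strip().splitlines()[1:]
    return Counter(line.split()[0] for line in lines if line.strip())


# Hypothetical sample output.
sample = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:443         0.0.0.0:*
SYN-RECV   0      0      192.0.2.20:443      203.0.113.7:40000
ESTAB      0      0      192.0.2.20:443      192.0.2.10:51514
ESTAB      0      0      192.0.2.20:443      192.0.2.11:51890
TIME-WAIT  0      0      192.0.2.20:443      192.0.2.12:40122
"""
states = count_states(sample)
print(states)
```

To run it against a live machine you could feed it the text of ss -tan captured with subprocess. That part is left out here so the example runs anywhere.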

To see queue sizes on your listening sockets:

ss -tlnp

The Recv-Q column on a LISTEN socket shows how many connections are sitting in the accept queue waiting for your application to call accept(), and Send-Q shows that queue's limit. If Recv-Q is consistently non-zero, your application is the bottleneck, not the network.
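
The same check can be scripted. This sketch scans listening-socket output for any LISTEN row whose Recv-Q is above zero and reports it next to Send-Q, which on a LISTEN socket is the accept queue's limit. The sample is hand-written in ss's column layout without the process column, and port 8080 is an invented example.

```python
def backed_up_listeners(ss_output):
    """Return (local_address, queued, limit) for LISTEN sockets with a non-empty accept queue."""
    result = []
    for line in ss_output.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "LISTEN" and int(fields[1]) > 0:
            # On a LISTEN socket, Recv-Q is the current queue length and Send-Q is its limit.
            result.append((fields[3], int(fields[1]), int(fields[2])))
    return result


# Hypothetical sample output.
sample = """\
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       128     0.0.0.0:22          0.0.0.0:*
LISTEN  129     128     0.0.0.0:8080        0.0.0.0:*
"""
backed_up = backed_up_listeners(sample)
print(backed_up)
```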

Kernel Parameters

When you suspect queue limits are the problem:

sysctl net.core.somaxconn
sysctl net.ipv4.tcp_max_syn_backlog

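Compare these against the Recv-Q and SYN-RECV counts you saw in ss. If connections are hitting the queue limit, increase these values (for example with sysctl -w) and watch whether the problem clears. Raising net.core.somaxconn only helps if the application also asks for a larger backlog in listen(), which usually means a config change and a restart or reload. This is not a permanent fix. It buys you time while you find why your application is slow to accept connections.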

What Slow Connections Actually Look Like

If connections are completing but slowly, the handshake itself is rarely the problem. The handshake takes one round trip. If your round trip time to the server is 1ms, the handshake takes 1ms. If connections feel slow, look at what surrounds the handshake: DNS resolution before the connect, TLS handshake time after it, or time-to-first-byte from your application. Use curl with timing to break it down:

curl -o /dev/null -s -w "dns:%{time_namelookup} connect:%{time_connect} tls:%{time_appconnect} total:%{time_total}\n" https://yourservice.com

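curl's timers are cumulative, each one measured from the start of the request. Your TCP handshake time is connect minus dns, and your TLS handshake time is tls minus connect. Everything else is something else's problem.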
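
One detail to keep in mind when reading that output: curl's timers are cumulative, each measured from the start of the request, so the per-phase cost is the difference between neighbouring fields. The helper below does the subtraction. The function name and the sample reading are invented for illustration.

```python
def phases(namelookup, connect, appconnect, total):
    """Split curl's cumulative timers (seconds since start) into per-phase durations in ms."""
    def ms(seconds):
        return round(seconds * 1000, 3)

    return {
        "dns": ms(namelookup),
        "tcp_handshake": ms(connect - namelookup),
        "tls_handshake": ms(appconnect - connect),
        "request_and_response": ms(total - appconnect),
    }


# Hypothetical reading: dns:0.012 connect:0.032 tls:0.095 total:0.180
breakdown = phases(0.012, 0.032, 0.095, 0.180)
print(breakdown)
```

In this made-up reading the TCP handshake is 20 ms of a 180 ms request. The TLS handshake and the application's response time dominate.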

Real-World Scenarios

Nginx and the Handshake

When you put Nginx in front of your application, the TCP handshake stops being your application's problem, and that's exactly the point.

A client connects to Nginx. Nginx handles the TCP handshake, the TLS handshake, and buffers the request. Only after all of that does Nginx open a connection to your upstream application. Your application sees a clean, already-established connection from Nginx. It never deals with slow clients, partial handshakes, or TLS overhead.

This matters more than most engineers realize. Imagine a mobile client on a slow 4G connection sending an HTTP request. It might take 800ms just to finish the TLS handshake. If your application handled this directly, that thread or process would be sitting idle for 800ms waiting for the client to finish connecting before sending a single byte. Multiply that by a few hundred slow clients and your application is stuck waiting on network I/O instead of doing real work.

Nginx solves this with a model built around non-blocking I/O and an event loop. It can hold thousands of slow half-connected clients open simultaneously without spawning a thread per connection. It finishes their handshakes in the background, buffers their requests, and only forwards complete requests upstream. Your application only ever sees fast, local connections from Nginx, typically on the same machine or same VPC, sub-millisecond round trips, no TLS.

This is called request buffering and it's one of the core reasons Nginx exists as a reverse proxy. Without it, a slow client directly hitting your app server is a slow client tying up your app server. With Nginx in front, a slow client is Nginx's problem.

Connection Pooling

The handshake is not free. Every new TCP connection costs you one round trip before any data moves. On a local network that might be 0.5ms. Against an external database over a real network, it could be 5–20ms. If your application opens a new connection to Postgres for every query and closes it after, you're paying that cost on every single request.

Connection pooling keeps a set of already-established connections open and reuses them. Your application asks the pool for a connection, uses it, returns it. No handshake. No TIME_WAIT. No ephemeral port consumed. The connection was already in ESTABLISHED state sitting idle in the pool.
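
A pool can be surprisingly little code. The sketch below is a deliberately tiny, hypothetical ConnectionPool built on a queue. It opens its connections once, hands them out, and takes them back. It skips everything a production pool needs, such as health checks, replacing broken connections, and idle timeouts. A loopback listener stands in for the database.

```python
import queue
import socket


class ConnectionPool:
    """Tiny fixed-size pool: connections are opened once and handed out repeatedly."""

    def __init__(self, address, size):
        self._idle = queue.Queue()
        for _ in range(size):
            # The handshake is paid here, once per pooled connection.
            self._idle.put(socket.create_connection(address))

    def acquire(self, timeout=None):
        """Take an idle connection; blocks if every connection is in use."""
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        """Return a connection so the next caller can reuse it."""
        self._idle.put(conn)

    def close(self):
        while not self._idle.empty():
            self._idle.get_nowait().close()


# Loopback listener standing in for a database.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(8)

pool = ConnectionPool(server.getsockname(), size=1)
conn = pool.acquire()
port_first_use = conn.getsockname()[1]
pool.release(conn)
conn = pool.acquire()
port_second_use = conn.getsockname()[1]
pool.release(conn)
print(port_first_use == port_second_use)

pool.close()
server.close()
```

Both uses report the same local port because it is the same established connection. No second handshake happened and no extra ephemeral port was consumed.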

PgBouncer does this for Postgres. It sits between your application and the database, maintains a small pool of real database connections, and multiplexes many application requests across them. A service handling 500 requests per second might need only 20–30 actual database connections if queries are fast. Without pooling, that same service would be opening and closing hundreds of connections per second, each one paying the handshake cost, each one leaving a TIME_WAIT entry behind.
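
Where does a figure like 20 to 30 come from? A rough estimate is request rate multiplied by how long each request holds a connection. The sketch below applies that rule of thumb. The 40 to 60 ms hold times are assumptions chosen to illustrate the range, and real sizing also has to account for bursts and slow queries.

```python
import math


def connections_needed(requests_per_second, hold_time_ms):
    """Average number of connections busy at once: rate multiplied by hold time."""
    return math.ceil(requests_per_second * hold_time_ms / 1000)


for hold_ms in (40, 50, 60):
    print(hold_ms, connections_needed(500, hold_ms))
```

At 500 requests per second, hold times of 40, 50, and 60 ms need about 20, 25, and 30 connections.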

The rule is simple. Any service that talks to a database or another internal service repeatedly should never open a new connection per request. Use a connection pool. The handshake cost is small per connection but it compounds fast under load, and TIME_WAIT exhaustion at scale is a real production incident, not a theoretical one.

Thanks for supporting this newsletter. Y’all are the best!
Until next time!
