Transport Layer Protocol: The Big Picture
Life is really simple, but men insist on making it complicated. -Confucius
Mindmap
Introduction
TCP is a foundational communication protocol that enabling reliable data transmission between devices over a network. TCP is designed to ensure that data is delivered accurately and in the correct order, making it crucial for applications where reliability and integrity of data are paramount, such as web browsing, email, and file transfers.
Connection Establishment
TCP utilizes a 3-way handshake to establish a connection between two devices, ensuring both are ready to communicate. This process involves the exchange of three packets: SYN (synchronize), SYN-ACK (synchronize-acknowledge), and ACK (acknowledge). This handshake ensures that both the client and server are synchronized, agree on initial sequence numbers, and are prepared to start data transmission.
TCP Segment
TCP Header
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port | 4 Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number | 4 Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgement Number | 4 Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data | |U|A|P|R|S|F| |
| Offset| Reserved |R|C|S|S|Y|I| Window | 4 Byte
| | |G|K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Urgent Pointer | 4 Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding | 40 Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- Port
- Source (ephemeral) : It’s dynamically assigned, making each connection unique and preventing port conflicts.
- Destination (fixed) : Identifies the specific service or application on the receiving host (e.g., HTTP uses port 80).
- Flags
- URG : Signals that certain data should be processed immediately.
- ACK : Confirms receipt of data, ensuring reliable communication.
- PSH : Forces the immediate sending of data without waiting for a buffer to fill.
- RST : Abruptly terminates a connection due to an error or attack.
- SYN : Initiates a connection between two hosts.
- FIN : Gracefully ends a connection.
- Read more : TCP Flag
- Sequence
- To ensure that TCP segments are sent and ordered correctly.
- Acknowledgement
- Confirms the receipt of the previous segments.
- Options
- Maximum Segment Size (MSS) : Defines the largest segment of data that can be transmitted and optimizes network efficiency by reducing the need for fragmentation.
- Window Scale Factor : Expands the window size for data transmission, allowing for higher throughput.
- SACK permitted and SACK : Enhances performance by allowing the receiver to acknowledge out-of-order packets.
- Read more: RFC 6691
- Length
- Specifies how much data is in the segment.
Data
- Contains the actual payload that is being transmitted.
Handshake
3-Way
- Ensures both sender and receiver are ready to exchange data.
- Receiver expects the next sequence number from the source is the same number as ack.
Connection Termination
- The sender (send FIN) who closes the transmission must continue to receive data until the receiver also decides to close the connection (also send FIN).
Flow Control
Flow control in TCP is managed through a mechanism known as the sliding window. This technique allows the sender to transmit multiple packets before needing an acknowledgment for the first one, optimizing the use of network resources while preventing the receiver from becoming overwhelmed by too much data at once. The receiver can adjust the size of the window dynamically, based on its capacity to process incoming data.
Segmentation
- Ensures that large chunks of data don’t overload the network or receiver.
Max Segmentation Size (MSS)
- Defines the largest data chunk that can be sent in one segment.
- Helps optimize transmission efficiency by reducing the need for fragmentation.
Max Transmission Unit (MTU)
- Determines the maximum packet size that can be transmitted in Datalink Layer, usually 1500 byte.
Buffer
- Temporarily stores data before it’s processed.
- Acts as a cushion, preventing data loss during high traffic periods.
Data Transfer
- Data that can be transferred from Application Layer equals MTU - (Size TCP Header + Size IP Layer), which is 1500 bytes - (20 bytes + 20 bytes) = 1460 bytes.
Window
- For example: Sender’s buffer is 100 byte, receiver’s buffer is 70 byte.
- Focus on receiver’s buffer because the flow is controlled by the receiver.
- Packet sent and acknowledged = The data was already sent and received.
- Packet sent and not acknowledged = The data is not received yet.
- Packet that can be sent = The data is ready to sent.
- Packet that can’t be sent = The data that isn’t ready to sent.
Receiver Window (RWND)
- Informs the source how much data the receiver can receive.
Send Window (SWND)
- Determines the amount of unacknowledged data that can be in transit.
Sliding Window
- If we talk about TCP window, we don’t care about packet sent and acknowledged part.
- Only focus on packet that is not received and that is not sent.
- The window size that can be send determined by the receiver window (RWND).
Error Control
To ensure data integrity, TCP implements error control through sequence numbers and acknowledgments. Each segment of data is assigned a unique sequence number, and the receiver sends an acknowledgment (ACK) back to the sender, indicating which data has been successfully received. If a segment is lost or corrupted during transmission, TCP detects this through missing or incorrect acknowledgments and retransmits the affected segment.
Checksum
- Detects errors in transmitted data.
- Read more: TCP Checksum
Acknowledgment
- Reduces network traffic by confirming multiple segments with a single acknowledgment.
Piggybacking
- Combines data and acknowledgment in one segment.
Selective Acknowledgements (SACK)
- Allows acknowledgment of specific segments, improving retransmission efficiency.
- Read more : RFC 2081, SACK
Retransmission
- This is the heart of TCP.
- Based on presumptions by the source.
- Ensures that lost or corrupted segments are resent.
Queue
- Organizes segments that need to be resent.
Time-out
- Triggers retransmission if no acknowledgment is received within a certain time.
Out-of-Order Segment
- Receiver didn’t receive some segment then receiver use sack to determine which packet that is lost, then sender resend the packet with the same sequence number as the sack.
Congestion Control
TCP also incorporates congestion control mechanisms to avoid network congestion, which can lead to packet loss and degraded performance. Congestion control algorithms, such as Slow Start and Congestion Avoidance dynamically adjust the rate of data transmission based on the perceived network conditions. By detecting signs of congestion, TCP can slow down or speed up the transmission rate to maintain optimal performance without overwhelming the network. The keys of Congestion Control are Detection and Avoidance Congestion. Read more : RFC 2581
Congestion Window (CWND)
- Adjusts the amount of data sent based on network conditions.
- Works with the send window to control the rate of data transmission.
- Before, we use receiver window (RWND) to determine send window (SWND), but now we use CWND.
- The formula is SWND = min(RWND, CWND)
Rules
- Ensures that data flow is adjusted based on both the sender’s and network’s capacity.
- Uses the minimum of Send Window and Congestion Window to determine the actual amount of data to send.
Slow Start
- Prevents initial congestion by gradually increasing the data sent.
- To avoid overloading the network.
- CWND gradually increase for example: 1, 2, 4, 8, …
- Read more : RFC 2581
Avoidance Congestion
- Minimizes congestion by adjusting the data flow before the network becomes overloaded.
- Uses a threshold to transition from Slow Start to a more stable data flow.
Threshold
- Defines the point at which the data transmission strategy changes.
Detection
- Identifies when congestion occurs to adjust the transmission rate.
Keepalive Timer
- Ensures that idle connections remain open.
Window Scale Option
- Allows for larger window sizes, supporting high-speed networks.
- Read more : RFC 1323
What’s next?
Networking is hard. Here are some interesting security research about TCP:
- Listen to the whispers: web timing attacks that actually work
- Beyond the Limit: Expanding single-packet race condition with a first sequence sync for breaking the 65,535 byte limit
- Exploiting Sequence Number Leakage: TCP Hijacking in NAT-Enabled Wi-Fi Networks
- PMTUD is not Panacea: Revisiting IP Fragmentation Attacks against TCP
- An Internet-wide Penetration Study on NAT Boxes via TCP/IP Side Channel
- TCP-Fuzz: Detecting Memory and Semantic Bugs in TCP Stacks with Fuzzing
- How TCP/IP Stacks Breed Critical Vulnerabilities in IoT, OT and IT Devices
Resource
- https://en.wikipedia.org/wiki/Transmission_Control_Protocol
- https://sookocheff.com/post/networking/how-does-tcp-work/
- https://hpbn.co/building-blocks-of-tcp/
- https://www.amazon.com/TCP-Illustrated-Vol-Addison-Wesley-Professional/dp/0201633469
- https://www.amazon.com/TCP-Guide-Comprehensive-Illustrated-Protocols/dp/159327047X
- https://tools.ietf.org/html/rfc793