7. TCP
Yanghee Choi, School of Computer Science and Engineering, Seoul National University
TCP Basics
- Connection-oriented (virtual circuit)
- Reliable transfer
- Buffered transfer
- Unstructured stream
- Full duplex
- Point-to-point connection
- End-to-end service
2004 Yanghee Choi
TCP Mechanisms
- Addressing: application-to-application addressing
- Reliable delivery: the receiving application should receive the same data stream the source puts on the network
- Segment order maintenance: data segments should reach the application in the same order they left the sender
- Flow control: the sending speed should adapt to the receiver's speed
- Congestion control: the transmission speed cannot be faster than the speed of the slowest link traversed on the connection path
- Segmentation: data is sent in segments that provide the highest throughput
Reliable Transmission
Events in time order (Sender | Network message | Receiver):
1. Sender: send Packet 1
2. Receiver: receive Packet 1, send ACK 1
3. Sender: receive ACK 1, send Packet 2
4. Receiver: receive Packet 2, send ACK 2
5. Sender: receive ACK 2
Timeout and Retransmission
Events in time order (Sender | Network message | Receiver):
1. Sender: send Packet 1, start timer
2. Network: packet lost
3. Sender: timer expires, retransmit Packet 1, start timer
4. Receiver: receive Packet 1, send ACK 1
5. Sender: receive ACK 1, cancel timer
Adaptive Retransmission
[Figure: two round trip estimates (estimation 1, estimation 2), each shown against a lost packet and the resulting timeout]
Sliding Window
Initial window over packets 1 2 3 4 5 6 7 8 9 10 ...; the window slides as ACKs arrive.
Events in time order (Sender | Network message | Receiver):
1. Sender: send Packet 1
2. Sender: send Packet 2; Receiver: receive Packet 1, send ACK 1
3. Sender: send Packet 3; Receiver: receive Packet 2, send ACK 2
4. Sender: receive ACK 1 (window slides); Receiver: receive Packet 3, send ACK 3
5. Sender: receive ACK 2
6. Sender: receive ACK 3
...
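The sliding window exchange can be sketched as a small simulation. This is only an illustrative model under simplifying assumptions: no packets are lost, ACKs arrive one at a time in order, and the function name `sliding_window_trace` and its parameters are hypothetical.

```python
from collections import deque


def sliding_window_trace(num_packets, window_size):
    """Return the event order for a lossless sliding-window sender."""
    events = []
    in_flight = deque()  # packets sent but not yet acknowledged
    next_to_send = 1
    while next_to_send <= num_packets or in_flight:
        # Fill the window: keep sending while fewer than window_size
        # packets are unacknowledged.
        while next_to_send <= num_packets and len(in_flight) < window_size:
            events.append(f"send {next_to_send}")
            in_flight.append(next_to_send)
            next_to_send += 1
        # The oldest outstanding packet is acknowledged, so the window
        # slides forward by one packet.
        acked = in_flight.popleft()
        events.append(f"ack {acked}")
    return events
```

With `window_size=1` the trace degenerates to the send-then-wait exchange of simple positive acknowledgment; a larger window lets several packets be outstanding at once.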
Transmission Control Protocol
- TCP is connection-oriented and full duplex
- The maximum segment size (MSS) is set during connection establishment
- Reliability is achieved using acknowledgments, round trip delay estimation, and data retransmission
- TCP uses a variable window mechanism for flow control
- Congestion control is achieved using the slow start and congestion avoidance schemes
Ports, Connections, and Endpoints
Conceptual layering of UDP and TCP (top to bottom):
  Application
  Reliable Stream (TCP) | User Datagram (UDP)
  Internet (IP)
  Network Interface
- Endpoints: (host, port), e.g., (147.46.114.112, 21)
- Connections are identified by a pair of endpoints, e.g., (147.46.114.112, 21) and (147.46.114.128, 1500)
- TCP uses the connection, not the protocol port, as its fundamental abstraction
- Because TCP identifies a connection by a pair of endpoints, a given TCP port number can be shared by multiple connections on the same machine
- An application can provide concurrent service to multiple connections without needing a unique local port for each connection
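A small sketch may make the endpoint-pair idea concrete: two connections share local port 21 yet remain distinct because the remote endpoints differ. The table layout, the `demultiplex` helper, the session labels, and the second client address 147.46.114.129 are assumptions made for illustration.

```python
def demultiplex(table, local, remote):
    """Look up a connection by its (local endpoint, remote endpoint) pair."""
    return table.get((local, remote))


server = ("147.46.114.112", 21)

# Both connections use the same local port; the pair of endpoints is the key.
table = {
    (server, ("147.46.114.128", 1500)): "session A",
    (server, ("147.46.114.129", 1500)): "session B",  # hypothetical second client
}
```

Keying on the local port alone would force the two sessions to collide, which is why the connection rather than the port is the fundamental abstraction.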
Flow Control in TCP
- TCP views the data stream as a sequence of octets that it divides into segments for transmission
- TCP uses a sliding window mechanism to adjust the sender's transmission speed to that of the receiver
- The sliding window permits sending multiple segments before waiting for an ACK -> efficient transmission
- ACK segments indicate the last correctly received byte and the number of bytes the receiver is still willing to accept
- A sender keeps three pointers associated with every connection
[Figure: octets 1 2 3 4 5 6 7 8 9 10 11 ... with the current window marked]
Flow Control in TCP (cont.)
- TCP allows the window size to vary over time
- Each ACK contains a window advertisement that specifies how many additional octets of data the receiver is prepared to accept (the receiver's buffer size)
- In response to an increased (decreased) window advertisement, the sender increases (decreases) the size of its sliding window
- A variable-size window provides flow control as well as reliable transfer
- A flow control mechanism is essential in an internet environment, where machines of various speeds and sizes communicate through networks and routers of various speeds and capacities
- End-to-end flow control: sliding window scheme
- Congestion control: no explicit mechanism, implementation dependent
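How a window advertisement limits the sender can be sketched with a little arithmetic. The function and parameter names below are hypothetical, and the model assumes the advertisement counts octets beyond the last acknowledged one.

```python
def usable_window(last_byte_acked, last_byte_sent, advertised_window):
    """Octets the sender may still transmit before it must wait for an ACK."""
    in_flight = last_byte_sent - last_byte_acked  # sent but unacknowledged
    # A shrunken advertisement can fall below what is already in flight;
    # in that case the sender simply may not send anything new.
    return max(0, advertised_window - in_flight)
```

For example, with 400 octets in flight, an advertisement of 1000 leaves room for 600 more octets, while an advertisement of 300 leaves none.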
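TCP Segment Format
Bit positions 0 to 31 (column marks at 0, 4, 10, 16, 24, 31); each row is one 32-bit word:
  bits 0-15: SOURCE PORT | bits 16-31: DESTINATION PORT
  bits 0-31: SEQUENCE NUMBER
  bits 0-31: ACKNOWLEDGEMENT NUMBER
  bits 0-3: HLEN | bits 4-9: RESERVED | bits 10-15: CODE BITS | bits 16-31: WINDOW
  bits 0-15: CHECKSUM | bits 16-31: URGENT POINTER
  OPTIONS (IF ANY) | PADDING
  DATA
  ...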
TCP Segment Format (cont.)
Segments are exchanged to
- establish connections
- transfer data
- send ACKs
- advertise windows
- close connections
CODE BITS: determine the purpose and contents of the segment
  Bit (left to right) | Meaning if bit set to 1
  URG | Urgent pointer field is valid
  ACK | Acknowledgement field is valid
  PSH | This segment requests a push
  RST | Reset the connection
  SYN | Synchronize sequence numbers
  FIN | Sender has reached end of its byte stream
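The fixed part of the segment format can be decoded with Python's standard `struct` module. This is a minimal sketch, assuming a 20-byte header without options; the function name `parse_tcp_header`, the dictionary keys, and the sample values are illustrative choices.

```python
import struct

# CODE BITS in left-to-right order, with their masks in the 6-bit field.
CODE_BITS = [("URG", 0x20), ("ACK", 0x10), ("PSH", 0x08),
             ("RST", 0x04), ("SYN", 0x02), ("FIN", 0x01)]


def parse_tcp_header(raw):
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    (src, dst, seq, ack, hlen_and_bits,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    hlen_words = hlen_and_bits >> 12   # HLEN: header length in 32-bit words
    code = hlen_and_bits & 0x3F        # low 6 bits: CODE BITS
    return {
        "source_port": src,
        "destination_port": dst,
        "sequence": seq,
        "acknowledgement": ack,
        "header_bytes": hlen_words * 4,
        "flags": [name for name, mask in CODE_BITS if code & mask],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


# Hypothetical SYN segment from port 1500 to port 21, HLEN = 5 words.
sample = struct.pack("!HHIIHHHH", 1500, 21, 100, 0, (5 << 12) | 0x02, 4096, 0, 0)
```

Calling `parse_tcp_header(sample)` reports a 20-byte header whose only set code bit is SYN.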
Out of Band Data
- It is sometimes important for the program at one end of a connection to send data out of band, without waiting for the program at the other end to consume the octets already in the stream
  e.g., the interrupt or abort keyboard sequence in a remote login session
- TCP allows the sender to specify data as urgent, meaning that the receiving program should be notified of its arrival as quickly as possible, regardless of its position in the stream
- Urgent mode vs. normal mode
- When the URG code bit is set, the Urgent Pointer specifies the position in the segment where the urgent data ends
Maximum Segment Size (MSS) Option
- The most common option in a TCP segment
- Supports heterogeneous buffer capacities
- Makes good use of the bandwidth in a high-speed LAN: MSS == minimum MTU
- In a general internet environment, choosing a good MSS can be difficult because performance can be poor for either extremely large or extremely small segment sizes
  - Extremely small MSS: network utilization is low
  - Extremely large MSS: throughput decreases because of fragmentation
- The optimum MSS occurs when the IP datagrams carrying the segments are as large as possible without requiring fragmentation anywhere along the path from the source to the destination => but this is a difficult problem for several reasons
- Default MSS (536 bytes) = default size of an IP datagram (576 bytes) - 40 bytes of IP and TCP headers
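The default-MSS arithmetic can be written out directly. The sketch below assumes 20-byte IP and TCP headers with no options (which together give the 40 bytes subtracted above); the function names and the example MTU values other than 576 are illustrative.

```python
IP_HEADER_BYTES = 20   # assumes an IP header without options
TCP_HEADER_BYTES = 20  # assumes a TCP header without options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one unfragmented datagram of size mtu."""
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def mss_for_path(mtus):
    """Largest segment that avoids fragmentation on every link of a known path."""
    return mss_for_mtu(min(mtus))
```

`mss_for_mtu(576)` gives the default 536 bytes. In practice the per-link MTUs along a path are usually not known in advance, which is part of what makes the optimum MSS hard to choose.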
TCP Checksum Computation
- A 16-bit integer checksum is used to verify the integrity of the data as well as the TCP header
- TCP prepends a pseudo header to the segment, appends enough zero bits to make the segment a multiple of 16 bits, and computes the 16-bit checksum over the entire result
- TCP does not count the pseudo header or padding in the segment length, nor does it transmit them
- The pseudo header allows the receiver to verify that the segment has reached its correct destination
- At the receiver, IP must pass to TCP the source and destination IP addresses from the datagram as well as the segment itself
Pseudo header (bit positions 0, 8, 16, 31; each row is one 32-bit word):
  bits 0-31: SOURCE IP ADDRESS
  bits 0-31: DESTINATION IP ADDRESS
  bits 0-7: ZERO | bits 8-15: PROTOCOL | bits 16-31: TCP LENGTH
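The checksum procedure can be sketched as follows, assuming IPv4 addresses and the usual 16-bit one's complement sum. The function names are hypothetical; protocol number 6 is the value IP uses for TCP.

```python
import socket
import struct


def ones_complement_sum(data):
    """16-bit one's complement sum, zero-padding odd-length input."""
    if len(data) % 2:
        data += b"\x00"  # padding is used for the sum only, never transmitted
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip, dst_ip, segment):
    """Checksum over pseudo header + segment (checksum field set to zero)."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))  # ZERO, PROTOCOL, TCP LENGTH
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF
```

A receiver can run the same computation over the segment as received, checksum field included; a result of zero indicates that no error was detected.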
Acknowledgments, Timeout, and Retransmission
- A TCP receiver always acknowledges the last correctly received byte -> cumulative ACK
- After sending a segment, the sender starts a timer
- If the timer expires before an ACK for the sent segment arrives, the segment is considered lost and must be retransmitted
- In an internet environment, it is impossible to know a priori how quickly ACKs will return to the source
- The timeout value is calculated dynamically from the measured round trip time (RTT): the adaptive retransmission algorithm
- Estimated round trip time (RTT):
  $\mathit{RTT} = (\alpha \cdot \mathit{Old\_RTT}) + ((1 - \alpha) \cdot \mathit{New\_Round\_Trip\_Sample}), \quad 0 \le \alpha < 1$
- Timeout value:
  $\mathit{Timeout} = \beta \cdot \mathit{RTT}, \quad \beta > 1$
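The two formulas translate almost directly into code. In this sketch the default values of `alpha` and `beta` are assumptions chosen for illustration, not values stated on the slide.

```python
def update_rtt(old_rtt, sample, alpha=0.875):
    """Weighted average of the old estimate and the new round trip sample."""
    return alpha * old_rtt + (1 - alpha) * sample


def timeout_for(rtt, beta=2.0):
    """Retransmission timeout as a fixed multiple of the RTT estimate."""
    return beta * rtt
```

An `alpha` close to 1 makes the estimate slow to react to a single unusual sample, while an `alpha` close to 0 makes it track delay changes quickly.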
Round Trip Time Measurement
- Acknowledgment ambiguity: because both datagrams carry exactly the same data, the sender has no way of knowing whether an ACK corresponds to the original or the retransmitted datagram
- Associating the ACK with either the original transmission or the most recent transmission can give an inaccurate round trip time
[Figure: timeline with the original transmission at t1, the timeout and retransmission at t2, and the ACK at t3]
- Round_Trip_Sample = t3 - t2 or t3 - t1?
Modified Algorithm for RTT
- Karn's Algorithm: when computing the round trip estimate, ignore samples that correspond to retransmitted segments, but use a backoff strategy, and retain the timeout value from a retransmitted packet for subsequent packets until a valid sample is obtained
- Timer backoff strategy: if the timer expires and causes a retransmission, TCP increases the timeout
  $\mathit{new\_timeout} = \gamma \cdot \mathit{timeout}$, typically $\gamma = 2$
- When an internet misbehaves, Karn's algorithm separates computation of the timeout value from the current round trip estimate
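Karn's algorithm with timer backoff can be sketched as a small class. The class name, method names, and the default `alpha` and `beta` values are assumptions made for illustration; only gamma = 2 comes from the slide.

```python
class KarnTimer:
    """Retransmission timer that ignores ambiguous RTT samples."""

    def __init__(self, rtt, alpha=0.875, beta=2.0, gamma=2.0):
        self.rtt = rtt
        self.alpha = alpha
        self.beta = beta
        self.gamma = gamma
        self.timeout = beta * rtt

    def on_timeout(self):
        # Timer backoff: each expiry multiplies the timeout by gamma.
        self.timeout *= self.gamma

    def on_ack(self, sample, retransmitted):
        if retransmitted:
            # Ambiguous sample: leave the estimate alone and keep the
            # backed-off timeout for subsequent packets.
            return
        # Valid sample: update the estimate and recompute the timeout.
        self.rtt = self.alpha * self.rtt + (1 - self.alpha) * sample
        self.timeout = self.beta * self.rtt
```

The backed-off timeout stays in force until an ACK arrives for a segment that was sent only once, at which point the timeout is again derived from the round trip estimate.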
Responding to High Delay Variance
- The timeout computation must adapt to a wide range of variation in delay
- Queueing theory suggests that the variation in RTT, $\sigma$, is proportional to $1/(1-L)$, where $L$ is the current network load, $0 \le L \le 1$
- The 1989 spec for TCP requires implementations to estimate both the average round trip time and the variance, and to use the estimated variance in place of the constant $\beta$
  $\mathit{DIFF} = \mathit{SAMPLE} - \mathit{Old\_RTT}$
  $\mathit{Smoothed\_RTT} = \mathit{Old\_RTT} + \delta \cdot \mathit{DIFF}$
  $\mathit{DEV} = \mathit{Old\_DEV} + \rho \, (|\mathit{DIFF}| - \mathit{Old\_DEV})$
  $\mathit{Timeout} = \mathit{Smoothed\_RTT} + \eta \cdot \mathit{DEV}$
- DEV: the estimated mean deviation
- $\delta$: controls how quickly the new sample affects the weighted average
- $\rho$: controls how quickly the new sample affects the mean deviation
- $\eta$: controls how much the deviation affects the round trip timeout
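One update step of the four equations can be sketched as follows. The default values for `delta`, `rho`, and `eta` are assumptions chosen for illustration; the slide does not give values for them.

```python
def update_timeout(old_rtt, old_dev, sample, delta=0.125, rho=0.25, eta=4.0):
    """Return (smoothed_rtt, dev, timeout) after one round trip sample."""
    diff = sample - old_rtt
    smoothed_rtt = old_rtt + delta * diff
    dev = old_dev + rho * (abs(diff) - old_dev)  # mean deviation estimate
    timeout = smoothed_rtt + eta * dev
    return smoothed_rtt, dev, timeout
```

Because the timeout grows with the estimated deviation, a connection with widely varying delay gets a more generous timeout than one with the same average delay and steady samples.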
Connection Establishment
Three-way handshake (Events at Site 1 | Network messages | Events at Site 2):
1. Site 1: send SYN seq=x
2. Site 2: receive SYN segment; send SYN seq=y, ACK x+1
3. Site 1: receive SYN + ACK segment; send ACK y+1
4. Site 2: receive ACK segment
- A sends a SYN segment with an initial sequence number (ISN) and the maximum segment size (MSS) it is willing to receive
- B replies with a SYN segment acknowledging A's ISN and announcing its own MSS
- The MSS can be at most as large as the interface segment size minus 40
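The sequence and acknowledgment numbers in the handshake can be traced with a short sketch. The dictionary layout and function name are hypothetical; the sketch also assumes 32-bit sequence numbers that wrap around, and that the final ACK carries sequence number x+1 because the SYN consumes one sequence number.

```python
SEQ_MODULUS = 2 ** 32  # the sequence number field is 32 bits wide


def three_way_handshake(x, y):
    """Return the three segments exchanged when Site 1 opens to Site 2."""
    return [
        {"from": "site 1", "flags": {"SYN"}, "seq": x, "ack": None},
        {"from": "site 2", "flags": {"SYN", "ACK"}, "seq": y,
         "ack": (x + 1) % SEQ_MODULUS},
        {"from": "site 1", "flags": {"ACK"}, "seq": (x + 1) % SEQ_MODULUS,
         "ack": (y + 1) % SEQ_MODULUS},
    ]
```

Each side acknowledges the other's initial sequence number plus one, so both ends agree on where the two byte streams start.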
Connection Termination
Three-way handshake (Events at Site 1 | Network messages | Events at Site 2):
1. Site 1: (application closes connection) send FIN seq=x
2. Site 2: receive FIN segment; send ACK x+1; (inform application)
3. Site 1: receive ACK segment
4. Site 2: (application closes connection) send FIN seq=y, ACK x+1
5. Site 1: receive FIN + ACK segment; send ACK y+1
6. Site 2: receive ACK segment
- A sender terminates its part of the connection by sending a FIN segment
- After acknowledging the FIN, the receiver can still send data on its part of the connection (half close)
- A connection can be aborted with an RST segment if abnormal conditions arise
TCP State Machine
[Figure: TCP finite state machine. States: CLOSED, LISTEN, SYN RCVD, SYN SENT, ESTABLISHED, CLOSE WAIT, FIN WAIT-1, FIN WAIT-2, CLOSING, TIMED WAIT, LAST ACK. Transitions are labeled input / output: begin; anything / reset; passive open; active open / syn; send / syn; syn / syn+ack; syn+ack / ack; ack; reset; close; close / fin; close / timeout / reset; fin / ack; fin-ack / ack; timeout after 2 segment lifetimes.]
TCP State Machine (cont.)
TCP states:
  CLOSED | No connection is active or pending
  LISTEN | The server is waiting for an incoming call
  SYN RCVD | A connection request has arrived; wait for ACK
  SYN SENT | The application has started to open a connection
  ESTABLISHED | The normal data transfer state
  FIN WAIT-1 | The application has said it is finished
  FIN WAIT-2 | The other side has agreed to release
  TIMED WAIT | Wait for all packets to die off
  CLOSING | Both sides have tried to close simultaneously
  CLOSE WAIT | The other side has initiated a release
  LAST ACK | Wait for all packets to die off
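The states listed above can be exercised with a table-driven sketch. The transition table covers only the normal open and close paths and is written from the standard TCP state diagram rather than recovered from the slide's figure; the event names and the `run` helper are assumptions.

```python
# (current state, event) -> next state, for the normal open/close paths only.
TRANSITIONS = {
    ("CLOSED", "passive open"): "LISTEN",
    ("CLOSED", "active open"): "SYN SENT",
    ("LISTEN", "syn"): "SYN RCVD",
    ("SYN SENT", "syn+ack"): "ESTABLISHED",
    ("SYN RCVD", "ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN WAIT-1",
    ("ESTABLISHED", "fin"): "CLOSE WAIT",
    ("FIN WAIT-1", "ack"): "FIN WAIT-2",
    ("FIN WAIT-1", "fin"): "CLOSING",
    ("FIN WAIT-2", "fin"): "TIMED WAIT",
    ("CLOSING", "ack"): "TIMED WAIT",
    ("CLOSE WAIT", "close"): "LAST ACK",
    ("LAST ACK", "ack"): "CLOSED",
    ("TIMED WAIT", "timeout"): "CLOSED",
}


def run(events, state="CLOSED"):
    """Return the list of states visited while processing events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]  # KeyError for an unmodeled event
        path.append(state)
    return path


client = run(["active open", "syn+ack", "close", "ack", "fin", "timeout"])
server = run(["passive open", "syn", "ack", "fin", "close", "ack"])
```

Here `client` follows the active opener that also closes first, and `server` follows the passive side that closes after being told the peer has finished; both end in CLOSED.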
Reserved TCP Port Numbers
  Decimal | Keyword | UNIX keyword | Description
  0 | - | - | Reserved
  1 | TCPMUX | - | TCP Multiplexer
  5 | RJE | - | Remote Job Entry
  7 | ECHO | echo | Echo
  9 | DISCARD | discard | Discard
  11 | USERS | systat | Active Users
  13 | DAYTIME | daytime | Daytime
  15 | - | netstat | Network status program
  17 | QUOTE | qotd | Quote of the day
  19 | CHARGEN | chargen | Character Generator
  20 | FTP-DATA | ftp-data | File Transfer Protocol
  21 | FTP | ftp | File Transfer Protocol
  23 | TELNET | telnet | Terminal Connection
  25 | SMTP | smtp | Simple Mail Transport Protocol
  37 | TIME | time | Time
  42 | NAMESERVER | name | Host Name Server
  43 | NICNAME | whois | Who Is
  53 | DOMAIN | nameserver | Domain Name Server
  77 | - | rje | any private RJE service
  79 | FINGER | finger | Finger
  93 | DCP | - | Device Control Protocol
  95 | SUPDUP | supdup | SUPDUP Protocol