@@ -1,101 +0,0 @@
-TCP protocol
-============
-
-Last updated: 3 June 2017
-
-Contents
-========
-
-- Congestion control
-- How the new TCP output machine [nyi] works
-
-Congestion control
-==================
-
-The following variables in tcp_sock are used for congestion control:
-snd_cwnd        The size of the congestion window.
-snd_ssthresh    Slow start threshold. We are in slow start if
-                snd_cwnd is less than this.
-snd_cwnd_cnt    A counter used to slow down the rate of increase
-                once we exceed the slow start threshold.
-snd_cwnd_clamp  The maximum size that snd_cwnd can grow to.
-snd_cwnd_stamp  Timestamp for when the congestion window was last validated.
-snd_cwnd_used   Used as a high-water mark for how much of the
-                congestion window is in use. It is used to adjust
-                snd_cwnd down when the link is limited by the
-                application rather than the network.
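To make the roles of these fields concrete, here is a minimal userspace sketch of Reno-style window growth. It is not the kernel code: struct cc_state, in_slow_start() and on_ack() are hypothetical names, the window is counted in segments, and each ACK is assumed to acknowledge exactly one segment.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the congestion fields of tcp_sock. */
struct cc_state {
    uint32_t snd_cwnd;       /* congestion window, in segments */
    uint32_t snd_ssthresh;   /* slow start threshold */
    uint32_t snd_cwnd_cnt;   /* ACK counter used once past slow start */
    uint32_t snd_cwnd_clamp; /* upper bound for snd_cwnd */
};

/* We are in slow start while snd_cwnd is below snd_ssthresh. */
static int in_slow_start(const struct cc_state *s)
{
    return s->snd_cwnd < s->snd_ssthresh;
}

/* Process one ACK (assumed here to acknowledge a single segment). */
static void on_ack(struct cc_state *s)
{
    if (in_slow_start(s)) {
        /* Slow start: grow by one segment per ACK. */
        if (s->snd_cwnd < s->snd_cwnd_clamp)
            s->snd_cwnd++;
        return;
    }

    /* Past the threshold: snd_cwnd_cnt slows growth to roughly one
     * segment per window's worth of ACKs. */
    if (++s->snd_cwnd_cnt >= s->snd_cwnd) {
        s->snd_cwnd_cnt = 0;
        if (s->snd_cwnd < s->snd_cwnd_clamp)
            s->snd_cwnd++;
    }
}
```

With snd_cwnd at 4 and snd_ssthresh at 4, four calls to on_ack() are needed before the window grows to 5, which is the slowdown that snd_cwnd_cnt provides.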
-
-As of 2.6.13, Linux supports pluggable congestion control algorithms.
-A congestion control mechanism can be registered through functions in
-tcp_cong.c. The functions used by the congestion control mechanism are
-registered by passing a tcp_congestion_ops struct to
-tcp_register_congestion_control. At a minimum, the congestion control
-mechanism must provide a valid name and must implement either the ssthresh,
-cong_avoid and undo_cwnd hooks or the "omnipotent" cong_control hook.
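The minimum requirement described above can be sketched as a userspace mock. Everything here is hypothetical and simplified for illustration: struct ca_ops stands in for tcp_congestion_ops with reduced hook signatures, ca_register() stands in for tcp_register_congestion_control, and the fixed-size registry is an assumption of the sketch. The real kernel structure and checks may differ between versions.

```c
#include <stddef.h>

#define CA_MAX_ALGOS 8          /* hypothetical registry capacity */

struct sock_stub;               /* opaque stand-in for a socket */

/* Hypothetical, reduced stand-in for tcp_congestion_ops. */
struct ca_ops {
    const char *name;
    unsigned int (*ssthresh)(struct sock_stub *sk);
    void (*cong_avoid)(struct sock_stub *sk, unsigned int ack,
                       unsigned int acked);
    unsigned int (*undo_cwnd)(struct sock_stub *sk);
    void (*cong_control)(struct sock_stub *sk);
};

static const struct ca_ops *ca_registry[CA_MAX_ALGOS];
static size_t ca_count;

/* Returns 0 on success, -1 if the ops fail the minimum requirements. */
static int ca_register(const struct ca_ops *ca)
{
    int classic, omnipotent;

    /* A valid, non-empty name is always required. */
    if (!ca || !ca->name || ca->name[0] == '\0')
        return -1;

    /* Either the three classic hooks or the single cong_control hook. */
    classic = ca->ssthresh && ca->cong_avoid && ca->undo_cwnd;
    omnipotent = ca->cong_control != NULL;
    if (!classic && !omnipotent)
        return -1;

    if (ca_count == CA_MAX_ALGOS)
        return -1;
    ca_registry[ca_count++] = ca;
    return 0;
}

/* Trivial example hooks so the sketch can be exercised. */
static unsigned int demo_ssthresh(struct sock_stub *sk) { (void)sk; return 2; }
static void demo_cong_avoid(struct sock_stub *sk, unsigned int ack,
                            unsigned int acked)
{
    (void)sk; (void)ack; (void)acked;
}
static unsigned int demo_undo_cwnd(struct sock_stub *sk) { (void)sk; return 10; }
static void demo_cong_control(struct sock_stub *sk) { (void)sk; }
```

An ops table that sets only name and ssthresh is rejected by this sketch, while one that sets name and cong_control alone is accepted.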
-
-Private data for a congestion control mechanism is stored in tp->ca_priv.
-tcp_ca(tp) returns a pointer to this space. This space is preallocated, so
-it is important to check that your private data fits in it. Alternatively,
-space can be allocated elsewhere and a pointer to it stored here.
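One way to check the size requirement is at compile time. The sketch below is a userspace illustration only: CA_PRIV_SIZE, struct tp_stub, ca_priv_of() and struct demo_ca are hypothetical stand-ins, and the 64-byte size is an assumption, not the kernel's value.

```c
#include <stddef.h>
#include <stdint.h>

#define CA_PRIV_SIZE 64   /* hypothetical size of the preallocated area */

/* Hypothetical stand-in for the part of tcp_sock that holds ca_priv. */
struct tp_stub {
    uint64_t ca_priv[CA_PRIV_SIZE / sizeof(uint64_t)];
};

/* Plays the role of tcp_ca(tp): returns the preallocated private area. */
static void *ca_priv_of(struct tp_stub *tp)
{
    return tp->ca_priv;
}

/* Private state of a made-up congestion control module. */
struct demo_ca {
    uint32_t min_rtt_us;
    uint32_t ack_cnt;
    uint64_t last_sample;
};

/* Fail the build, not the running system, if the state does not fit. */
_Static_assert(sizeof(struct demo_ca) <= CA_PRIV_SIZE,
               "demo_ca does not fit in the private area");

static void demo_init(struct tp_stub *tp)
{
    struct demo_ca *ca = ca_priv_of(tp);

    ca->min_rtt_us = UINT32_MAX;
    ca->ack_cnt = 0;
    ca->last_sample = 0;
}
```

If struct demo_ca later grows past CA_PRIV_SIZE, the assertion stops compilation, at which point the alternative of storing only a pointer in the private area becomes the option to consider.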
-
-There are currently three kinds of congestion control algorithms. The
-simplest ones are derived from TCP Reno (highspeed, scalable) and just
-provide an alternative congestion window calculation. More complex
-ones like BIC try to look at other events to provide better
-heuristics. There are also round-trip-time-based algorithms like
-Vegas and Westwood+.
-
-Good TCP congestion control is a complex problem because the algorithm
-needs to maintain fairness and performance. Please review current
-research and RFCs before developing new modules.
-
-The default congestion control mechanism is chosen based on the
-DEFAULT_TCP_CONG Kconfig parameter. If you want a particular default
-value, you can set it using sysctl net.ipv4.tcp_congestion_control. The
-module will be autoloaded if needed and you will get the expected protocol. If
-you ask for an unknown congestion method, the sysctl attempt will fail.
-
-If you remove a TCP congestion control module, you will get the next
-available one. Since reno cannot be built as a module and cannot be
-removed, it will always be available.
-
-How the new TCP output machine [nyi] works
-==========================================
-
-Data is kept on a single queue. The skb->users flag tells us if the frame is
-one that has already been queued. To add a frame we throw it on the end. ACK
-processing walks down the list from the start.
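The single-queue idea can be sketched in userspace as a linked list with a head and a tail. This is an illustration of the description above, not the design's actual code: struct frame, struct txq, txq_append() and txq_ack() are hypothetical names, ACKs are assumed to be cumulative, and sequence number wraparound is ignored.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical queued frame: covers bytes [seq, seq + len). */
struct frame {
    uint32_t seq;
    uint32_t len;
    struct frame *next;
};

/* Hypothetical single output queue. */
struct txq {
    struct frame *head;   /* oldest unacknowledged frame */
    struct frame *tail;   /* where new frames are added */
};

/* Adding a frame: throw it on the end. Returns 0, or -1 if out of memory. */
static int txq_append(struct txq *q, uint32_t seq, uint32_t len)
{
    struct frame *f = malloc(sizeof(*f));

    if (!f)
        return -1;
    f->seq = seq;
    f->len = len;
    f->next = NULL;
    if (q->tail)
        q->tail->next = f;
    else
        q->head = f;
    q->tail = f;
    return 0;
}

/* ACK handling: walk from the start and release every frame that the
 * cumulative ACK fully covers. Returns the number of frames released.
 * Sequence wraparound is not handled in this sketch. */
static unsigned int txq_ack(struct txq *q, uint32_t ack)
{
    unsigned int freed = 0;

    while (q->head && q->head->seq + q->head->len <= ack) {
        struct frame *f = q->head;

        q->head = f->next;
        if (!q->head)
            q->tail = NULL;
        free(f);
        freed++;
    }
    return freed;
}
```

Because frames are only appended at the tail and only released from the head, the walk stops at the first frame that is not fully acknowledged.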
-
-We keep a set of control flags and queue pointers:
-
-    sk->tcp_pend_event
-
-    TCP_PEND_ACK            ACK needed
-    TCP_ACK_NOW             Needed now
-    TCP_WINDOW              Window update check
-    TCP_WINZERO             Zero probing
-
-    sk->transmit_queue      Start of the transmission frames
-    sk->transmit_new        First new frame pointer
-    sk->transmit_end        Where to add frames
-
-    sk->tcp_last_tx_ack     Last ACK seen
-    sk->tcp_dup_ack         Dup ACK count for fast retransmit
-
-Frames are queued for output by tcp_write. We do our best to send the frames
-off immediately if possible, but otherwise queue them and compute the body
-checksum during the copy.
-
-When a write is done we try to clear any pending events and piggyback them.
-If the window is full we queue full-sized frames. On the first timeout in
-zero window we split this.
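As a rough illustration of clearing pending events on a write, the sketch below treats tcp_pend_event as a bitmask. The bit values, struct out_state and the helper names are hypothetical, and the choice that only the ACK-related events ride along with outgoing data is an assumption of this sketch, since the text does not say which events can be piggybacked.

```c
/* Hypothetical bit assignments for the tcp_pend_event flags. */
enum {
    TCP_PEND_ACK = 1 << 0,   /* ACK needed */
    TCP_ACK_NOW  = 1 << 1,   /* needed now */
    TCP_WINDOW   = 1 << 2,   /* window update check */
    TCP_WINZERO  = 1 << 3    /* zero probing */
};

/* Hypothetical stand-in for the per-socket output state. */
struct out_state {
    unsigned int tcp_pend_event;
};

static void pend_set(struct out_state *s, unsigned int ev)
{
    s->tcp_pend_event |= ev;
}

static int pend_test(const struct out_state *s, unsigned int ev)
{
    return (s->tcp_pend_event & ev) != 0;
}

/* Called when a data frame goes out. Assumption: the frame carries the
 * ACK, so the ACK-related events are cleared; other events stay pending.
 * Returns the set of events that were piggybacked. */
static unsigned int piggyback_on_write(struct out_state *s)
{
    unsigned int carried = s->tcp_pend_event & (TCP_PEND_ACK | TCP_ACK_NOW);

    s->tcp_pend_event &= ~carried;
    return carried;
}
```

Keeping the events in one word means a write can clear several of them with a single mask operation.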
-
-On a timer we walk the retransmit list to send any retransmits, update the
-backoff timers, and so on. A change of route table stamp causes the header to
-be changed and recomputed. We add any new TCP-level headers and finish the
-checksum again before sending.
-