@@ -19,6 +19,8 @@ to each other.
The <tt>rcu_state</tt> Structure</a>
<li> <a href="#The rcu_node Structure">
The <tt>rcu_node</tt> Structure</a>
+<li> <a href="#The rcu_segcblist Structure">
+ The <tt>rcu_segcblist</tt> Structure</a>
<li> <a href="#The rcu_data Structure">
The <tt>rcu_data</tt> Structure</a>
<li> <a href="#The rcu_dynticks Structure">
@@ -841,6 +843,134 @@ for lockdep lock-class names.
|
|
|
Finally, lines 64-66 produce an error if the maximum number of
|
|
|
CPUs is too large for the specified fanout.
|
|
|
|
|
|
+<h3><a name="The rcu_segcblist Structure">
|
|
|
+The <tt>rcu_segcblist</tt> Structure</a></h3>
|
|
|
+
|
|
|
+The <tt>rcu_segcblist</tt> structure maintains a segmented list of
|
|
|
+callbacks as follows:
|
|
|
+
|
|
|
+<pre>
|
|
|
+ 1 #define RCU_DONE_TAIL 0
|
|
|
+ 2 #define RCU_WAIT_TAIL 1
|
|
|
+ 3 #define RCU_NEXT_READY_TAIL 2
|
|
|
+ 4 #define RCU_NEXT_TAIL 3
|
|
|
+ 5 #define RCU_CBLIST_NSEGS 4
|
|
|
+ 6
|
|
|
+ 7 struct rcu_segcblist {
|
|
|
+ 8 struct rcu_head *head;
|
|
|
+ 9 struct rcu_head **tails[RCU_CBLIST_NSEGS];
|
|
|
+10 unsigned long gp_seq[RCU_CBLIST_NSEGS];
|
|
|
+11 long len;
|
|
|
+12 long len_lazy;
|
|
|
+13 };
|
|
|
+</pre>
|
|
|
+
|
|
|
+<p>
|
|
|
+The segments are as follows:
|
|
|
+
|
|
|
+<ol>
|
|
|
+<li> <tt>RCU_DONE_TAIL</tt>: Callbacks whose grace periods have elapsed.
|
|
|
+ These callbacks are ready to be invoked.
|
|
|
+<li> <tt>RCU_WAIT_TAIL</tt>: Callbacks that are waiting for the
|
|
|
+ current grace period.
|
|
|
+ Note that different CPUs can have different ideas about which
|
|
|
+ grace period is current, hence the <tt>->gp_seq</tt> field.
|
|
|
+<li> <tt>RCU_NEXT_READY_TAIL</tt>: Callbacks waiting for the next
|
|
|
+ grace period to start.
|
|
|
+<li> <tt>RCU_NEXT_TAIL</tt>: Callbacks that have not yet been
|
|
|
+ associated with a grace period.
|
|
|
+</ol>
|
|
|
+
|
|
|
+<p>
|
|
|
+The <tt>->head</tt> pointer references the first callback or
|
|
|
+is <tt>NULL</tt> if the list contains no callbacks (which is
+<i>not</i> the same as having no callbacks associated with this
+structure, because <tt>->len</tt> can still be nonzero).
|
|
|
+Each element of the <tt>->tails[]</tt> array references the
|
|
|
+<tt>->next</tt> pointer of the last callback in the corresponding
|
|
|
+segment of the list, or the list's <tt>->head</tt> pointer if
|
|
|
+that segment and all previous segments are empty.
|
|
|
+If the corresponding segment is empty but some previous segment is
|
|
|
+not empty, then the array element is identical to its predecessor.
|
|
|
+Older callbacks are closer to the head of the list, and new callbacks
|
|
|
+are added at the tail.
|
|
|
+This relationship between the <tt>->head</tt> pointer, the
|
|
|
+<tt>->tails[]</tt> array, and the callbacks is shown in this
|
|
|
+diagram:
|
|
|
+
|
|
|
+</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%">
|
|
|
+
|
|
|
+</p><p>In this figure, the <tt>->head</tt> pointer references the
|
|
|
+first
|
|
|
+RCU callback in the list.
|
|
|
+The <tt>->tails[RCU_DONE_TAIL]</tt> array element references
|
|
|
+the <tt>->head</tt> pointer itself, indicating that none
|
|
|
+of the callbacks is ready to invoke.
|
|
|
+The <tt>->tails[RCU_WAIT_TAIL]</tt> array element references callback
|
|
|
+CB 2's <tt>->next</tt> pointer, which indicates that
|
|
|
+CB 1 and CB 2 are both waiting on the current grace period,
|
|
|
+give or take possible disagreements about exactly which grace period
|
|
|
+is the current one.
|
|
|
+The <tt>->tails[RCU_NEXT_READY_TAIL]</tt> array element
|
|
|
+references the same RCU callback that <tt>->tails[RCU_WAIT_TAIL]</tt>
|
|
|
+does, which indicates that there are no callbacks waiting on the next
|
|
|
+RCU grace period.
|
|
|
+The <tt>->tails[RCU_NEXT_TAIL]</tt> array element references
|
|
|
+CB 4's <tt>->next</tt> pointer, indicating that all the
|
|
|
+remaining RCU callbacks have not yet been assigned to an RCU grace
|
|
|
+period.
|
|
|
+Note that the <tt>->tails[RCU_NEXT_TAIL]</tt> array element
|
|
|
+always references the last RCU callback's <tt>->next</tt> pointer
|
|
|
+unless the callback list is empty, in which case it references
|
|
|
+the <tt>->head</tt> pointer.
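The pointer layout described above can be reproduced in a small userspace model. The following is a minimal sketch, not kernel code: the `struct rcu_head` here is a simplified stand-in, and the `model_*` helpers are hypothetical names introduced only for illustration. It builds the four-callback arrangement from the figure, with CB 1 and CB 2 waiting on the current grace period and CB 3 and CB 4 not yet assigned.

```c
#include <assert.h>
#include <stddef.h>

#define RCU_DONE_TAIL		0
#define RCU_WAIT_TAIL		1
#define RCU_NEXT_READY_TAIL	2
#define RCU_NEXT_TAIL		3
#define RCU_CBLIST_NSEGS	4

/* Simplified stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	struct rcu_head *next;
};

/* Userspace model holding only the fields needed for the layout. */
struct rcu_segcblist {
	struct rcu_head *head;
	struct rcu_head **tails[RCU_CBLIST_NSEGS];
	long len;
};

/* Hypothetical helper: an empty list has every tail referencing ->head. */
static void model_init(struct rcu_segcblist *l)
{
	int i;

	l->head = NULL;
	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
		l->tails[i] = &l->head;
	l->len = 0;
}

/* Hypothetical helper: new callbacks go at the tail, in RCU_NEXT_TAIL. */
static void model_enqueue(struct rcu_segcblist *l, struct rcu_head *rhp)
{
	/* A NULL tail would mean the list is disabled. */
	assert(l->tails[RCU_NEXT_TAIL] != NULL);
	rhp->next = NULL;
	*l->tails[RCU_NEXT_TAIL] = rhp;
	l->tails[RCU_NEXT_TAIL] = &rhp->next;
	l->len++;
}

/* A segment is empty when its tail is identical to its predecessor's. */
static int model_seg_empty(const struct rcu_segcblist *l, int seg)
{
	if (seg == RCU_DONE_TAIL)
		return l->tails[seg] == &l->head;
	return l->tails[seg] == l->tails[seg - 1];
}

/* Rebuild the figure: cbs[0..1] are CB 1-2, cbs[2..3] are CB 3-4. */
static void model_build_figure(struct rcu_segcblist *l,
			       struct rcu_head cbs[4])
{
	model_init(l);
	model_enqueue(l, &cbs[0]);
	model_enqueue(l, &cbs[1]);
	/* Associate CB 1 and CB 2 with the current grace period. */
	l->tails[RCU_WAIT_TAIL] = l->tails[RCU_NEXT_TAIL];
	l->tails[RCU_NEXT_READY_TAIL] = l->tails[RCU_NEXT_TAIL];
	model_enqueue(l, &cbs[2]);
	model_enqueue(l, &cbs[3]);
}
```

After `model_build_figure()`, the `RCU_DONE_TAIL` and `RCU_NEXT_READY_TAIL` segments both test as empty, the first because its tail references `->head` and the second because its tail is identical to that of `RCU_WAIT_TAIL`.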
|
|
|
+
|
|
|
+<p>
|
|
|
+There is one additional important special case for the
|
|
|
+<tt>->tails[RCU_NEXT_TAIL]</tt> array element: It can be <tt>NULL</tt>
|
|
|
+when this list is <i>disabled</i>.
|
|
|
+Lists are disabled when the corresponding CPU is offline or when
|
|
|
+the corresponding CPU's callbacks are offloaded to a kthread,
|
|
|
+both of which are described elsewhere.
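The disabled-list convention can be illustrated with a short userspace sketch. This is an assumption-laden model rather than the kernel's code: the `model_*` names are hypothetical, the structure is cut down to two fields, and the model only allows an empty list to be disabled.

```c
#include <assert.h>
#include <stddef.h>

#define RCU_DONE_TAIL		0
#define RCU_NEXT_TAIL		3
#define RCU_CBLIST_NSEGS	4

/* Simplified stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	struct rcu_head *next;
};

/* Cut-down model of the segmented callback list. */
struct rcu_segcblist {
	struct rcu_head *head;
	struct rcu_head **tails[RCU_CBLIST_NSEGS];
};

/* Hypothetical helper: (re)initialize to an enabled, empty list. */
static void model_init(struct rcu_segcblist *l)
{
	int i;

	l->head = NULL;
	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
		l->tails[i] = &l->head;
}

/* The list is enabled unless ->tails[RCU_NEXT_TAIL] is NULL. */
static int model_is_enabled(const struct rcu_segcblist *l)
{
	return l->tails[RCU_NEXT_TAIL] != NULL;
}

/* Mark the list disabled; this model assumes it is already empty. */
static void model_disable(struct rcu_segcblist *l)
{
	assert(l->head == NULL);
	l->tails[RCU_NEXT_TAIL] = NULL;
}

/* Enqueue only if enabled; returns 1 on success, 0 if disabled. */
static int model_try_enqueue(struct rcu_segcblist *l, struct rcu_head *rhp)
{
	if (!model_is_enabled(l))
		return 0;
	rhp->next = NULL;
	*l->tails[RCU_NEXT_TAIL] = rhp;
	l->tails[RCU_NEXT_TAIL] = &rhp->next;
	return 1;
}
```

Checking the tail before every enqueue keeps a caller from dereferencing the `NULL` tail of a disabled list.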
|
|
|
+
|
|
|
+</p><p>CPUs advance their callbacks from the
|
|
|
+<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the
|
|
|
+<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments
|
|
|
+as grace periods advance.
|
|
|
+
|
|
|
+</p><p>The <tt>->gp_seq[]</tt> array records grace-period
|
|
|
+numbers corresponding to the list segments.
|
|
|
+This is what allows different CPUs to have different ideas as to
|
|
|
+which is the current grace period while still avoiding premature
|
|
|
+invocation of their callbacks.
|
|
|
+In particular, this allows CPUs that go idle for extended periods
|
|
|
+to determine which of their callbacks are ready to be invoked after
|
|
|
+reawakening.
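To make the role of <tt>->gp_seq[]</tt> concrete, here is a simplified userspace sketch of advancing callbacks once a CPU learns that a given grace-period number has completed. It is a model under stated assumptions, not the kernel's implementation: the `model_*` names are hypothetical, and a plain `<` comparison stands in for the wraparound-safe comparison that real sequence counters need.

```c
#include <assert.h>
#include <stddef.h>

#define RCU_DONE_TAIL		0
#define RCU_WAIT_TAIL		1
#define RCU_NEXT_READY_TAIL	2
#define RCU_NEXT_TAIL		3
#define RCU_CBLIST_NSEGS	4

/* Simplified stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	struct rcu_head *next;
};

struct rcu_segcblist {
	struct rcu_head *head;
	struct rcu_head **tails[RCU_CBLIST_NSEGS];
	unsigned long gp_seq[RCU_CBLIST_NSEGS];
};

/* Move segments whose grace period is at or before seq to RCU_DONE_TAIL. */
static void model_advance(struct rcu_segcblist *l, unsigned long seq)
{
	int i, j;

	assert(l->tails[RCU_NEXT_TAIL] != NULL);	/* List must be enabled. */

	/* Find the segments whose grace periods have ended. */
	for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++) {
		if (seq < l->gp_seq[i])
			break;
		l->tails[RCU_DONE_TAIL] = l->tails[i];
	}
	if (i == RCU_WAIT_TAIL)
		return;		/* Nothing is ready yet. */

	/* The segments just drained are now empty. */
	for (j = RCU_WAIT_TAIL; j < i; j++)
		l->tails[j] = l->tails[RCU_DONE_TAIL];

	/* Slide any still-waiting segment down into the freed slot. */
	for (j = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++, j++) {
		if (l->tails[j] == l->tails[RCU_NEXT_TAIL])
			break;	/* No callbacks beyond the done segment. */
		l->tails[j] = l->tails[i];
		l->gp_seq[j] = l->gp_seq[i];
	}
}

/* Hypothetical setup: cb[0] waits for 4, cb[1] for 8, cb[2] unassigned. */
static void model_setup(struct rcu_segcblist *l, struct rcu_head cb[3])
{
	cb[0].next = &cb[1];
	cb[1].next = &cb[2];
	cb[2].next = NULL;
	l->head = &cb[0];
	l->tails[RCU_DONE_TAIL] = &l->head;
	l->tails[RCU_WAIT_TAIL] = &cb[0].next;
	l->tails[RCU_NEXT_READY_TAIL] = &cb[1].next;
	l->tails[RCU_NEXT_TAIL] = &cb[2].next;
	l->gp_seq[RCU_DONE_TAIL] = 0;
	l->gp_seq[RCU_WAIT_TAIL] = 4;
	l->gp_seq[RCU_NEXT_READY_TAIL] = 8;
	l->gp_seq[RCU_NEXT_TAIL] = 0;
}
```

A CPU that slept through several grace periods can call `model_advance()` once with the latest completed number, and both waiting segments move to `RCU_DONE_TAIL` in one step. The recorded numbers, rather than a shared notion of the current grace period, decide what is ready.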
|
|
|
+
|
|
|
+</p><p>The <tt>->len</tt> counter contains the number of
|
|
|
+callbacks in <tt>->head</tt>, and the
|
|
|
+<tt>->len_lazy</tt> counter contains the number of those callbacks that
|
|
|
+are known to only free memory, and whose invocation can therefore
|
|
|
+be safely deferred.
|
|
|
+
|
|
|
+<p><b>Important note</b>: It is the <tt>->len</tt> field that
|
|
|
+determines whether or not there are callbacks associated with
|
|
|
+this <tt>rcu_segcblist</tt> structure, <i>not</i> the <tt>->head</tt>
|
|
|
+pointer.
|
|
|
+The reason for this is that all the ready-to-invoke callbacks
|
|
|
+(that is, those in the <tt>RCU_DONE_TAIL</tt> segment) are extracted
|
|
|
+all at once at callback-invocation time.
|
|
|
+If callback invocation must be postponed, for example, because a
|
|
|
+high-priority process just woke up on this CPU, then the remaining
|
|
|
+callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment.
|
|
|
+Either way, the <tt>->len</tt> and <tt>->len_lazy</tt> counts
|
|
|
+are adjusted after the corresponding callbacks have been invoked, and so
|
|
|
+again it is the <tt>->len</tt> count that accurately reflects whether
|
|
|
+or not there are callbacks associated with this <tt>rcu_segcblist</tt>
|
|
|
+structure.
|
|
|
+Of course, off-CPU sampling of the <tt>->len</tt> count requires
|
|
|
+the use of appropriate synchronization, for example, memory barriers.
|
|
|
+This synchronization can be a bit subtle, particularly in the case
|
|
|
+of <tt>rcu_barrier()</tt>.
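The reason <tt>->len</tt> rather than <tt>->head</tt> is the authoritative indicator can be seen in a small single-threaded sketch. This is a userspace model with hypothetical `model_*` helpers and no synchronization, so it illustrates only the ordering of the updates, not the memory-barrier requirements mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define RCU_DONE_TAIL		0
#define RCU_CBLIST_NSEGS	4

/* Simplified stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *rhp);
};

struct rcu_segcblist {
	struct rcu_head *head;
	struct rcu_head **tails[RCU_CBLIST_NSEGS];
	long len;
};

static int model_invoked;	/* Number of callbacks run so far. */

static void model_cb(struct rcu_head *rhp)
{
	(void)rhp;
	model_invoked++;
}

/* Hypothetical setup: two callbacks, both already in RCU_DONE_TAIL. */
static void model_setup_done(struct rcu_segcblist *l, struct rcu_head cb[2])
{
	int i;

	cb[0].next = &cb[1];
	cb[0].func = model_cb;
	cb[1].next = NULL;
	cb[1].func = model_cb;
	l->head = &cb[0];
	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
		l->tails[i] = &cb[1].next;
	l->len = 2;
}

/* Detach the whole RCU_DONE_TAIL segment at once; ->len is untouched. */
static struct rcu_head *model_extract_done(struct rcu_segcblist *l)
{
	struct rcu_head *done;
	int i;

	if (l->tails[RCU_DONE_TAIL] == &l->head)
		return NULL;	/* No ready-to-invoke callbacks. */
	done = l->head;
	l->head = *l->tails[RCU_DONE_TAIL];
	*l->tails[RCU_DONE_TAIL] = NULL;	/* Terminate detached chain. */
	/* Tails that referenced the detached chain now reference ->head. */
	for (i = RCU_CBLIST_NSEGS - 1; i >= RCU_DONE_TAIL; i--)
		if (l->tails[i] == l->tails[RCU_DONE_TAIL])
			l->tails[i] = &l->head;
	return done;
}

/* Invoke the detached callbacks, then adjust ->len afterwards. */
static long model_invoke(struct rcu_segcblist *l, struct rcu_head *done)
{
	long count = 0;

	while (done) {
		struct rcu_head *next = done->next;

		done->func(done);
		done = next;
		count++;
	}
	l->len -= count;
	return count;
}

/* Callbacks are still associated with the list while ->len is nonzero. */
static int model_has_cbs(const struct rcu_segcblist *l)
{
	return l->len != 0;
}
```

Between `model_extract_done()` and `model_invoke()`, `->head` is `NULL` even though two callbacks remain outstanding, so a check based on `->head` would wrongly report that nothing is pending.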
|
|
|
+
|
|
|
<h3><a name="The rcu_data Structure">
|
|
|
The <tt>rcu_data</tt> Structure</a></h3>
|
|
|
|
|
@@ -983,62 +1113,18 @@ choice.
as follows:

<pre>
- 1 struct rcu_head *nxtlist;
- 2 struct rcu_head **nxttail[RCU_NEXT_SIZE];
- 3 unsigned long nxtcompleted[RCU_NEXT_SIZE];
- 4 long qlen_lazy;
- 5 long qlen;
- 6 long qlen_last_fqs_check;
+ 1 struct rcu_segcblist cblist;
+ 2 long qlen_last_fqs_check;
+ 3 unsigned long n_cbs_invoked;
+ 4 unsigned long n_nocbs_invoked;
+ 5 unsigned long n_cbs_orphaned;
+ 6 unsigned long n_cbs_adopted;
7 unsigned long n_force_qs_snap;
- 8 unsigned long n_cbs_invoked;
- 9 unsigned long n_cbs_orphaned;
-10 unsigned long n_cbs_adopted;
-11 long blimit;
+ 8 long blimit;
</pre>

-<p>The <tt>->nxtlist</tt> pointer and the
|
|
|
-<tt>->nxttail[]</tt> array form a four-segment list with
|
|
|
-older callbacks near the head and newer ones near the tail.
|
|
|
-Each segment contains callbacks with the corresponding relationship
|
|
|
-to the current grace period.
|
|
|
-The pointer out of the end of each of the four segments is referenced
|
|
|
-by the element of the <tt>->nxttail[]</tt> array indexed by
|
|
|
-<tt>RCU_DONE_TAIL</tt> (for callbacks handled by a prior grace period),
|
|
|
-<tt>RCU_WAIT_TAIL</tt> (for callbacks waiting on the current grace period),
|
|
|
-<tt>RCU_NEXT_READY_TAIL</tt> (for callbacks that will wait on the next
|
|
|
-grace period), and
|
|
|
-<tt>RCU_NEXT_TAIL</tt> (for callbacks that are not yet associated
|
|
|
-with a specific grace period)
|
|
|
-respectively, as shown in the following figure.
|
|
|
-
|
|
|
-</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%">
|
|
|
-
|
|
|
-</p><p>In this figure, the <tt>->nxtlist</tt> pointer references the
|
|
|
-first
|
|
|
-RCU callback in the list.
|
|
|
-The <tt>->nxttail[RCU_DONE_TAIL]</tt> array element references
|
|
|
-the <tt>->nxtlist</tt> pointer itself, indicating that none
|
|
|
-of the callbacks is ready to invoke.
|
|
|
-The <tt>->nxttail[RCU_WAIT_TAIL]</tt> array element references callback
|
|
|
-CB 2's <tt>->next</tt> pointer, which indicates that
|
|
|
-CB 1 and CB 2 are both waiting on the current grace period.
|
|
|
-The <tt>->nxttail[RCU_NEXT_READY_TAIL]</tt> array element
|
|
|
-references the same RCU callback that <tt>->nxttail[RCU_WAIT_TAIL]</tt>
|
|
|
-does, which indicates that there are no callbacks waiting on the next
|
|
|
-RCU grace period.
|
|
|
-The <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element references
|
|
|
-CB 4's <tt>->next</tt> pointer, indicating that all the
|
|
|
-remaining RCU callbacks have not yet been assigned to an RCU grace
|
|
|
-period.
|
|
|
-Note that the <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element
|
|
|
-always references the last RCU callback's <tt>->next</tt> pointer
|
|
|
-unless the callback list is empty, in which case it references
|
|
|
-the <tt>->nxtlist</tt> pointer.
|
|
|
-
|
|
|
-</p><p>CPUs advance their callbacks from the
|
|
|
-<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the
|
|
|
-<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments
|
|
|
-as grace periods advance.
|
|
|
+<p>The <tt>->cblist</tt> structure is the segmented callback list
|
|
|
+described earlier.
|
|
|
The CPU advances the callbacks in its <tt>rcu_data</tt> structure
|
|
|
whenever it notices that another RCU grace period has completed.
|
|
|
The CPU detects the completion of an RCU grace period by noticing
|
|
@@ -1049,16 +1135,7 @@ Recall that each <tt>rcu_node</tt> structure's
|
|
|
<tt>->completed</tt> field is updated at the end of each
|
|
|
grace period.
|
|
|
|
|
|
-</p><p>The <tt>->nxtcompleted[]</tt> array records grace-period
|
|
|
-numbers corresponding to the list segments.
|
|
|
-This allows CPUs that go idle for extended periods to determine
|
|
|
-which of their callbacks are ready to be invoked after reawakening.
|
|
|
-
|
|
|
-</p><p>The <tt>->qlen</tt> counter contains the number of
|
|
|
-callbacks in <tt>->nxtlist</tt>, and the
|
|
|
-<tt>->qlen_lazy</tt> contains the number of those callbacks that
|
|
|
-are known to only free memory, and whose invocation can therefore
|
|
|
-be safely deferred.
|
|
|
+<p>
|
|
|
The <tt>->qlen_last_fqs_check</tt> and
|
|
|
<tt>->n_force_qs_snap</tt> coordinate the forcing of quiescent
|
|
|
states from <tt>call_rcu()</tt> and friends when callback
|
|
@@ -1069,6 +1146,10 @@ lists grow excessively long.
|
|
|
fields count the number of callbacks invoked,
|
|
|
sent to other CPUs when this CPU goes offline,
|
|
|
and received from other CPUs when those other CPUs go offline.
|
|
|
+The <tt>->n_nocbs_invoked</tt> field is used when the CPU's callbacks
|
|
|
+are offloaded to a kthread.
|
|
|
+
|
|
|
+<p>
|
|
|
Finally, the <tt>->blimit</tt> counter is the maximum number of
|
|
|
RCU callbacks that may be invoked at a given time.
|
|
|
|