@@ -23,6 +23,14 @@ over a rather long period of time, but improvements are always welcome!
Yet another exception is where the low real-time latency of RCU's
read-side primitives is critically important.

+ One final exception is where RCU readers are used to prevent
+ the ABA problem (https://en.wikipedia.org/wiki/ABA_problem)
+ for lockless updates. This does result in the mildly
+ counter-intuitive situation where rcu_read_lock() and
+ rcu_read_unlock() are used to protect updates, however, this
+ approach provides the same potential simplifications that garbage
+ collectors do.
+
1. Does the update code have proper mutual exclusion?

RCU does allow -readers- to run (almost) naked, but -writers- must
@@ -40,7 +48,9 @@ over a rather long period of time, but improvements are always welcome!
explain how this single task does not become a major bottleneck on
big multiprocessor machines (for example, if the task is updating
information relating to itself that other tasks can read, there
- by definition can be no bottleneck).
+ by definition can be no bottleneck). Note that the definition
+ of "large" has changed significantly: Eight CPUs was "large"
+ in the year 2000, but a hundred CPUs was unremarkable in 2017.

2. Do the RCU read-side critical sections make proper use of
rcu_read_lock() and friends? These primitives are needed
@@ -55,6 +65,12 @@ over a rather long period of time, but improvements are always welcome!
Disabling of preemption can serve as rcu_read_lock_sched(), but
is less readable.

+ Letting RCU-protected pointers "leak" out of an RCU read-side
+ critical section is every bit as bad as letting them leak out
+ from under a lock. Unless, of course, you have arranged some
+ other means of protection, such as a lock or a reference count
+ -before- letting them out of the RCU read-side critical section.
+
3. Does the update code tolerate concurrent accesses?

The whole point of RCU is to permit readers to run without
@@ -78,10 +94,10 @@ over a rather long period of time, but improvements are always welcome!

This works quite well, also.

- c. Make updates appear atomic to readers. For example,
+ c. Make updates appear atomic to readers. For example,
pointer updates to properly aligned fields will
appear atomic, as will individual atomic primitives.
- Sequences of perations performed under a lock will -not-
+ Sequences of operations performed under a lock will -not-
appear to be atomic to RCU readers, nor will sequences
of multiple atomic primitives.

@@ -168,8 +184,8 @@ over a rather long period of time, but improvements are always welcome!

5. If call_rcu(), or a related primitive such as call_rcu_bh(),
call_rcu_sched(), or call_srcu() is used, the callback function
- must be written to be called from softirq context. In particular,
- it cannot block.
+ will be called from softirq context. In particular, it cannot
+ block.

6. Since synchronize_rcu() can block, it cannot be called from
any sort of irq context. The same rule applies for
@@ -178,11 +194,14 @@ over a rather long period of time, but improvements are always welcome!
synchronize_sched_expedite(), and synchronize_srcu_expedited().

The expedited forms of these primitives have the same semantics
- as the non-expedited forms, but expediting is both expensive
- and unfriendly to real-time workloads. Use of the expedited
- primitives should be restricted to rare configuration-change
- operations that would not normally be undertaken while a real-time
- workload is running.
+ as the non-expedited forms, but expediting is both expensive and
+ (with the exception of synchronize_srcu_expedited()) unfriendly
+ to real-time workloads. Use of the expedited primitives should
+ be restricted to rare configuration-change operations that would
+ not normally be undertaken while a real-time workload is running.
+ However, real-time workloads can use the rcupdate.rcu_normal kernel
+ boot parameter to completely disable expedited grace periods,
+ though this might have performance implications.

In particular, if you find yourself invoking one of the expedited
primitives repeatedly in a loop, please do everyone a favor:
@@ -193,11 +212,6 @@ over a rather long period of time, but improvements are always welcome!
of the system, especially to real-time workloads running on
the rest of the system.

- In addition, it is illegal to call the expedited forms from
- a CPU-hotplug notifier, or while holding a lock that is acquired
- by a CPU-hotplug notifier. Failing to observe this restriction
- will result in deadlock.
-
7. If the updater uses call_rcu() or synchronize_rcu(), then the
corresponding readers must use rcu_read_lock() and
rcu_read_unlock(). If the updater uses call_rcu_bh() or
@@ -321,7 +335,7 @@ over a rather long period of time, but improvements are always welcome!
Similarly, disabling preemption is not an acceptable substitute
for rcu_read_lock(). Code that attempts to use preemption
disabling where it should be using rcu_read_lock() will break
- in real-time kernel builds.
+ in CONFIG_PREEMPT=y kernel builds.

If you want to wait for interrupt handlers, NMI handlers, and
code under the influence of preempt_disable(), you instead
@@ -356,23 +370,22 @@ over a rather long period of time, but improvements are always welcome!
not the case, a self-spawning RCU callback would prevent the
victim CPU from ever going offline.)

-14. SRCU (srcu_read_lock(), srcu_read_unlock(), srcu_dereference(),
- synchronize_srcu(), synchronize_srcu_expedited(), and call_srcu())
- may only be invoked from process context. Unlike other forms of
- RCU, it -is- permissible to block in an SRCU read-side critical
- section (demarked by srcu_read_lock() and srcu_read_unlock()),
- hence the "SRCU": "sleepable RCU". Please note that if you
- don't need to sleep in read-side critical sections, you should be
- using RCU rather than SRCU, because RCU is almost always faster
- and easier to use than is SRCU.
-
- Also unlike other forms of RCU, explicit initialization
- and cleanup is required via init_srcu_struct() and
- cleanup_srcu_struct(). These are passed a "struct srcu_struct"
- that defines the scope of a given SRCU domain. Once initialized,
- the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock()
- synchronize_srcu(), synchronize_srcu_expedited(), and call_srcu().
- A given synchronize_srcu() waits only for SRCU read-side critical
+14. Unlike other forms of RCU, it -is- permissible to block in an
+ SRCU read-side critical section (demarked by srcu_read_lock()
+ and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
+ Please note that if you don't need to sleep in read-side critical
+ sections, you should be using RCU rather than SRCU, because RCU
+ is almost always faster and easier to use than is SRCU.
+
+ Also unlike other forms of RCU, explicit initialization and
+ cleanup is required either at build time via DEFINE_SRCU()
+ or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct()
+ and cleanup_srcu_struct(). These last two are passed a
+ "struct srcu_struct" that defines the scope of a given
+ SRCU domain. Once initialized, the srcu_struct is passed
+ to srcu_read_lock(), srcu_read_unlock(), synchronize_srcu(),
+ synchronize_srcu_expedited(), and call_srcu(). A given
+ synchronize_srcu() waits only for SRCU read-side critical
sections governed by srcu_read_lock() and srcu_read_unlock()
calls that have been passed the same srcu_struct. This property
is what makes sleeping read-side critical sections tolerable --
@@ -390,10 +403,16 @@ over a rather long period of time, but improvements are always welcome!
Therefore, SRCU should be used in preference to rw_semaphore
only in extremely read-intensive situations, or in situations
requiring SRCU's read-side deadlock immunity or low read-side
- realtime latency.
+ realtime latency. You should also consider percpu_rw_semaphore
+ when you need lightweight readers.

- Note that, rcu_assign_pointer() relates to SRCU just as it does
- to other forms of RCU.
+ SRCU's expedited primitive (synchronize_srcu_expedited())
+ never sends IPIs to other CPUs, so it is easier on
+ real-time workloads than is synchronize_rcu_expedited(),
+ synchronize_rcu_bh_expedited() or synchronize_sched_expedited().
+
+ Note that rcu_dereference() and rcu_assign_pointer() relate to
+ SRCU just as they do to other forms of RCU.

15. The whole point of call_rcu(), synchronize_rcu(), and friends
is to wait until all pre-existing readers have finished before
@@ -435,3 +454,33 @@ over a rather long period of time, but improvements are always welcome!

These debugging aids can help you find problems that are
otherwise extremely difficult to spot.
+
+18. If you register a callback using call_rcu(), call_rcu_bh(),
+ call_rcu_sched(), or call_srcu(), and pass in a function defined
+ within a loadable module, then it is necessary to wait for
+ all pending callbacks to be invoked after the last invocation
+ and before unloading that module. Note that it is absolutely
+ -not- sufficient to wait for a grace period! The current (say)
+ synchronize_rcu() implementation waits only for all previous
+ callbacks registered on the CPU that synchronize_rcu() is running
+ on, but it is -not- guaranteed to wait for callbacks registered
+ on other CPUs.
+
+ You instead need to use one of the barrier functions:
+
+ o call_rcu()       -> rcu_barrier()
+ o call_rcu_bh()    -> rcu_barrier_bh()
+ o call_rcu_sched() -> rcu_barrier_sched()
+ o call_srcu()      -> srcu_barrier()
+
+ However, these barrier functions are absolutely -not- guaranteed
+ to wait for a grace period. In fact, if there are no call_rcu()
+ callbacks waiting anywhere in the system, rcu_barrier() is within
+ its rights to return immediately.
+
+ So if you need to wait for both an RCU grace period and for
+ all pre-existing call_rcu() callbacks, you will need to execute
+ both rcu_barrier() and synchronize_rcu(), if necessary, using
+ something like workqueues to execute them concurrently.
+
+ See rcubarrier.txt for more information.
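
Illustration for item 18 (not part of the patch). The distinction between
waiting for a grace period and waiting for callbacks can be sketched with
a toy userspace model. Everything below is hypothetical: the toy_* names,
the fixed CPU count, and the array-based queues are assumptions made for
illustration, and none of it is kernel code or the real RCU API. The model
only mirrors the behavior that item 18 describes, namely that
synchronize_rcu() waits for earlier callbacks on its own CPU only, whereas
rcu_barrier() waits for all of them. It does not model grace periods.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4	/* hypothetical CPU count for the model */
#define TOY_MAX_CBS 8	/* hypothetical per-CPU queue capacity */

typedef void (*toy_cb_t)(void *arg);

struct toy_cb {
	toy_cb_t func;
	void *arg;
};

struct toy_cpu {
	struct toy_cb cbs[TOY_MAX_CBS];
	size_t n;	/* number of callbacks currently queued */
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

/* Stand-in for call_rcu(): queue func(arg) on one CPU; -1 if full. */
static int toy_call_rcu(int cpu, toy_cb_t func, void *arg)
{
	struct toy_cpu *c;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	c = &toy_cpus[cpu];
	if (c->n == TOY_MAX_CBS)
		return -1;
	c->cbs[c->n].func = func;
	c->cbs[c->n].arg = arg;
	c->n++;
	return 0;
}

/* Invoke and discard every callback queued on one CPU. */
static void toy_flush_cpu(int cpu)
{
	struct toy_cpu *c = &toy_cpus[cpu];
	size_t i;

	for (i = 0; i < c->n; i++)
		c->cbs[i].func(c->cbs[i].arg);
	c->n = 0;
}

/*
 * Stand-in for the synchronize_rcu() behavior described in item 18:
 * only callbacks queued earlier on the calling CPU have run on return.
 */
static void toy_synchronize_rcu(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_flush_cpu(cpu);
}

/* Stand-in for rcu_barrier(): all callbacks on all CPUs have run. */
static void toy_rcu_barrier(void)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_flush_cpu(cpu);
}

/* Number of callbacks still queued anywhere in the model. */
static size_t toy_pending(void)
{
	size_t total = 0;
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		total += toy_cpus[cpu].n;
	return total;
}

/* Example callback: increment the int counter that arg points to. */
static void toy_count_cb(void *arg)
{
	(*(int *)arg)++;
}
```

In this model, queueing toy_count_cb on CPU 0 and on CPU 2 and then calling
toy_synchronize_rcu(0) leaves the CPU 2 callback pending. Only
toy_rcu_barrier() runs it, which is why a module exit path needs the
barrier and not merely a grace-period wait.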