@@ -1733,15 +1733,15 @@ The Linux kernel has eight basic CPU memory barriers:


 All memory barriers except the data dependency barriers imply a compiler
-barrier. Data dependencies do not impose any additional compiler ordering.
+barrier.  Data dependencies do not impose any additional compiler ordering.

 Aside: In the case of data dependencies, the compiler would be expected
 to issue the loads in the correct order (eg. `a[b]` would have to load
 the value of b before loading a[b]), however there is no guarantee in
 the C specification that the compiler may not speculate the value of b
 (eg. is equal to 1) and load a before b (eg. tmp = a[1]; if (b != 1)
-tmp = a[b]; ). There is also the problem of a compiler reloading b after
-having loaded a[b], thus having a newer copy of b than a[b]. A consensus
+tmp = a[b]; ).  There is also the problem of a compiler reloading b after
+having loaded a[b], thus having a newer copy of b than a[b].  A consensus
 has not yet been reached about these problems, however the READ_ONCE()
 macro is a good place to start looking.

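As a rough userspace illustration of the reload problem described in the aside (this is not kernel code): the READ_ONCE() below is a simplified stand-in built on a volatile cast, and `a`, `b`, and `load_indexed()` are hypothetical names chosen for the sketch. Reading b once into a local means the index used for the array load and the index reported to the caller come from the same snapshot. It does not settle the open questions the aside mentions.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's READ_ONCE(): a volatile access
 * makes the compiler emit exactly one load at this point and stops it
 * from re-reading the variable later. Uses the GNU __typeof__ extension. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical shared data; another thread may update b at any time. */
static int a[4] = { 10, 11, 12, 13 };
static size_t b = 2;

/* Load a[b] using a single snapshot of b. The snapshot is handed back
 * through *idx so the caller sees the index that was actually used. */
static int load_indexed(size_t *idx)
{
	size_t i = READ_ONCE(b);	/* one load of b, never repeated */

	*idx = i;
	return READ_ONCE(a[i]);		/* address depends on the value just loaded */
}
```

A caller that needs both the index and the element should take them from this one call rather than reading b again afterwards.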
@@ -1796,6 +1796,7 @@ There are some more advanced barrier functions:


 (*) lockless_dereference();
+
 This can be thought of as a pointer-fetch wrapper around the
 smp_read_barrier_depends() data-dependency barrier.

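The pointer-fetch pattern that lockless_dereference() wraps can be approximated in userspace with C11 atomics. This is only an analogue, not the kernel implementation: `struct config`, `current_config`, `publish_config()`, and `read_threshold()` are hypothetical names, and memory_order_consume is used as the nearest C11 counterpart of a data-dependency barrier.

```c
#include <stdatomic.h>
#include <stddef.h>

struct config {			/* hypothetical payload */
	int threshold;
};

/* Hypothetical shared pointer; NULL until something is published. */
static _Atomic(struct config *) current_config;

/* Writer: initialise the object fully, then publish the pointer.
 * The release store orders the initialisation before the publish. */
static void publish_config(struct config *c, int threshold)
{
	c->threshold = threshold;
	atomic_store_explicit(&current_config, c, memory_order_release);
}

/* Reader: fetch the pointer once, then dereference that same value.
 * Current compilers typically treat consume as an acquire load. */
static int read_threshold(int fallback)
{
	struct config *c =
		atomic_load_explicit(&current_config, memory_order_consume);

	return c ? c->threshold : fallback;
}
```

The reader never touches the shared pointer a second time, so the dereference always uses the pointer value that was fetched.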
@@ -1897,7 +1898,7 @@ for each construct. These operations all imply certain barriers:
  Memory operations issued before the ACQUIRE may be completed after
  the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
  combined with a following ACQUIRE, orders prior stores against
- subsequent loads and stores. Note that this is weaker than smp_mb()!
+ subsequent loads and stores.  Note that this is weaker than smp_mb()!
  The smp_mb__before_spinlock() primitive is free on many architectures.

 (2) RELEASE operation implication:
@@ -2092,9 +2093,9 @@ or:
 event_indicated = 1;
 wake_up_process(event_daemon);

-A write memory barrier is implied by wake_up() and co. if and only if they wake
-something up. The barrier occurs before the task state is cleared, and so sits
-between the STORE to indicate the event and the STORE to set TASK_RUNNING:
+A write memory barrier is implied by wake_up() and co. if and only if they
+wake something up.  The barrier occurs before the task state is cleared, and so
+sits between the STORE to indicate the event and the STORE to set TASK_RUNNING:

 CPU 1                           CPU 2
 =============================== ===============================
@@ -2208,7 +2209,7 @@ three CPUs; then should the following sequence of events occur:

 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
-on the separate CPUs. It might, for example, see:
+on the separate CPUs.  It might, for example, see:

 *E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M

@@ -2488,9 +2489,9 @@ The following operations are special locking primitives:
 clear_bit_unlock();
 __clear_bit_unlock();

-These implement ACQUIRE-class and RELEASE-class operations. These should be used in
-preference to other operations when implementing locking primitives, because
-their implementations can be optimised on many architectures.
+These implement ACQUIRE-class and RELEASE-class operations.  These should be
+used in preference to other operations when implementing locking primitives,
+because their implementations can be optimised on many architectures.

 [!] Note that special memory barrier primitives are available for these
 situations because on some CPUs the atomic instructions used imply full memory
@@ -2570,12 +2571,12 @@ explicit barriers are used.

 Normally this won't be a problem because the I/O accesses done inside such
 sections will include synchronous load operations on strictly ordered I/O
-registers that form implicit I/O barriers. If this isn't sufficient then an
+registers that form implicit I/O barriers.  If this isn't sufficient then an
 mmiowb() may need to be used explicitly.


 A similar situation may occur between an interrupt routine and two routines
-running on separate CPUs that communicate with each other. If such a case is
+running on separate CPUs that communicate with each other.  If such a case is
 likely, then interrupt-disabling locks should be used to guarantee ordering.


@@ -2589,8 +2590,8 @@ functions:
 (*) inX(), outX():

  These are intended to talk to I/O space rather than memory space, but
- that's primarily a CPU-specific concept. The i386 and x86_64 processors do
- indeed have special I/O space access cycles and instructions, but many
+ that's primarily a CPU-specific concept.  The i386 and x86_64 processors
+ do indeed have special I/O space access cycles and instructions, but many
  CPUs don't have such a concept.

  The PCI bus, amongst others, defines an I/O space concept which - on such
@@ -2612,7 +2613,7 @@ functions:

  Whether these are guaranteed to be fully ordered and uncombined with
  respect to each other on the issuing CPU depends on the characteristics
- defined for the memory window through which they're accessing. On later
+ defined for the memory window through which they're accessing.  On later
  i386 architecture machines, for example, this is controlled by way of the
  MTRR registers.

@@ -2637,10 +2638,10 @@ functions:
 (*) readX_relaxed(), writeX_relaxed()

  These are similar to readX() and writeX(), but provide weaker memory
- ordering guarantees. Specifically, they do not guarantee ordering with
+ ordering guarantees.  Specifically, they do not guarantee ordering with
  respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
- ordering with respect to LOCK or UNLOCK operations. If the latter is
- required, an mmiowb() barrier can be used. Note that relaxed accesses to
+ ordering with respect to LOCK or UNLOCK operations.  If the latter is
+ required, an mmiowb() barrier can be used.  Note that relaxed accesses to
  the same peripheral are guaranteed to be ordered with respect to each
  other.

@@ -3042,6 +3043,7 @@ The Alpha defines the Linux kernel's memory barrier model.

 See the subsection on "Cache Coherency" above.

+
 VIRTUAL MACHINE GUESTS
 ----------------------

@@ -3052,7 +3054,7 @@ barriers for this use-case would be possible but is often suboptimal.

 To handle this case optimally, low-level virt_mb() etc macros are available.
 These have the same effect as smp_mb() etc when SMP is enabled, but generate
-identical code for SMP and non-SMP systems. For example, virtual machine guests
+identical code for SMP and non-SMP systems.  For example, virtual machine guests
 should use virt_mb() rather than smp_mb() when synchronizing against a
 (possibly SMP) host.

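The difference between smp_mb() and virt_mb() described in this hunk can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel's definitions: `CONFIG_SMP` here is just a preprocessor symbol standing in for the kernel option, the macros are built on C11 fences, and the mailbox variables and `guest_post_request()` are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's CONFIG_SMP option; define it
 * on the compiler command line to model an SMP build. */
#ifdef CONFIG_SMP
/* SMP build: a real CPU-level full barrier. */
#define smp_mb()	atomic_thread_fence(memory_order_seq_cst)
#else
/* Non-SMP build: only the compiler is constrained, no fence instruction. */
#define smp_mb()	atomic_signal_fence(memory_order_seq_cst)
#endif

/* Always a real full barrier, whatever the build configuration,
 * because the host on the other side may run on a different CPU. */
#define virt_mb()	atomic_thread_fence(memory_order_seq_cst)

/* Hypothetical mailbox in memory shared between guest and host. */
static atomic_int request_data;
static atomic_int request_ready;
static atomic_int host_ack;

/* Guest side: post a request, then check whether the host has already
 * acknowledged. The stores must be ordered before the load even in a
 * uniprocessor guest, so virt_mb() is used rather than smp_mb(). */
static int guest_post_request(int value)
{
	atomic_store_explicit(&request_data, value, memory_order_relaxed);
	atomic_store_explicit(&request_ready, 1, memory_order_relaxed);
	virt_mb();	/* order the stores above before the load below */
	return atomic_load_explicit(&host_ack, memory_order_relaxed);
}
```

In a build without `CONFIG_SMP`, substituting smp_mb() in `guest_post_request()` would leave only a compiler barrier, which is the case the virt_ variants exist to avoid.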
@@ -3060,6 +3062,7 @@ These are equivalent to smp_mb() etc counterparts in all other respects,
 in particular, they do not control MMIO effects: to control
 MMIO effects, use mandatory barriers.

+
 ============
 EXAMPLE USES
 ============