9 years ago · 0b6fa347dc
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1733,15 +1733,15 @@ The Linux kernel has eight basic CPU memory barriers:
 
 
 All memory barriers except the data dependency barriers imply a compiler
-barrier. Data dependencies do not impose any additional compiler ordering.
+barrier.  Data dependencies do not impose any additional compiler ordering.
 
 Aside: In the case of data dependencies, the compiler would be expected
 to issue the loads in the correct order (eg. `a[b]` would have to load
 the value of b before loading a[b]), however there is no guarantee in
 the C specification that the compiler may not speculate the value of b
 (eg. is equal to 1) and load a before b (eg. tmp = a[1]; if (b != 1)
-tmp = a[b]; ). There is also the problem of a compiler reloading b after
-having loaded a[b], thus having a newer copy of b than a[b]. A consensus
+tmp = a[b]; ).  There is also the problem of a compiler reloading b after
+having loaded a[b], thus having a newer copy of b than a[b].  A consensus
 has not yet been reached about these problems, however the READ_ONCE()
 macro is a good place to start looking.
 
@@ -1796,6 +1796,7 @@ There are some more advanced barrier functions:
 
 
  (*) lockless_dereference();
+
      This can be thought of as a pointer-fetch wrapper around the
      smp_read_barrier_depends() data-dependency barrier.
 
@@ -1897,7 +1898,7 @@ for each construct.  These operations all imply certain barriers:
      Memory operations issued before the ACQUIRE may be completed after
      the ACQUIRE operation has completed.  An smp_mb__before_spinlock(),
      combined with a following ACQUIRE, orders prior stores against
-     subsequent loads and stores. Note that this is weaker than smp_mb()!
+     subsequent loads and stores.  Note that this is weaker than smp_mb()!
      The smp_mb__before_spinlock() primitive is free on many architectures.
 
  (2) RELEASE operation implication:
@@ -2092,9 +2093,9 @@ or:
 	event_indicated = 1;
 	wake_up_process(event_daemon);
 
-A write memory barrier is implied by wake_up() and co. if and only if they wake
-something up.  The barrier occurs before the task state is cleared, and so sits
-between the STORE to indicate the event and the STORE to set TASK_RUNNING:
+A write memory barrier is implied by wake_up() and co.  if and only if they
+wake something up.  The barrier occurs before the task state is cleared, and so
+sits between the STORE to indicate the event and the STORE to set TASK_RUNNING:
 
 	CPU 1				CPU 2
 	===============================	===============================
@@ -2208,7 +2209,7 @@ three CPUs; then should the following sequence of events occur:
 
 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
-on the separate CPUs. It might, for example, see:
+on the separate CPUs.  It might, for example, see:
 
 	*E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M
 
@@ -2488,9 +2489,9 @@ The following operations are special locking primitives:
 	clear_bit_unlock();
 	__clear_bit_unlock();
 
-These implement ACQUIRE-class and RELEASE-class operations. These should be used in
-preference to other operations when implementing locking primitives, because
-their implementations can be optimised on many architectures.
+These implement ACQUIRE-class and RELEASE-class operations.  These should be
+used in preference to other operations when implementing locking primitives,
+because their implementations can be optimised on many architectures.
 
 [!] Note that special memory barrier primitives are available for these
 situations because on some CPUs the atomic instructions used imply full memory
@@ -2570,12 +2571,12 @@ explicit barriers are used.
 
 Normally this won't be a problem because the I/O accesses done inside such
 sections will include synchronous load operations on strictly ordered I/O
-registers that form implicit I/O barriers. If this isn't sufficient then an
+registers that form implicit I/O barriers.  If this isn't sufficient then an
 mmiowb() may need to be used explicitly.
 
 
 A similar situation may occur between an interrupt routine and two routines
-running on separate CPUs that communicate with each other. If such a case is
+running on separate CPUs that communicate with each other.  If such a case is
 likely, then interrupt-disabling locks should be used to guarantee ordering.
 
 
@@ -2589,8 +2590,8 @@ functions:
  (*) inX(), outX():
 
      These are intended to talk to I/O space rather than memory space, but
-     that's primarily a CPU-specific concept. The i386 and x86_64 processors do
-     indeed have special I/O space access cycles and instructions, but many
+     that's primarily a CPU-specific concept.  The i386 and x86_64 processors
+     do indeed have special I/O space access cycles and instructions, but many
      CPUs don't have such a concept.
 
      The PCI bus, amongst others, defines an I/O space concept which - on such
@@ -2612,7 +2613,7 @@ functions:
 
      Whether these are guaranteed to be fully ordered and uncombined with
      respect to each other on the issuing CPU depends on the characteristics
-     defined for the memory window through which they're accessing. On later
+     defined for the memory window through which they're accessing.  On later
      i386 architecture machines, for example, this is controlled by way of the
      MTRR registers.
 
@@ -2637,10 +2638,10 @@ functions:
  (*) readX_relaxed(), writeX_relaxed()
 
      These are similar to readX() and writeX(), but provide weaker memory
-     ordering guarantees. Specifically, they do not guarantee ordering with
+     ordering guarantees.  Specifically, they do not guarantee ordering with
      respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
-     ordering with respect to LOCK or UNLOCK operations. If the latter is
-     required, an mmiowb() barrier can be used. Note that relaxed accesses to
+     ordering with respect to LOCK or UNLOCK operations.  If the latter is
+     required, an mmiowb() barrier can be used.  Note that relaxed accesses to
      the same peripheral are guaranteed to be ordered with respect to each
      other.
 
@@ -3042,6 +3043,7 @@ The Alpha defines the Linux kernel's memory barrier model.
 
 See the subsection on "Cache Coherency" above.
 
+
 VIRTUAL MACHINE GUESTS
 ----------------------
 
@@ -3052,7 +3054,7 @@ barriers for this use-case would be possible but is often suboptimal.
 
 To handle this case optimally, low-level virt_mb() etc macros are available.
 These have the same effect as smp_mb() etc when SMP is enabled, but generate
-identical code for SMP and non-SMP systems. For example, virtual machine guests
+identical code for SMP and non-SMP systems.  For example, virtual machine guests
 should use virt_mb() rather than smp_mb() when synchronizing against a
 (possibly SMP) host.
 
@@ -3060,6 +3062,7 @@ These are equivalent to smp_mb() etc counterparts in all other respects,
 in particular, they do not control MMIO effects: to control
 MMIO effects, use mandatory barriers.
 
+
 ============
 EXAMPLE USES
 ============
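
Note on the data-dependency aside in the first hunk: the "compiler reloading b after having loaded a[b]" hazard can be illustrated with a minimal userspace sketch. Everything named here is hypothetical and not part of the patch: READ_ONCE_SKETCH is a stand-in for the kernel's READ_ONCE() built on a volatile cast, and it assumes the GCC/Clang `__typeof__` extension. It only shows the index being loaded once and that snapshot being reused; it makes no claim about CPU-level ordering, which the text says is still under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's READ_ONCE(): the
 * volatile cast asks the compiler for exactly one load of x here.
 * Assumes the GCC/Clang __typeof__ extension. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

static int table[4] = { 10, 20, 30, 40 };	/* plays the role of a[] */
static size_t idx = 1;				/* plays the role of b */

/* Load the index once, then use that same snapshot for the dependent
 * load, so the returned element always matches the index that was read. */
static int load_indexed(void)
{
	size_t i = READ_ONCE_SKETCH(idx);	/* single load of idx */

	assert(i < 4);				/* bounds check on the snapshot */
	return table[i];			/* dependent load uses i, not idx */
}
```

Reading `idx` into a local through the volatile cast means later uses refer to the local copy, so the compiler has no licence to fetch `idx` a second time between the bounds check and the array access.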