@@ -1,139 +1,157 @@
Generic Mutex Subsystem

started by Ingo Molnar <mingo@redhat.com>
+updated by Davidlohr Bueso <davidlohr@hp.com>

- "Why on earth do we need a new mutex subsystem, and what's wrong
-   with semaphores?"
+What are mutexes?
+-----------------

-firstly, there's nothing wrong with semaphores. But if the simpler
-mutex semantics are sufficient for your code, then there are a couple
-of advantages of mutexes:
+In the Linux kernel, mutexes refer to a particular locking primitive
+that enforces serialization on shared memory systems, and not only to
+the generic term referring to 'mutual exclusion' found in academia
+or similar theoretical textbooks. Mutexes are sleeping locks which
+behave similarly to binary semaphores, and were introduced in 2006[1]
+as an alternative to these. This new data structure provided a number
+of advantages, including simpler interfaces, and at that time smaller
+code (see Disadvantages).

- - 'struct mutex' is smaller on most architectures: E.g. on x86,
-   'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
-   A smaller structure size means less RAM footprint, and better
-   CPU-cache utilization.
+[1] http://lwn.net/Articles/164802/

- - tighter code. On x86 i get the following .text sizes when
-   switching all mutex-alike semaphores in the kernel to the mutex
-   subsystem:
+Implementation
+--------------

-     text    data     bss     dec     hex filename
-  3280380  868188  396860 4545428  455b94 vmlinux-semaphore
-  3255329  865296  396732 4517357  44eded vmlinux-mutex
+Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
+and implemented in kernel/locking/mutex.c. These locks use a three-state
+atomic counter (->count) to represent the different possible transitions
+that can occur during the lifetime of a lock:

-   that's 25051 bytes of code saved, or a 0.76% win - off the hottest
-   codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%)
-   Smaller code means better icache footprint, which is one of the
-   major optimization goals in the Linux kernel currently.
+          1: unlocked
+          0: locked, no waiters
+   negative: locked, with potential waiters

- - the mutex subsystem is slightly faster and has better scalability for
-   contended workloads. On an 8-way x86 system, running a mutex-based
-   kernel and testing creat+unlink+close (of separate, per-task files)
-   in /tmp with 16 parallel tasks, the average number of ops/sec is:
+In its most basic form it also includes a wait-queue and a spinlock
+that serializes access to it. CONFIG_SMP systems can also include
+a pointer to the lock task owner (->owner) as well as a spinner MCS
+lock (->osq), both described below in (ii).
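+
+For illustration only, here is a simplified sketch of that layout; it is not
+the exact definition in include/linux/mutex.h, which varies with the kernel
+version and with config options such as CONFIG_DEBUG_MUTEXES and
+CONFIG_MUTEX_SPIN_ON_OWNER:
+
+    struct mutex {
+            atomic_t              count;      /* 1: unlocked, 0: locked,
+                                                 <0: locked, waiters may exist */
+            spinlock_t            wait_lock;  /* protects wait_list */
+            struct list_head      wait_list;  /* tasks blocked in the slowpath */
+    #ifdef CONFIG_SMP
+            struct task_struct    *owner;     /* current holder, used by spinners */
+            struct optimistic_spin_queue osq; /* spinner MCS lock */
+    #endif
+    };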

-            Semaphores:                       Mutexes:
+When acquiring a mutex, there are three possible paths that can be
+taken, depending on the state of the lock:

- $ ./test-mutex V 16 10              $ ./test-mutex V 16 10
- 8 CPUs, running 16 tasks.           8 CPUs, running 16 tasks.
- checking VFS performance.           checking VFS performance.
- avg loops/sec:      34713           avg loops/sec:      84153
- CPU utilization:      63%           CPU utilization:      22%
+(i) fastpath: tries to atomically acquire the lock by decrementing the
+    counter. If it was already taken by another task, it goes to the next
+    possible path. This logic is architecture-specific. On x86-64, the
+    locking fastpath is 2 instructions:

- i.e. in this workload, the mutex based kernel was 2.4 times faster
- than the semaphore based kernel, _and_ it also had 2.8 times less CPU
- utilization. (In terms of 'ops per CPU cycle', the semaphore kernel
- performed 551 ops/sec per 1% of CPU time used, while the mutex kernel
- performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times
- more efficient.)
-
- the scalability difference is visible even on a 2-way P4 HT box:
-
-            Semaphores:                       Mutexes:
-
- $ ./test-mutex V 16 10              $ ./test-mutex V 16 10
- 4 CPUs, running 16 tasks.           8 CPUs, running 16 tasks.
- checking VFS performance.           checking VFS performance.
- avg loops/sec:     127659           avg loops/sec:     181082
- CPU utilization:      100%          CPU utilization:       34%
-
- (the straight performance advantage of mutexes is 41%, the per-cycle
-  efficiency of mutexes is 4.1 times better.)
-
- - there are no fastpath tradeoffs, the mutex fastpath is just as tight
-   as the semaphore fastpath. On x86, the locking fastpath is 2
-   instructions:
-
-     c0377ccb <mutex_lock>:
-     c0377ccb:       f0 ff 08                lock decl (%eax)
-     c0377cce:       78 0e                   js     c0377cde <.text..lock.mutex>
-     c0377cd0:       c3                      ret
+    0000000000000e10 <mutex_lock>:
+    e21:   f0 ff 0b                lock decl (%rbx)
+    e24:   79 08                   jns    e2e <mutex_lock+0x1e>

   the unlocking fastpath is equally tight:

-     c0377cd1 <mutex_unlock>:
-     c0377cd1:       f0 ff 00                lock incl (%eax)
-     c0377cd4:       7e 0f                   jle    c0377ce5 <.text..lock.mutex+0x7>
-     c0377cd6:       c3                      ret
-
- - 'struct mutex' semantics are well-defined and are enforced if
-   CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have
-   virtually no debugging code or instrumentation. The mutex subsystem
-   checks and enforces the following rules:
-
-   * - only one task can hold the mutex at a time
-   * - only the owner can unlock the mutex
-   * - multiple unlocks are not permitted
-   * - recursive locking is not permitted
-   * - a mutex object must be initialized via the API
-   * - a mutex object must not be initialized via memset or copying
-   * - task may not exit with mutex held
-   * - memory areas where held locks reside must not be freed
-   * - held mutexes must not be reinitialized
-   * - mutexes may not be used in hardware or software interrupt
-   *   contexts such as tasklets and timers
-
-   furthermore, there are also convenience features in the debugging
-   code:
-
-   * - uses symbolic names of mutexes, whenever they are printed in debug output
-   * - point-of-acquire tracking, symbolic lookup of function names
-   * - list of all locks held in the system, printout of them
-   * - owner tracking
-   * - detects self-recursing locks and prints out all relevant info
-   * - detects multi-task circular deadlocks and prints out all affected
-   *   locks and tasks (and only those tasks)
+    0000000000000bc0 <mutex_unlock>:
+    bc8:   f0 ff 07                lock incl (%rdi)
+    bcb:   7f 0a                   jg     bd7 <mutex_unlock+0x17>
+
+
+(ii) midpath: aka optimistic spinning, tries to spin for acquisition
+     while the lock owner is running and there are no other tasks ready
+     to run that have higher priority (need_resched). The rationale is
+     that if the lock owner is running, it is likely to release the lock
+     soon. The mutex spinners are queued up using an MCS lock so that only
+     one spinner can compete for the mutex.
+
+     The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock
+     with the desirable properties of being fair and with each cpu trying
+     to acquire the lock spinning on a local variable. It avoids expensive
+     cacheline bouncing that common test-and-set spinlock implementations
+     incur. An MCS-like lock is specially tailored for optimistic spinning
+     for sleeping lock implementations. An important feature of the customized
+     MCS lock is that it has the extra property that spinners are able to exit
+     the MCS spinlock queue when they need to reschedule. This further helps
+     avoid situations where MCS spinners that need to reschedule would continue
+     waiting to spin on the mutex owner, only to go directly to the slowpath
+     upon obtaining the MCS lock.
+
+
+(iii) slowpath: last resort, if the lock is still unable to be acquired,
+      the task is added to the wait-queue and sleeps until woken up by the
+      unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE.
+
+While formally kernel mutexes are sleepable locks, it is path (ii) that
+makes them more practically a hybrid type. By simply not interrupting a
+task and busy-waiting for a few cycles instead of immediately sleeping,
+the performance of this lock has been seen to significantly improve a
+number of workloads. Note that this technique is also used for rw-semaphores.
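+
+Putting the three acquisition paths together, the logic can be sketched
+roughly as follows. This is illustrative pseudo-C only, not the actual
+kernel implementation; the helpers mutex_owner_on_cpu() and
+mutex_wait_for_unlock() are made up for the example:
+
+    void example_mutex_lock(struct mutex *lock)
+    {
+            /* (i) fastpath: a single atomic decrement; a non-negative
+             * result means the lock was free and is now ours. */
+            if (atomic_dec_return(&lock->count) >= 0)
+                    return;
+
+            /* (ii) midpath: optimistically spin as long as the owner is
+             * still running on a CPU and we are not asked to reschedule. */
+            while (mutex_owner_on_cpu(lock) && !need_resched()) {
+                    if (atomic_cmpxchg(&lock->count, 1, 0) == 1)
+                            return;         /* acquired while spinning */
+                    cpu_relax();
+            }
+
+            /* (iii) slowpath: queue behind any other waiters and block as
+             * TASK_UNINTERRUPTIBLE until the unlock path wakes us up. */
+            mutex_wait_for_unlock(lock);
+    }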
+
+Semantics
+---------
+
+The mutex subsystem checks and enforces the following rules:
+
+    - Only one task can hold the mutex at a time.
+    - Only the owner can unlock the mutex.
+    - Multiple unlocks are not permitted.
+    - Recursive locking/unlocking is not permitted.
+    - A mutex must only be initialized via the API (see below).
+    - A task may not exit with a mutex held.
+    - Memory areas where held locks reside must not be freed.
+    - Held mutexes must not be reinitialized.
+    - Mutexes may not be used in hardware or software interrupt
+      contexts such as tasklets and timers.
+
+These semantics are fully enforced when CONFIG_DEBUG_MUTEXES is enabled.
+In addition, the mutex debugging code also implements a number of other
+features that make lock debugging easier and faster:
+
+    - Uses symbolic names of mutexes, whenever they are printed
+      in debug output.
+    - Point-of-acquire tracking, symbolic lookup of function names,
+      list of all locks held in the system, printout of them.
+    - Owner tracking.
+    - Detects self-recursing locks and prints out all relevant info.
+    - Detects multi-task circular deadlocks and prints out all affected
+      locks and tasks (and only those tasks).
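+
+As an example of the kind of bug these checks catch, the following
+(made-up) code violates the no-recursive-locking rule and would simply
+deadlock on itself; with the debugging code enabled, the self-recursing
+lock is detected and reported instead of silently hanging:
+
+    static DEFINE_MUTEX(example_lock);
+
+    void buggy(void)
+    {
+            mutex_lock(&example_lock);
+            mutex_lock(&example_lock);      /* recursive lock: self-deadlock */
+            mutex_unlock(&example_lock);
+            mutex_unlock(&example_lock);
+    }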
+
+
+Interfaces
+----------
+Statically define the mutex:
+   DEFINE_MUTEX(name);
+
+Dynamically initialize the mutex:
+   mutex_init(mutex);
+
+Acquire the mutex, uninterruptible:
+   void mutex_lock(struct mutex *lock);
+   void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
+   int  mutex_trylock(struct mutex *lock);
+
+Acquire the mutex, interruptible:
+   int mutex_lock_interruptible_nested(struct mutex *lock,
+                                       unsigned int subclass);
+   int mutex_lock_interruptible(struct mutex *lock);
+
+Acquire the mutex if, and only if, decrementing the given atomic counter
+drops it to 0:
+   int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
+
+Unlock the mutex:
+   void mutex_unlock(struct mutex *lock);
+
+Test if the mutex is taken:
+   int mutex_is_locked(struct mutex *lock);
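+
+A minimal usage sketch of these interfaces (the data structure and functions
+below are made up for the example):
+
+    static DEFINE_MUTEX(example_mutex);
+    static LIST_HEAD(example_list);
+
+    /* Uninterruptible critical section around a shared list. */
+    void example_add(struct list_head *entry)
+    {
+            mutex_lock(&example_mutex);
+            list_add(entry, &example_list);
+            mutex_unlock(&example_mutex);
+    }
+
+    /* Interruptible variant: a signal while sleeping makes the lock
+     * attempt fail, in which case the mutex is not held and must not
+     * be unlocked. */
+    int example_add_interruptible(struct list_head *entry)
+    {
+            int ret = mutex_lock_interruptible(&example_mutex);
+
+            if (ret)
+                    return ret;
+            list_add(entry, &example_list);
+            mutex_unlock(&example_mutex);
+            return 0;
+    }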

Disadvantages
-------------

-The stricter mutex API means you cannot use mutexes the same way you
-can use semaphores: e.g. they cannot be used from an interrupt context,
-nor can they be unlocked from a different context that which acquired
-it. [ I'm not aware of any other (e.g. performance) disadvantages from
-using mutexes at the moment, please let me know if you find any. ]
-
-Implementation of mutexes
--------------------------
-
-'struct mutex' is the new mutex type, defined in include/linux/mutex.h and
-implemented in kernel/locking/mutex.c. It is a counter-based mutex with a
-spinlock and a wait-list. The counter has 3 states: 1 for "unlocked", 0 for
-"locked" and negative numbers (usually -1) for "locked, potential waiters
-queued".
-
-the APIs of 'struct mutex' have been streamlined:
-
-   DEFINE_MUTEX(name);
+Unlike its original design and purpose, 'struct mutex' is larger than
+most locks in the kernel. E.g., on x86-64 it is 40 bytes, almost twice
+as large as 'struct semaphore' (24 bytes) and 8 bytes shy of the
+'struct rw_semaphore' variant. Larger structure sizes mean more CPU
+cache and memory footprint.

-   mutex_init(mutex);
+When to use mutexes
+-------------------

-   void mutex_lock(struct mutex *lock);
-   int  mutex_lock_interruptible(struct mutex *lock);
-   int  mutex_trylock(struct mutex *lock);
-   void mutex_unlock(struct mutex *lock);
-   int  mutex_is_locked(struct mutex *lock);
-   void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
-   int  mutex_lock_interruptible_nested(struct mutex *lock,
-                                        unsigned int subclass);
-   int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
+Unless the strict semantics of mutexes are unsuitable and/or the critical
+region prevents the lock from being shared, always prefer them to any other
+locking primitive.