.. SPDX-License-Identifier: GPL-2.0+

======
XArray
======

:Author: Matthew Wilcox

Overview
========

The XArray is an abstract data type which behaves like a very large array
of pointers. It meets many of the same needs as a hash or a conventional
resizable array. Unlike a hash, it allows you to sensibly go to the
next or previous entry in a cache-efficient manner. In contrast to a
resizable array, there is no need to copy data or change MMU mappings in
order to grow the array. It is more memory-efficient, parallelisable
and cache friendly than a doubly-linked list. It takes advantage of
RCU to perform lookups without locking.

The XArray implementation is efficient when the indices used are densely
clustered; hashing the object and using the hash as the index will not
perform well. The XArray is optimised for small indices, but still has
good performance with large indices. If your index can be larger than
``ULONG_MAX`` then the XArray is not the data type for you. The most
important user of the XArray is the page cache.

Each non-``NULL`` entry in the array has three bits associated with
it called marks. Each mark may be set or cleared independently of
the others. You can iterate over entries which are marked.

Normal pointers may be stored in the XArray directly. They must be 4-byte
aligned, which is true for any pointer returned from :c:func:`kmalloc` and
:c:func:`alloc_page`. It isn't true for arbitrary user-space pointers,
nor for function pointers. You can store pointers to statically allocated
objects, as long as those objects have an alignment of at least 4.

You can also store integers between 0 and ``LONG_MAX`` in the XArray.
You must first convert the integer into an entry using :c:func:`xa_mk_value`.
When you retrieve an entry from the XArray, you can check whether it is
a value entry by calling :c:func:`xa_is_value`, and convert it back to
an integer by calling :c:func:`xa_to_value`.

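The idea behind value entries can be pictured with a small userspace
model. This is a sketch, not the kernel header: the ``model_*`` names are
hypothetical, and the bit layout (the integer shifted left by one with the
low bit set) is an assumption about how the kernel encodes value entries,
which may differ between versions. It does show why the usable range stops
at ``LONG_MAX``: one bit of the word is spent on the marker.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for xa_mk_value(): shift left, set bit 0. */
static void *model_mk_value(unsigned long v)
{
	assert(v <= (unsigned long)LONG_MAX);	/* the top bit would be lost */
	return (void *)(((uintptr_t)v << 1) | 1);
}

/* Stand-in for xa_is_value(): aligned pointers always have bit 0 clear. */
static bool model_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Stand-in for xa_to_value(): undo the shift, dropping the marker bit. */
static unsigned long model_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}

/* Returns 0 if integers round-trip and an aligned pointer is not a value. */
static int model_value_selfcheck(void)
{
	static _Alignas(4) char object[4];

	if (model_to_value(model_mk_value(0)) != 0)
		return -1;
	if (model_to_value(model_mk_value(LONG_MAX)) != (unsigned long)LONG_MAX)
		return -1;
	if (!model_is_value(model_mk_value(42)))
		return -1;
	if (model_is_value(object))
		return -1;
	return 0;
}
```

In kernel code you would call :c:func:`xa_mk_value`, :c:func:`xa_is_value`
and :c:func:`xa_to_value` rather than open-coding any of this.
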
Some users want to store tagged pointers instead of using the marks
described above. They can call :c:func:`xa_tag_pointer` to create an
entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry
back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve
the tag of an entry. Tagged pointers use the same bits that are used
to distinguish value entries from normal pointers, so each user must
decide whether they want to store value entries or tagged pointers in
any particular XArray.

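A userspace sketch makes the conflict between tagged pointers and value
entries concrete. The ``model_*`` names are hypothetical, and the layout
(the tag kept in the two low bits that a 4-byte-aligned pointer leaves
free) is an assumption about the kernel implementation rather than a
quotation of it.

```c
#include <assert.h>
#include <stdint.h>

/* Two low bits are free in any pointer with 4-byte alignment. */
#define MODEL_TAG_MASK	((uintptr_t)3)

/* Hypothetical stand-in for xa_tag_pointer(). */
static void *model_tag_pointer(void *p, unsigned long tag)
{
	return (void *)((uintptr_t)p | ((uintptr_t)tag & MODEL_TAG_MASK));
}

/* Hypothetical stand-in for xa_untag_pointer(). */
static void *model_untag_pointer(void *entry)
{
	return (void *)((uintptr_t)entry & ~MODEL_TAG_MASK);
}

/* Hypothetical stand-in for xa_pointer_tag(). */
static unsigned int model_pointer_tag(void *entry)
{
	return (unsigned int)((uintptr_t)entry & MODEL_TAG_MASK);
}

/* Returns 0 if tagging round-trips and tag 1 collides with the value bit. */
static int model_tag_selfcheck(void)
{
	static _Alignas(4) char object[4];
	void *tagged = model_tag_pointer(object, 3);

	if (model_untag_pointer(tagged) != (void *)object)
		return -1;
	if (model_pointer_tag(tagged) != 3)
		return -1;
	/* A pointer tagged with 1 has bit 0 set, like a value entry would. */
	if (((uintptr_t)model_tag_pointer(object, 1) & 1) == 0)
		return -1;
	return 0;
}
```

Because a tag of 1 sets the same low bit that this model uses to mark a
value entry, an array holding both kinds of entry could not tell them apart.
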
The XArray does not support storing :c:func:`IS_ERR` pointers as some
conflict with value entries or internal entries.

An unusual feature of the XArray is the ability to create entries which
occupy a range of indices. Once stored to, looking up any index in
the range will return the same entry as looking up any other index in
the range. Setting a mark on one index will set it on all of them.
Storing to any index will store to all of them. Multi-index entries can
be explicitly split into smaller entries, or storing ``NULL`` into any
entry will cause the XArray to forget about the range.

Normal API
==========

Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
for statically allocated XArrays or :c:func:`xa_init` for dynamically
allocated ones. A freshly-initialised XArray contains a ``NULL``
pointer at every index.

You can then set entries using :c:func:`xa_store` and get entries
using :c:func:`xa_load`. :c:func:`xa_store` will overwrite any entry with
the new entry and return the previous entry stored at that index. You can
use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with a
``NULL`` entry. There is no difference between an entry that has never
been stored to and one that has most recently had ``NULL`` stored to it.

You can conditionally replace an entry at an index by using
:c:func:`xa_cmpxchg`. Like :c:func:`cmpxchg`, it will only succeed if
the entry at that index has the 'old' value. It also returns the entry
which was at that index; if it returns the same entry which was passed as
'old', then :c:func:`xa_cmpxchg` succeeded.

If you want to only store a new entry to an index if the current entry
at that index is ``NULL``, you can use :c:func:`xa_insert` which
returns ``-EEXIST`` if the entry is not empty.

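The return-value conventions described above can be sketched with a toy
model. This is only an illustration of the semantics, assuming a small
fixed-size table with no locking, no memory allocation and no marks; the
``toy_*`` names and ``struct toy_array`` are hypothetical and are not part
of the XArray API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Hypothetical stand-in for an XArray: a NULL slot means "no entry". */
struct toy_array {
	void *slot[TOY_SLOTS];
};

/* Like xa_store(): overwrite the slot and return the previous entry. */
static void *toy_store(struct toy_array *a, unsigned long index, void *entry)
{
	void *prev;

	assert(index < TOY_SLOTS);
	prev = a->slot[index];
	a->slot[index] = entry;
	return prev;
}

/* Like xa_cmpxchg(): store only if the slot holds 'old'; return what was there. */
static void *toy_cmpxchg(struct toy_array *a, unsigned long index,
			 void *old, void *entry)
{
	void *cur;

	assert(index < TOY_SLOTS);
	cur = a->slot[index];
	if (cur == old)
		a->slot[index] = entry;
	return cur;
}

/* Like xa_insert() as described in the text: -EEXIST if the slot is occupied. */
static int toy_insert(struct toy_array *a, unsigned long index, void *entry)
{
	return toy_cmpxchg(a, index, NULL, entry) == NULL ? 0 : -EEXIST;
}
```

A caller detects a successful compare-and-exchange by comparing the
returned entry with the 'old' entry it passed in, exactly as the text
describes for :c:func:`xa_cmpxchg`.
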
You can enquire whether a mark is set on an entry by using
:c:func:`xa_get_mark`. If the entry is not ``NULL``, you can set a mark
on it by using :c:func:`xa_set_mark` and remove the mark from an entry by
calling :c:func:`xa_clear_mark`. You can ask whether any entry in the
XArray has a particular mark set by calling :c:func:`xa_marked`.

You can copy entries out of the XArray into a plain array by calling
:c:func:`xa_extract`. Or you can iterate over the present entries in
the XArray by calling :c:func:`xa_for_each`. You may prefer to use
:c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present
entry in the XArray.

Finally, you can remove all entries from an XArray by calling
:c:func:`xa_destroy`. If the XArray entries are pointers, you may wish
to free the entries first. You can do this by iterating over all present
entries in the XArray using the :c:func:`xa_for_each` iterator.

Memory allocation
-----------------

The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_reserve`
and :c:func:`xa_insert` functions take a gfp_t parameter in case
the XArray needs to allocate memory to store this entry.
If the entry is being deleted, no memory allocation needs to be performed,
and the GFP flags specified will be ignored.

It is possible for no memory to be allocatable, particularly if you pass
a restrictive set of GFP flags. In that case, the functions return a
special value which can be turned into an errno using :c:func:`xa_err`.
If you don't need to know exactly which error occurred, using
:c:func:`xa_is_err` is slightly more efficient.

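How an error can travel back through a pointer-sized return value is
easier to see in a userspace model. The sketch below assumes that an error
is packed as the negative errno shifted left by two with the low bits set
to ``10``; that layout, the ``MODEL_MAX_ERRNO`` bound and the ``model_*``
names are assumptions made for illustration and may not match any given
kernel version.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095	/* assumed upper bound on errno values */

/* Pack a negative errno into a pointer-sized entry (hypothetical layout). */
static void *model_mk_err(int err)
{
	return (void *)((((uintptr_t)(intptr_t)err) << 2) | 2);
}

/* Stand-in for xa_is_err(): low bits are 10 and the entry is in errno range. */
static bool model_is_err(const void *entry)
{
	uintptr_t e = (uintptr_t)entry;

	return (e & 3) == 2 && e >= (uintptr_t)model_mk_err(-MODEL_MAX_ERRNO);
}

/* Stand-in for xa_err(): 0 for ordinary entries, negative errno otherwise. */
static int model_err(const void *entry)
{
	/*
	 * Right-shifting a negative signed value is implementation-defined
	 * in C; this relies on the arithmetic shift gcc and clang perform.
	 */
	if (model_is_err(entry))
		return (int)((intptr_t)entry >> 2);
	return 0;
}
```

In this model the cheap test only inspects the entry, while recovering the
errno takes an extra sign-extending shift.
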
Locking
-------

When using the Normal API, you do not have to worry about locking.
The XArray uses RCU and an internal spinlock to synchronise access:

No lock needed:
 * :c:func:`xa_empty`
 * :c:func:`xa_marked`

Takes RCU read lock:
 * :c:func:`xa_load`
 * :c:func:`xa_for_each`
 * :c:func:`xa_find`
 * :c:func:`xa_find_after`
 * :c:func:`xa_extract`
 * :c:func:`xa_get_mark`

Takes xa_lock internally:
 * :c:func:`xa_store`
 * :c:func:`xa_insert`
 * :c:func:`xa_erase`
 * :c:func:`xa_erase_bh`
 * :c:func:`xa_erase_irq`
 * :c:func:`xa_cmpxchg`
 * :c:func:`xa_destroy`
 * :c:func:`xa_set_mark`
 * :c:func:`xa_clear_mark`

Assumes xa_lock held on entry:
 * :c:func:`__xa_store`
 * :c:func:`__xa_insert`
 * :c:func:`__xa_erase`
 * :c:func:`__xa_cmpxchg`
 * :c:func:`__xa_set_mark`
 * :c:func:`__xa_clear_mark`

If you want to take advantage of the lock to protect the data structures
that you are storing in the XArray, you can call :c:func:`xa_lock`
before calling :c:func:`xa_load`, then take a reference count on the
object you have found before calling :c:func:`xa_unlock`. This will
prevent stores from removing the object from the array between looking
up the object and incrementing the refcount. You can also use RCU to
avoid dereferencing freed memory, but an explanation of that is beyond
the scope of this document.

The XArray does not disable interrupts or softirqs while modifying
the array. It is safe to read the XArray from interrupt or softirq
context as the RCU lock provides enough protection.

If, for example, you want to store entries in the XArray in process
context and then erase them in softirq context, you can do that this way::

    void foo_init(struct foo *foo)
    {
        /* The array is modified from softirq context, so use the BH lock type. */
        xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH);
    }

    int foo_store(struct foo *foo, unsigned long index, void *entry)
    {
        int err;

        /* Process context: disable softirqs while holding the xa_lock. */
        xa_lock_bh(&foo->array);
        err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL));
        if (!err)
            foo->count++;    /* count is protected by the xa_lock */
        xa_unlock_bh(&foo->array);
        return err;
    }

    /* foo_erase() is only called from softirq context */
    void foo_erase(struct foo *foo, unsigned long index)
    {
        xa_lock(&foo->array);
        __xa_erase(&foo->array, index);
        foo->count--;
        xa_unlock(&foo->array);
    }

If you are going to modify the XArray from interrupt or softirq context,
you need to initialise the array using :c:func:`xa_init_flags`, passing
``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.

The above example also shows a common pattern of wanting to extend the
coverage of the xa_lock on the store side to protect some statistics
associated with the array.

Sharing the XArray with interrupt context is also possible, either
using :c:func:`xa_lock_irqsave` in both the interrupt handler and process
context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock`
in the interrupt handler. Some of the more common patterns have helper
functions such as :c:func:`xa_erase_bh` and :c:func:`xa_erase_irq`.

Sometimes you need to protect access to the XArray with a mutex because
that lock sits above another mutex in the locking hierarchy. That does
not entitle you to use functions like :c:func:`__xa_erase` without taking
the xa_lock; the xa_lock is used for lockdep validation and will be used
for other purposes in the future.

The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also
available for situations where you look up an entry and want to atomically
set or clear a mark. It may be more efficient to use the advanced API
in this case, as it will save you from walking the tree twice.

Advanced API
============

The advanced API offers more flexibility and better performance at the
cost of an interface which can be harder to use and has fewer safeguards.
No locking is done for you by the advanced API, and you are required
to use the xa_lock while modifying the array. You can choose whether
to use the xa_lock or the RCU lock while doing read-only operations on
the array. You can mix advanced and normal operations on the same array;
indeed the normal API is implemented in terms of the advanced API. The
advanced API is only available to modules with a GPL-compatible license.

The advanced API is based around the xa_state. This is an opaque data
structure which you declare on the stack using the :c:func:`XA_STATE`
macro. This macro initialises the xa_state ready to start walking
around the XArray. It is used as a cursor to maintain the position
in the XArray and let you compose various operations together without
having to restart from the top every time.

The xa_state is also used to store errors. You can call
:c:func:`xas_error` to retrieve the error. All operations check whether
the xa_state is in an error state before proceeding, so there's no need
for you to check for an error after each call; you can make multiple
calls in succession and only check at a convenient point. The only
errors currently generated by the XArray code itself are ``ENOMEM`` and
``EINVAL``, but it supports arbitrary errors in case you want to call
:c:func:`xas_set_err` yourself.

If the xa_state is holding an ``ENOMEM`` error, calling :c:func:`xas_nomem`
will attempt to allocate more memory using the specified gfp flags and
cache it in the xa_state for the next attempt. The idea is that you take
the xa_lock, attempt the operation and drop the lock. The operation
attempts to allocate memory while holding the lock, but it is more
likely to fail. Once you have dropped the lock, :c:func:`xas_nomem`
can try harder to allocate more memory. It will return ``true`` if it
is worth retrying the operation (i.e. that there was a memory error *and*
more memory was allocated). If it has previously allocated memory, and
that memory wasn't used, and there is no error (or some error that isn't
``ENOMEM``), then it will free the memory previously allocated.

Internal Entries
----------------

The XArray reserves some entries for its own purposes. These are never
exposed through the normal API, but when using the advanced API, it's
possible to see them. Usually the best way to handle them is to pass them
to :c:func:`xas_retry`, and retry the operation if it returns ``true``.

.. flat-table::
   :widths: 1 1 6

   * - Name
     - Test
     - Usage

   * - Node
     - :c:func:`xa_is_node`
     - An XArray node. May be visible when using a multi-index xa_state.

   * - Sibling
     - :c:func:`xa_is_sibling`
     - A non-canonical entry for a multi-index entry. The value indicates
       which slot in this node has the canonical entry.

   * - Retry
     - :c:func:`xa_is_retry`
     - This entry is currently being modified by a thread which has the
       xa_lock. The node containing this entry may be freed at the end
       of this RCU period. You should restart the lookup from the head
       of the array.

   * - Zero
     - :c:func:`xa_is_zero`
     - Zero entries appear as ``NULL`` through the Normal API, but occupy
       an entry in the XArray which can be used to reserve the index for
       future use.

Other internal entries may be added in the future. As far as possible, they
will be handled by :c:func:`xas_retry`.

Additional functionality
------------------------

The :c:func:`xas_create_range` function allocates all the necessary memory
to store every entry in a range. It will set ``ENOMEM`` in the xa_state if
it cannot allocate memory.

You can use :c:func:`xas_init_marks` to reset the marks on an entry
to their default state. This is usually all marks clear, unless the
XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
and all other marks are clear. Replacing one entry with another using
:c:func:`xas_store` will not reset the marks on that entry; if you want
the marks reset, you should do that explicitly.

The :c:func:`xas_load` function will walk the xa_state as close to the
entry as it can. If you know the xa_state has already been walked to the
entry and need to check that the entry hasn't changed, you can use
:c:func:`xas_reload` to save a function call.

If you need to move to a different index in the XArray, call
:c:func:`xas_set`. This resets the cursor to the top of the tree, which
will generally make the next operation walk the cursor to the desired
spot in the tree. If you want to move to the next or previous index,
call :c:func:`xas_next` or :c:func:`xas_prev`. Setting the index does
not walk the cursor around the array so does not require a lock to be
held, while moving to the next or previous index does.

You can search for the next present entry using :c:func:`xas_find`. This
is the equivalent of both :c:func:`xa_find` and :c:func:`xa_find_after`;
if the cursor has been walked to an entry, then it will find the next
entry after the one currently referenced. If not, it will return the
entry at the index of the xa_state. Using :c:func:`xas_next_entry` to
move to the next present entry instead of :c:func:`xas_find` will save
a function call in the majority of cases at the expense of emitting more
inline code.

The :c:func:`xas_find_marked` function is similar. If the xa_state has
not been walked, it will return the entry at the index of the xa_state,
if it is marked. Otherwise, it will return the first marked entry after
the entry referenced by the xa_state. The :c:func:`xas_next_marked`
function is the equivalent of :c:func:`xas_next_entry`.

When iterating over a range of the XArray using :c:func:`xas_for_each`
or :c:func:`xas_for_each_marked`, it may be necessary to temporarily stop
the iteration. The :c:func:`xas_pause` function exists for this purpose.
After you have done the necessary work and wish to resume, the xa_state
is in an appropriate state to continue the iteration after the entry
you last processed. If you have interrupts disabled while iterating,
then it is good manners to pause the iteration and re-enable interrupts
every ``XA_CHECK_SCHED`` entries.

The :c:func:`xas_get_mark`, :c:func:`xas_set_mark` and
:c:func:`xas_clear_mark` functions require the xa_state cursor to have
been moved to the appropriate location in the XArray; they will do
nothing if you have called :c:func:`xas_pause` or :c:func:`xas_set`
immediately before.

You can call :c:func:`xas_set_update` to have a callback function
called each time the XArray updates a node. This is used by the page
cache workingset code to maintain its list of nodes which contain only
shadow entries.

Multi-Index Entries
-------------------

The XArray has the ability to tie multiple indices together so that
operations on one index affect all indices. For example, storing into
any index will change the value of the entry retrieved from any index.
Setting or clearing a mark on any index will set or clear the mark
on every index that is tied together. The current implementation
only allows tying ranges which are aligned powers of two together;
e.g. indices 64-127 may be tied together, but 2-6 may not be. This may
save substantial quantities of memory; for example tying 512 entries
together will save over 4kB.

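The "aligned power of two" rule can be checked with a few lines of
arithmetic. The helper below is a hypothetical userspace sketch, not an
XArray function: it accepts an inclusive range only if its length is a
power of two and its first index is a multiple of that length.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: may indices first..last (inclusive) be tied together? */
static bool range_is_tieable(unsigned long first, unsigned long last)
{
	unsigned long n;

	if (last < first)
		return false;
	n = last - first + 1;
	if (n == 0)		/* 0..ULONG_MAX wraps around to zero */
		return false;
	/* n must be a power of two and first must be aligned to n. */
	return (n & (n - 1)) == 0 && (first & (n - 1)) == 0;
}

/* Returns 0 if the examples from the text behave as described. */
static int range_selfcheck(void)
{
	if (!range_is_tieable(64, 127))	/* 64 entries starting at 64 */
		return -1;
	if (range_is_tieable(2, 6))	/* 5 entries: not a power of two */
		return -1;
	if (range_is_tieable(2, 5))	/* 4 entries, but 2 is not 4-aligned */
		return -1;
	return 0;
}
```

As a rough cross-check of the memory figure, assuming 8-byte pointers, 512
separate slots occupy 512 * 8 = 4096 bytes before any per-node overhead is
counted.
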
You can create a multi-index entry by using :c:func:`XA_STATE_ORDER`
or :c:func:`xas_set_order` followed by a call to :c:func:`xas_store`.
Calling :c:func:`xas_load` with a multi-index xa_state will walk the
xa_state to the right location in the tree, but the return value is not
meaningful, potentially being an internal entry or ``NULL`` even when there
is an entry stored within the range. Calling :c:func:`xas_find_conflict`
will return the first entry within the range or ``NULL`` if there are no
entries in the range. The :c:func:`xas_for_each_conflict` iterator will
iterate over every entry which overlaps the specified range.

If :c:func:`xas_load` encounters a multi-index entry, the xa_index
in the xa_state will not be changed. When iterating over an XArray
or calling :c:func:`xas_find`, if the initial index is in the middle
of a multi-index entry, it will not be altered. Subsequent calls
or iterations will move the index to the first index in the range.
Each entry will only be returned once, no matter how many indices it
occupies.

Using :c:func:`xas_next` or :c:func:`xas_prev` with a multi-index xa_state
is not supported. Using either of these functions on a multi-index entry
will reveal sibling entries; these should be skipped over by the caller.

Storing ``NULL`` into any index of a multi-index entry will set the entry
at every index to ``NULL`` and dissolve the tie. Splitting a multi-index
entry into entries occupying smaller ranges is not yet supported.

Functions and structures
========================

.. kernel-doc:: include/linux/xarray.h
.. kernel-doc:: lib/xarray.c