  1. .. SPDX-License-Identifier: GPL-2.0
  2. Directory Entries
  3. -----------------
  4. In an ext4 filesystem, a directory is more or less a flat file that maps
  5. an arbitrary byte string (usually ASCII) to an inode number on the
  6. filesystem. There can be many directory entries across the filesystem
  7. that reference the same inode number--these are known as hard links, and
  8. that is why hard links cannot reference files on other filesystems. As
  9. such, directory entries are found by reading the data block(s)
  10. associated with a directory file for the particular directory entry that
  11. is desired.
  12. Linear (Classic) Directories
  13. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  14. By default, each directory lists its entries in an “almost-linear”
  15. array. I write “almost” because it is not a linear array in the memory
  16. sense: directory entries are not split across filesystem blocks.
  17. Therefore, it is more accurate to say that a directory is a series of
  18. data blocks and that each block contains a linear array of directory
  19. entries. The end of each per-block array is signified by reaching the
  20. end of the block; the last entry in the block has a record length that
  21. takes it all the way to the end of the block. The end of the entire
  22. directory is of course signified by reaching the end of the file. Unused
  23. directory entries are signified by inode = 0. By default the filesystem
  24. uses ``struct ext4_dir_entry_2`` for directory entries unless the
  25. “filetype” feature flag is not set, in which case it uses
  26. ``struct ext4_dir_entry``.
  27. The original directory entry format is ``struct ext4_dir_entry``, which
  28. is at most 263 bytes long, though on disk you'll need to reference
  29. ``dirent.rec_len`` to know for sure.
  30. .. list-table::
  31. :widths: 1 1 1 77
  32. :header-rows: 1
  33. * - Offset
  34. - Size
  35. - Name
  36. - Description
  37. * - 0x0
  38. - \_\_le32
  39. - inode
  40. - Number of the inode that this directory entry points to.
  41. * - 0x4
  42. - \_\_le16
  43. - rec\_len
  44. - Length of this directory entry. Must be a multiple of 4.
  45. * - 0x6
  46. - \_\_le16
  47. - name\_len
  48. - Length of the file name.
  49. * - 0x8
  50. - char
  51. - name[EXT4\_NAME\_LEN]
  52. - File name.
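As a rough illustration of the layout in the table above, the following
Python sketch decodes a single ``struct ext4_dir_entry`` from a byte
buffer. The function name ``parse_dir_entry`` and the sample bytes are
invented for this example; only the field offsets and widths come from
the table.

.. code-block:: python

   import struct

   def parse_dir_entry(buf, offset=0):
       """Decode one classic entry: le32 inode, le16 rec_len, le16 name_len, name."""
       inode, rec_len, name_len = struct.unpack_from("<IHH", buf, offset)
       name = bytes(buf[offset + 8 : offset + 8 + name_len])
       return inode, rec_len, name

   # Hypothetical sample: inode 12, rec_len 16, a 5-byte name padded to 8 bytes.
   sample = struct.pack("<IHH", 12, 16, 5) + b"hello\0\0\0"
   print(parse_dir_entry(sample))

The caller is expected to advance by ``rec_len`` rather than by
``8 + name_len`` to reach the next entry.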
  53. Since file names cannot be longer than 255 bytes, the new directory
  54. entry format shortens the name\_len field and uses the space for a file
  55. type flag, probably to avoid having to load every inode during directory
  56. tree traversal. This format is ``ext4_dir_entry_2``, which is at most
  57. 263 bytes long, though on disk you'll need to reference
  58. ``dirent.rec_len`` to know for sure.
  59. .. list-table::
  60. :widths: 1 1 1 77
  61. :header-rows: 1
  62. * - Offset
  63. - Size
  64. - Name
  65. - Description
  66. * - 0x0
  67. - \_\_le32
  68. - inode
  69. - Number of the inode that this directory entry points to.
  70. * - 0x4
  71. - \_\_le16
  72. - rec\_len
  73. - Length of this directory entry.
  74. * - 0x6
  75. - \_\_u8
  76. - name\_len
  77. - Length of the file name.
  78. * - 0x7
  79. - \_\_u8
  80. - file\_type
  81. - File type code, see ftype_ table below.
  82. * - 0x8
  83. - char
  84. - name[EXT4\_NAME\_LEN]
  85. - File name.
  86. .. _ftype:
  87. The directory file type is one of the following values:
  88. .. list-table::
  89. :widths: 1 79
  90. :header-rows: 1
  91. * - Value
  92. - Description
  93. * - 0x0
  94. - Unknown.
  95. * - 0x1
  96. - Regular file.
  97. * - 0x2
  98. - Directory.
  99. * - 0x3
  100. - Character device file.
  101. * - 0x4
  102. - Block device file.
  103. * - 0x5
  104. - FIFO.
  105. * - 0x6
  106. - Socket.
  107. * - 0x7
  108. - Symbolic link.
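The per-block scan described in this section can be sketched in Python
as follows. This is a minimal sketch, not the kernel's implementation:
it assumes a block size below 64 KiB (so that ``rec_len`` can be read
literally) and that every ``rec_len`` is a multiple of 4. The names
``iter_dir_block``, ``make_entry`` and the demo data are hypothetical.

.. code-block:: python

   import struct

   FILE_TYPES = {
       0x0: "unknown", 0x1: "regular file", 0x2: "directory",
       0x3: "character device", 0x4: "block device",
       0x5: "FIFO", 0x6: "socket", 0x7: "symbolic link",
   }

   def iter_dir_block(block):
       """Yield (inode, name, type) for each in-use ext4_dir_entry_2 in one block."""
       offset = 0
       while offset + 8 <= len(block):
           inode, rec_len, name_len, file_type = struct.unpack_from(
               "<IHBB", block, offset)
           # Assumption: reject lengths that would stall or overrun the scan.
           if rec_len < 8 or rec_len % 4 or offset + rec_len > len(block):
               raise ValueError(f"bad rec_len {rec_len} at offset {offset}")
           if inode != 0:  # inode == 0 marks an unused entry
               name = bytes(block[offset + 8 : offset + 8 + name_len])
               yield inode, name, FILE_TYPES.get(file_type, "unknown")
           offset += rec_len

   def make_entry(inode, name, file_type, rec_len):
       """Build one entry, zero-padded to rec_len (demo helper only)."""
       raw = struct.pack("<IHBB", inode, rec_len, len(name), file_type) + name
       return raw.ljust(rec_len, b"\0")

   # Hypothetical 64-byte "block": the last rec_len runs to the end of the block.
   demo = (make_entry(2, b".", 2, 12) + make_entry(2, b"..", 2, 12)
           + make_entry(11, b"notes.txt", 1, 40))
   entries = list(iter_dir_block(demo))

Because the loop follows ``rec_len`` and skips entries with inode 0, it
never needs to know how many entries a block holds in advance.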
  109. In order to add checksums to these classic directory blocks, a phony
  110. ``struct ext4_dir_entry`` is placed at the end of each leaf block to
  111. hold the checksum. The directory entry is 12 bytes long. The inode
  112. number and name\_len fields are set to zero to fool old software into
  113. ignoring an apparently empty directory entry, and the checksum is stored
  114. in the place where the name normally goes. The structure is
  115. ``struct ext4_dir_entry_tail``:
  116. .. list-table::
  117. :widths: 1 1 1 77
  118. :header-rows: 1
  119. * - Offset
  120. - Size
  121. - Name
  122. - Description
  123. * - 0x0
  124. - \_\_le32
  125. - det\_reserved\_zero1
  126. - Inode number, which must be zero.
  127. * - 0x4
  128. - \_\_le16
  129. - det\_rec\_len
  130. - Length of this directory entry, which must be 12.
  131. * - 0x6
  132. - \_\_u8
  133. - det\_reserved\_zero2
  134. - Length of the file name, which must be zero.
  135. * - 0x7
  136. - \_\_u8
  137. - det\_reserved\_ft
  138. - File type, which must be 0xDE.
  139. * - 0x8
  140. - \_\_le32
  141. - det\_checksum
  142. - Directory leaf block checksum.
  143. The leaf directory block checksum is calculated against the FS UUID, the
  144. directory's inode number, the directory's inode generation number, and
  145. the entire directory entry block up to (but not including) the fake
  146. directory entry.
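A reader can recognise the checksum tail by its fixed field values. The
sketch below only locates the tail and returns the stored checksum; it
does not compute or verify the checksum itself. The function name
``read_dir_tail`` and the demo bytes are assumptions made for this
example.

.. code-block:: python

   import struct

   def read_dir_tail(block):
       """Return det_checksum if the block ends in a checksum tail, else None."""
       if len(block) < 12:
           return None
       zero1, rec_len, zero2, file_type, checksum = struct.unpack_from(
           "<IHBBI", block, len(block) - 12)
       # Fixed values from the ext4_dir_entry_tail table.
       if zero1 == 0 and rec_len == 12 and zero2 == 0 and file_type == 0xDE:
           return checksum
       return None

   # Hypothetical 64-byte block: 52 zero bytes followed by a tail.
   demo = bytes(52) + struct.pack("<IHBBI", 0, 12, 0, 0xDE, 0x12345678)
   print(hex(read_dir_tail(demo)))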
  147. Hash Tree Directories
  148. ~~~~~~~~~~~~~~~~~~~~~
  149. A linear array of directory entries isn't great for performance, so a
  150. new feature was added to ext3 to provide a faster (but peculiar)
  151. balanced tree keyed off a hash of the directory entry name. If the
  152. EXT4\_INDEX\_FL (0x1000) flag is set in the inode, this directory uses a
  153. hashed btree (htree) to organize and find directory entries. For
  154. backwards read-only compatibility with ext2, this tree is actually
  155. hidden inside the directory file, masquerading as “empty” directory data
  156. blocks! It was stated previously that unused directory entries are
  157. signified by inode = 0; this is
  158. (ab)used to fool the old linear-scan algorithm into thinking that the
  159. rest of the directory block is empty so that it moves on.
  160. The root of the tree always lives in the first data block of the
  161. directory. By ext2 custom, the '.' and '..' entries must appear at the
  162. beginning of this first block, so they are put here as two
  163. ``struct ext4_dir_entry_2``\ s and not stored in the tree. The rest of
  164. the root node contains metadata about the tree and finally a hash->block
  165. map to find nodes that are lower in the htree. If
  166. ``dx_root.info.indirect_levels`` is non-zero then the htree has two
  167. levels; the data block pointed to by the root node's map is an interior
  168. node, which is indexed by a minor hash. Interior nodes in this tree
  169. contain a zeroed out ``struct ext4_dir_entry_2`` followed by a
  170. minor\_hash->block map to find leaf nodes. Leaf nodes contain a linear
  171. array of all ``struct ext4_dir_entry_2``; all of these entries
  172. (presumably) hash to the same value. If there is an overflow, the
  173. entries simply overflow into the next leaf node, and the
  174. least-significant bit of the hash (in the interior node map) that gets
  175. us to this next leaf node is set.
  176. To traverse the directory as an htree, the code calculates the hash of
  177. the desired file name and uses it to find the corresponding block
  178. number. If the tree is flat, the block is a linear array of directory
  179. entries that can be searched; otherwise, the minor hash of the file name
  180. is computed and used against this second block to find the corresponding
  181. third block number. That third block number will be a linear array of
  182. directory entries.
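The "use the hash to find the corresponding block number" step can be
sketched as a binary search over one hash->block map. This is a
simplified sketch: it assumes the map is held as a list of
``(hash, block)`` pairs sorted by hash, with the first pair covering the
lowest hashes, and it ignores the overflow bit described earlier. The
name ``lookup_block`` and the sample map are hypothetical.

.. code-block:: python

   from bisect import bisect_right

   def lookup_block(index, name_hash):
       """Return the block of the last map entry whose hash is <= name_hash."""
       hashes = [h for h, _ in index]
       pos = bisect_right(hashes, name_hash) - 1
       if pos < 0:
           raise ValueError("hash is below the first map entry")
       return index[pos][1]

   # Hypothetical map: three ranges of the 32-bit hash space.
   index = [(0x00000000, 1), (0x40000000, 2), (0x80000000, 3)]
   print(lookup_block(index, 0x50000000))

For a deeper tree the same function would be applied again to the map
stored in the block it returns, until a leaf block is reached.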
  183. To traverse the directory as a linear array (such as the old code does),
  184. the code simply reads every data block in the directory. The blocks used
  185. for the htree will appear to have no entries (aside from '.' and '..')
  186. and so only the leaf nodes will appear to have any interesting content.
  187. The root of the htree is in ``struct dx_root``, which is the full length
  188. of a data block:
  189. .. list-table::
  190. :widths: 1 1 1 77
  191. :header-rows: 1
  192. * - Offset
  193. - Type
  194. - Name
  195. - Description
  196. * - 0x0
  197. - \_\_le32
  198. - dot.inode
  199. - inode number of this directory.
  200. * - 0x4
  201. - \_\_le16
  202. - dot.rec\_len
  203. - Length of this record, 12.
  204. * - 0x6
  205. - u8
  206. - dot.name\_len
  207. - Length of the name, 1.
  208. * - 0x7
  209. - u8
  210. - dot.file\_type
  211. - File type of this entry, 0x2 (directory) (if the feature flag is set).
  212. * - 0x8
  213. - char
  214. - dot.name[4]
  215. - “.\\0\\0\\0”
  216. * - 0xC
  217. - \_\_le32
  218. - dotdot.inode
  219. - inode number of parent directory.
  220. * - 0x10
  221. - \_\_le16
  222. - dotdot.rec\_len
  223. - block\_size - 12. The record length is long enough to cover all htree
  224. data.
  225. * - 0x12
  226. - u8
  227. - dotdot.name\_len
  228. - Length of the name, 2.
  229. * - 0x13
  230. - u8
  231. - dotdot.file\_type
  232. - File type of this entry, 0x2 (directory) (if the feature flag is set).
  233. * - 0x14
  234. - char
  235. - dotdot\_name[4]
  236. - “..\\0\\0”
  237. * - 0x18
  238. - \_\_le32
  239. - struct dx\_root\_info.reserved\_zero
  240. - Zero.
  241. * - 0x1C
  242. - u8
  243. - struct dx\_root\_info.hash\_version
  244. - Hash type, see dirhash_ table below.
  245. * - 0x1D
  246. - u8
  247. - struct dx\_root\_info.info\_length
  248. - Length of the tree information, 0x8.
  249. * - 0x1E
  250. - u8
  251. - struct dx\_root\_info.indirect\_levels
  252. - Depth of the htree. Cannot be larger than 3 if the INCOMPAT\_LARGEDIR
  253. feature is set; cannot be larger than 2 otherwise.
  254. * - 0x1F
  255. - u8
  256. - struct dx\_root\_info.unused\_flags
  257. -
  258. * - 0x20
  259. - \_\_le16
  260. - limit
  261. - Maximum number of dx\_entries that can follow this header, plus 1 for
  262. the header itself.
  263. * - 0x22
  264. - \_\_le16
  265. - count
  266. - Actual number of dx\_entries that follow this header, plus 1 for the
  267. header itself.
  268. * - 0x24
  269. - \_\_le32
  270. - block
  271. - The block number (within the directory file) that goes with hash=0.
  272. * - 0x28
  273. - struct dx\_entry
  274. - entries[0]
  275. - As many 8-byte ``struct dx_entry`` as fit in the rest of the data block.
  276. .. _dirhash:
  277. The directory hash is one of the following values:
  278. .. list-table::
  279. :widths: 1 79
  280. :header-rows: 1
  281. * - Value
  282. - Description
  283. * - 0x0
  284. - Legacy.
  285. * - 0x1
  286. - Half MD4.
  287. * - 0x2
  288. - Tea.
  289. * - 0x3
  290. - Legacy, unsigned.
  291. * - 0x4
  292. - Half MD4, unsigned.
  293. * - 0x5
  294. - Tea, unsigned.
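To make the ``struct dx_root`` offsets concrete, here is a small Python
sketch that pulls the tree metadata and the hash->block map out of a
root block. It relies only on the offsets listed in the table; the
function name ``parse_dx_root``, the returned dictionary keys and the
128-byte demo block are assumptions for illustration.

.. code-block:: python

   import struct

   HASH_VERSIONS = {
       0x0: "Legacy", 0x1: "Half MD4", 0x2: "Tea",
       0x3: "Legacy, unsigned", 0x4: "Half MD4, unsigned", 0x5: "Tea, unsigned",
   }

   def parse_dx_root(block):
       """Decode the htree fields of a dx_root block."""
       hash_version, info_length, indirect_levels = struct.unpack_from(
           "<BBB", block, 0x1C)
       limit, count, block0 = struct.unpack_from("<HHI", block, 0x20)
       if not 1 <= count <= limit:
           raise ValueError(f"count {count} outside 1..{limit}")
       # The header slot stands for hash 0; count - 1 dx_entry records follow.
       hash_map = [(0, block0)]
       for i in range(count - 1):
           hash_map.append(struct.unpack_from("<II", block, 0x28 + 8 * i))
       return {
           "hash_version": HASH_VERSIONS.get(hash_version, "unknown"),
           "info_length": info_length,
           "indirect_levels": indirect_levels,
           "limit": limit,
           "map": hash_map,
       }

   # Hypothetical 128-byte block with the '.' and '..' area left zeroed.
   demo = bytearray(128)
   struct.pack_into("<IBBBB", demo, 0x18, 0, 1, 8, 0, 0)  # dx_root_info
   struct.pack_into("<HHI", demo, 0x20, 12, 2, 1)         # limit, count, block
   struct.pack_into("<II", demo, 0x28, 0x80000000, 2)     # one dx_entry
   root = parse_dx_root(demo)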
  295. Interior nodes of an htree are recorded as ``struct dx_node``, which is
  296. also the full length of a data block:
  297. .. list-table::
  298. :widths: 1 1 1 77
  299. :header-rows: 1
  300. * - Offset
  301. - Type
  302. - Name
  303. - Description
  304. * - 0x0
  305. - \_\_le32
  306. - fake.inode
  307. - Zero, to make it look like this entry is not in use.
  308. * - 0x4
  309. - \_\_le16
  310. - fake.rec\_len
  311. - The size of the block, in order to hide all of the dx\_node data.
  312. * - 0x6
  313. - u8
  314. - name\_len
  315. - Zero. There is no name for this “unused” directory entry.
  316. * - 0x7
  317. - u8
  318. - file\_type
  319. - Zero. There is no file type for this “unused” directory entry.
  320. * - 0x8
  321. - \_\_le16
  322. - limit
  323. - Maximum number of dx\_entries that can follow this header, plus 1 for
  324. the header itself.
  325. * - 0xA
  326. - \_\_le16
  327. - count
  328. - Actual number of dx\_entries that follow this header, plus 1 for the
  329. header itself.
  330. * - 0xC
  331. - \_\_le32
  332. - block
  333. - The block number (within the directory file) that goes with the lowest
  334. hash value of this block. This value is stored in the parent block.
  335. * - 0x10
  336. - struct dx\_entry
  337. - entries[0]
  338. - As many 8-byte ``struct dx_entry`` as fit in the rest of the data block.
  339. The hash maps that exist in both ``struct dx_root`` and
  340. ``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes
  341. long:
  342. .. list-table::
  343. :widths: 1 1 1 77
  344. :header-rows: 1
  345. * - Offset
  346. - Type
  347. - Name
  348. - Description
  349. * - 0x0
  350. - \_\_le32
  351. - hash
  352. - Hash code.
  353. * - 0x4
  354. - \_\_le32
  355. - block
  356. - Block number (within the directory file, not filesystem blocks) of the
  357. next node in the htree.
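An interior node can be decoded in much the same way. The sketch below
assumes that ``limit``, ``count`` and ``block`` are packed back to back
after the 8-byte fake directory entry, so that together they occupy
exactly one 8-byte slot and the first real ``struct dx_entry`` starts
8 bytes later. The name ``parse_dx_node`` and the demo block are
hypothetical.

.. code-block:: python

   import struct

   def parse_dx_node(block):
       """Return the hash->block map stored in a dx_node block."""
       inode, rec_len, name_len, file_type = struct.unpack_from("<IHBB", block, 0)
       if inode != 0 or name_len != 0:
           raise ValueError("fake directory entry is not empty")
       limit, count, block0 = struct.unpack_from("<HHI", block, 0x8)
       if not 1 <= count <= limit:
           raise ValueError(f"count {count} outside 1..{limit}")
       # The lowest hash of this node lives in the parent, so it is None here.
       hash_map = [(None, block0)]
       for i in range(count - 1):
           hash_map.append(struct.unpack_from("<II", block, 0x10 + 8 * i))
       return hash_map

   # Hypothetical 64-byte node: fake dirent, header slot, two dx_entry records.
   demo = bytearray(64)
   struct.pack_into("<IHBB", demo, 0x0, 0, 64, 0, 0)
   struct.pack_into("<HHI", demo, 0x8, 7, 3, 4)
   struct.pack_into("<IIII", demo, 0x10, 0x1000, 5, 0x2000, 9)
   print(parse_dx_node(demo))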
  358. (If you think this is all quite clever and peculiar, so does the
  359. author.)
  360. If metadata checksums are enabled, the last 8 bytes of the directory
  361. block (precisely the length of one dx\_entry) are used to store a
  362. ``struct dx_tail``, which contains the checksum. The ``limit`` and
  363. ``count`` entries in the dx\_root/dx\_node structures are adjusted as
  364. necessary to fit the dx\_tail into the block. If there is no space for
  365. the dx\_tail, the user is notified to run e2fsck -D to rebuild the
  366. directory index (which will ensure that there's space for the checksum).
  367. The dx\_tail structure is 8 bytes long and looks like this:
  368. .. list-table::
  369. :widths: 1 1 1 77
  370. :header-rows: 1
  371. * - Offset
  372. - Type
  373. - Name
  374. - Description
  375. * - 0x0
  376. - u32
  377. - dt\_reserved
  378. - Zero.
  379. * - 0x4
  380. - \_\_le32
  381. - dt\_checksum
  382. - Checksum of the htree directory block.
  383. The checksum is calculated against the FS UUID, the htree index header
  384. (dx\_root or dx\_node), all of the htree indices (dx\_entry) that are in
  385. use, and the tail block (dx\_tail).
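The effect of the dx\_tail on ``limit`` can be worked out from the
offsets given in this section. The following sketch is an arithmetic
illustration under that assumption, not a copy of the kernel's
calculation; ``dx_limit`` and its parameters are names chosen for this
example.

.. code-block:: python

   DX_ENTRY_SIZE = 8  # size of struct dx_entry
   DX_TAIL_SIZE = 8   # size of struct dx_tail

   def dx_limit(block_size, header_offset, has_checksum):
       """Number of 8-byte slots (header slot included) that fit in a block.

       header_offset is where the limit field sits: 0x20 in a dx_root,
       0x8 in a dx_node.
       """
       space = block_size - header_offset
       if has_checksum:
           space -= DX_TAIL_SIZE  # the last 8 bytes hold the dx_tail
       return space // DX_ENTRY_SIZE

   print(dx_limit(4096, 0x20, False))  # dx_root, no checksum
   print(dx_limit(4096, 0x20, True))   # dx_root, with dx_tail
   print(dx_limit(4096, 0x8, True))    # dx_node, with dx_tail

With 4096-byte blocks, enabling checksums costs each index block
exactly one slot, which is why ``limit`` and ``count`` have to be
adjusted.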