.. SPDX-License-Identifier: GPL-2.0

Index Nodes
-----------

In a regular UNIX filesystem, the inode stores all the metadata
pertaining to the file (time stamps, block maps, extended attributes,
etc), not the directory entry. To find the information associated with a
file, one must traverse the directory files to find the directory entry
associated with a file, then load the inode to find the metadata for
that file. ext4 appears to cheat (for performance reasons) a little bit
by storing a copy of the file type (normally stored in the inode) in the
directory entry. (Compare all this to FAT, which stores all the file
information directly in the directory entry, but does not support hard
links and is in general more seek-happy than ext4 due to its simpler
block allocator and extensive use of linked lists.)

The inode table is a linear array of ``struct ext4_inode``. The table is
sized to have enough blocks to store at least
``sb.s_inode_size * sb.s_inodes_per_group`` bytes. The number of the
block group containing an inode can be calculated as
``(inode_number - 1) / sb.s_inodes_per_group``, and the offset into the
group's table is ``(inode_number - 1) % sb.s_inodes_per_group``. There
is no inode 0.

The inode checksum is calculated against the FS UUID, the inode number,
and the inode structure itself.

The inode table entry is laid out in ``struct ext4_inode``.

.. list-table::
   :widths: 1 1 1 77
   :header-rows: 1

   * - Offset
     - Size
     - Name
     - Description
   * - 0x0
     - \_\_le16
     - i\_mode
     - File mode. See the table i_mode_ below.
   * - 0x2
     - \_\_le16
     - i\_uid
     - Lower 16-bits of Owner UID.
   * - 0x4
     - \_\_le32
     - i\_size\_lo
     - Lower 32-bits of size in bytes.
   * - 0x8
     - \_\_le32
     - i\_atime
     - Last access time, in seconds since the epoch. However, if the EA\_INODE
       inode flag is set, this inode stores an extended attribute value and
       this field contains the checksum of the value.
   * - 0xC
     - \_\_le32
     - i\_ctime
     - Last inode change time, in seconds since the epoch. However, if the
       EA\_INODE inode flag is set, this inode stores an extended attribute
       value and this field contains the lower 32 bits of the attribute value's
       reference count.
   * - 0x10
     - \_\_le32
     - i\_mtime
     - Last data modification time, in seconds since the epoch. However, if the
       EA\_INODE inode flag is set, this inode stores an extended attribute
       value and this field contains the number of the inode that owns the
       extended attribute.
   * - 0x14
     - \_\_le32
     - i\_dtime
     - Deletion Time, in seconds since the epoch.
   * - 0x18
     - \_\_le16
     - i\_gid
     - Lower 16-bits of GID.
   * - 0x1A
     - \_\_le16
     - i\_links\_count
     - Hard link count. Normally, ext4 does not permit an inode to have more
       than 65,000 hard links. This applies to files as well as directories,
       which means that there cannot be more than 64,998 subdirectories in a
       directory (each subdirectory's '..' entry counts as a hard link, as does
       the '.' entry in the directory itself). With the DIR\_NLINK feature
       enabled, ext4 supports more than 64,998 subdirectories by setting this
       field to 1 to indicate that the number of hard links is not known.
   * - 0x1C
     - \_\_le32
     - i\_blocks\_lo
     - Lower 32-bits of “block” count. If the huge\_file feature flag is not
       set on the filesystem, the file consumes ``i_blocks_lo`` 512-byte blocks
       on disk. If huge\_file is set and EXT4\_HUGE\_FILE\_FL is NOT set in
       ``inode.i_flags``, then the file consumes
       ``i_blocks_lo + (i_blocks_hi << 32)`` 512-byte blocks on disk. If
       huge\_file is set and EXT4\_HUGE\_FILE\_FL IS set in ``inode.i_flags``,
       then this file consumes ``i_blocks_lo + (i_blocks_hi << 32)``
       filesystem blocks on disk.
   * - 0x20
     - \_\_le32
     - i\_flags
     - Inode flags. See the table i_flags_ below.
   * - 0x24
     - 4 bytes
     - i\_osd1
     - See the table i_osd1_ for more details.
   * - 0x28
     - 60 bytes
     - i\_block[EXT4\_N\_BLOCKS=15]
     - Block map or extent tree. See the section “The Contents of inode.i\_block”.
   * - 0x64
     - \_\_le32
     - i\_generation
     - File version (for NFS).
   * - 0x68
     - \_\_le32
     - i\_file\_acl\_lo
     - Lower 32-bits of extended attribute block. ACLs are of course one of
       many possible extended attributes; the name of this field presumably
       dates from the first use of extended attributes being for ACLs.
   * - 0x6C
     - \_\_le32
     - i\_size\_high / i\_dir\_acl
     - Upper 32-bits of file/directory size. In ext2/3 this field was named
       i\_dir\_acl, though it was usually set to zero and never used.
   * - 0x70
     - \_\_le32
     - i\_obso\_faddr
     - (Obsolete) fragment address.
   * - 0x74
     - 12 bytes
     - i\_osd2
     - See the table i_osd2_ for more details.
   * - 0x80
     - \_\_le16
     - i\_extra\_isize
     - Size of this inode - 128. Alternately, the size of the extended inode
       fields beyond the original ext2 inode, including this field.
   * - 0x82
     - \_\_le16
     - i\_checksum\_hi
     - Upper 16-bits of the inode checksum.
   * - 0x84
     - \_\_le32
     - i\_ctime\_extra
     - Extra change time bits. This provides sub-second precision. See the
       Inode Timestamps section.
   * - 0x88
     - \_\_le32
     - i\_mtime\_extra
     - Extra modification time bits. This provides sub-second precision.
   * - 0x8C
     - \_\_le32
     - i\_atime\_extra
     - Extra access time bits. This provides sub-second precision.
   * - 0x90
     - \_\_le32
     - i\_crtime
     - File creation time, in seconds since the epoch.
   * - 0x94
     - \_\_le32
     - i\_crtime\_extra
     - Extra file creation time bits. This provides sub-second precision.
   * - 0x98
     - \_\_le32
     - i\_version\_hi
     - Upper 32-bits for version number.
   * - 0x9C
     - \_\_le32
     - i\_projid
     - Project ID.
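
The three cases described for ``i_blocks_lo`` can be sketched as a small
helper. This is only an illustration: the function name and its parameters
are hypothetical, ``i_blocks_hi`` stands for the 16-bit ``l_i_blocks_high``
field in ``osd2``, and ``huge_file`` is assumed to be a boolean the caller
derived from the superblock feature flags.

```python
EXT4_HUGE_FILE_FL = 0x40000  # value taken from the i_flags table


def inode_disk_bytes(i_blocks_lo, i_blocks_hi, i_flags, huge_file, fs_block_size):
    """Return the on-disk space consumed by an inode, in bytes (sketch)."""
    if not huge_file:
        # Without huge_file, only i_blocks_lo counts, in 512-byte units.
        return i_blocks_lo * 512
    blocks = i_blocks_lo + (i_blocks_hi << 32)
    if i_flags & EXT4_HUGE_FILE_FL:
        # The count is expressed in filesystem blocks.
        return blocks * fs_block_size
    # The count is still expressed in 512-byte units.
    return blocks * 512
```

For example, eight 512-byte blocks without ``huge_file`` amount to 4096
bytes, while a count of 2 with ``EXT4_HUGE_FILE_FL`` set on a filesystem
with 4096-byte blocks amounts to 8192 bytes.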

.. _i_mode:

The ``i_mode`` value is a combination of the following flags:

.. list-table::
   :widths: 1 79
   :header-rows: 1

   * - Value
     - Description
   * - 0x1
     - S\_IXOTH (Others may execute)
   * - 0x2
     - S\_IWOTH (Others may write)
   * - 0x4
     - S\_IROTH (Others may read)
   * - 0x8
     - S\_IXGRP (Group members may execute)
   * - 0x10
     - S\_IWGRP (Group members may write)
   * - 0x20
     - S\_IRGRP (Group members may read)
   * - 0x40
     - S\_IXUSR (Owner may execute)
   * - 0x80
     - S\_IWUSR (Owner may write)
   * - 0x100
     - S\_IRUSR (Owner may read)
   * - 0x200
     - S\_ISVTX (Sticky bit)
   * - 0x400
     - S\_ISGID (Set GID)
   * - 0x800
     - S\_ISUID (Set UID)
   * -
     - These are mutually-exclusive file types:
   * - 0x1000
     - S\_IFIFO (FIFO)
   * - 0x2000
     - S\_IFCHR (Character device)
   * - 0x4000
     - S\_IFDIR (Directory)
   * - 0x6000
     - S\_IFBLK (Block device)
   * - 0x8000
     - S\_IFREG (Regular file)
   * - 0xA000
     - S\_IFLNK (Symbolic link)
   * - 0xC000
     - S\_IFSOCK (Socket)
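
Because the file types are mutually exclusive values rather than
independent bits (0x6000 is not "character device plus directory"), a
decoder has to compare the whole upper nibble instead of testing single
bits. A minimal sketch, with a hypothetical helper name and type labels
chosen for illustration:

```python
# Upper nibble of i_mode: mutually exclusive file types from the table above.
FILE_TYPES = {
    0x1000: "FIFO",
    0x2000: "character device",
    0x4000: "directory",
    0x6000: "block device",
    0x8000: "regular file",
    0xA000: "symbolic link",
    0xC000: "socket",
}


def decode_i_mode(i_mode):
    """Split a 16-bit i_mode into (file type name, permission bits)."""
    file_type = FILE_TYPES.get(i_mode & 0xF000, "unknown")
    permissions = i_mode & 0x0FFF  # rwx bits plus sticky/setgid/setuid
    return file_type, permissions
```

With this helper, ``decode_i_mode(0x81A4)`` yields a regular file with
permission bits ``0o644``.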

.. _i_flags:

The ``i_flags`` field is a combination of these values:

.. list-table::
   :widths: 1 79
   :header-rows: 1

   * - Value
     - Description
   * - 0x1
     - This file requires secure deletion (EXT4\_SECRM\_FL). (not implemented)
   * - 0x2
     - This file should be preserved, should undeletion be desired
       (EXT4\_UNRM\_FL). (not implemented)
   * - 0x4
     - File is compressed (EXT4\_COMPR\_FL). (not really implemented)
   * - 0x8
     - All writes to the file must be synchronous (EXT4\_SYNC\_FL).
   * - 0x10
     - File is immutable (EXT4\_IMMUTABLE\_FL).
   * - 0x20
     - File can only be appended (EXT4\_APPEND\_FL).
   * - 0x40
     - The dump(1) utility should not dump this file (EXT4\_NODUMP\_FL).
   * - 0x80
     - Do not update access time (EXT4\_NOATIME\_FL).
   * - 0x100
     - Dirty compressed file (EXT4\_DIRTY\_FL). (not used)
   * - 0x200
     - File has one or more compressed clusters (EXT4\_COMPRBLK\_FL). (not used)
   * - 0x400
     - Do not compress file (EXT4\_NOCOMPR\_FL). (not used)
   * - 0x800
     - Encrypted inode (EXT4\_ENCRYPT\_FL). This bit value previously was
       EXT4\_ECOMPR\_FL (compression error), which was never used.
   * - 0x1000
     - Directory has hashed indexes (EXT4\_INDEX\_FL).
   * - 0x2000
     - AFS magic directory (EXT4\_IMAGIC\_FL).
   * - 0x4000
     - File data must always be written through the journal
       (EXT4\_JOURNAL\_DATA\_FL).
   * - 0x8000
     - File tail should not be merged (EXT4\_NOTAIL\_FL). (not used by ext4)
   * - 0x10000
     - All directory entry data should be written synchronously (see
       ``dirsync``) (EXT4\_DIRSYNC\_FL).
   * - 0x20000
     - Top of directory hierarchy (EXT4\_TOPDIR\_FL).
   * - 0x40000
     - This is a huge file (EXT4\_HUGE\_FILE\_FL).
   * - 0x80000
     - Inode uses extents (EXT4\_EXTENTS\_FL).
   * - 0x200000
     - Inode stores a large extended attribute value in its data blocks
       (EXT4\_EA\_INODE\_FL).
   * - 0x400000
     - This file has blocks allocated past EOF (EXT4\_EOFBLOCKS\_FL).
       (deprecated)
   * - 0x01000000
     - Inode is a snapshot (``EXT4_SNAPFILE_FL``). (not in mainline)
   * - 0x04000000
     - Snapshot is being deleted (``EXT4_SNAPFILE_DELETED_FL``). (not in
       mainline)
   * - 0x08000000
     - Snapshot shrink has completed (``EXT4_SNAPFILE_SHRUNK_FL``). (not in
       mainline)
   * - 0x10000000
     - Inode has inline data (EXT4\_INLINE\_DATA\_FL).
   * - 0x20000000
     - Create children with the same project ID (EXT4\_PROJINHERIT\_FL).
   * - 0x80000000
     - Reserved for ext4 library (EXT4\_RESERVED\_FL).
   * -
     - Aggregate flags:
   * - 0x4BDFFF
     - User-visible flags.
   * - 0x4B80FF
     - User-modifiable flags. Note that while EXT4\_JOURNAL\_DATA\_FL and
       EXT4\_EXTENTS\_FL can be set with setattr, they are not in the kernel's
       EXT4\_FL\_USER\_MODIFIABLE mask, since it needs to handle the setting of
       these flags in a special manner and they are masked out of the set of
       flags that are saved directly to i\_flags.
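
Unlike the file type in ``i_mode``, every ``i_flags`` value is an
independent bit, so decoding is a matter of testing each bit in turn. The
sketch below covers only a subset of the table (the flags marked as unused
or not in mainline are left out for brevity); the helper name and the
choice to report leftover bits separately are assumptions made for
illustration.

```python
# Subset of the i_flags table, in ascending bit order.
I_FLAGS = {
    0x8: "EXT4_SYNC_FL",
    0x10: "EXT4_IMMUTABLE_FL",
    0x20: "EXT4_APPEND_FL",
    0x40: "EXT4_NODUMP_FL",
    0x80: "EXT4_NOATIME_FL",
    0x800: "EXT4_ENCRYPT_FL",
    0x1000: "EXT4_INDEX_FL",
    0x4000: "EXT4_JOURNAL_DATA_FL",
    0x10000: "EXT4_DIRSYNC_FL",
    0x20000: "EXT4_TOPDIR_FL",
    0x40000: "EXT4_HUGE_FILE_FL",
    0x80000: "EXT4_EXTENTS_FL",
    0x200000: "EXT4_EA_INODE_FL",
    0x10000000: "EXT4_INLINE_DATA_FL",
    0x20000000: "EXT4_PROJINHERIT_FL",
}


def decode_i_flags(i_flags):
    """Return (names of known flags that are set, bits not covered here)."""
    names = [name for bit, name in I_FLAGS.items() if i_flags & bit]
    known_mask = sum(I_FLAGS)  # summing the dict keys ORs the distinct bits
    return names, i_flags & ~known_mask
```

An extent-mapped immutable file with ``i_flags = 0x80010`` decodes to
``EXT4_IMMUTABLE_FL`` and ``EXT4_EXTENTS_FL`` with no leftover bits.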

.. _i_osd1:

The ``osd1`` field has multiple meanings depending on the creator:

Linux:

.. list-table::
   :widths: 1 1 1 77
   :header-rows: 1

   * - Offset
     - Size
     - Name
     - Description
   * - 0x0
     - \_\_le32
     - l\_i\_version
     - Inode version. However, if the EA\_INODE inode flag is set, this inode
       stores an extended attribute value and this field contains the upper 32
       bits of the attribute value's reference count.

Hurd:

.. list-table::
   :widths: 1 1 1 77
   :header-rows: 1

   * - Offset
     - Size
     - Name
     - Description
   * - 0x0
     - \_\_le32
     - h\_i\_translator
     - ??

Masix:

.. list-table::
   :widths: 1 1 1 77
   :header-rows: 1

   * - Offset
     - Size
     - Name
     - Description
   * - 0x0
     - \_\_le32
     - m\_i\_reserved
     - ??

.. _i_osd2:

The ``osd2`` field has multiple meanings depending on the filesystem creator:

Linux:

.. list-table::
   :widths: 1 1 1 77
   :header-rows: 1

   * - Offset
     - Size
     - Name
     - Description
   * - 0x0
     - \_\_le16
     - l\_i\_blocks\_high
     - Upper 16-bits of the block count. Please see the note attached to
       i\_blocks\_lo.
   * - 0x2
     - \_\_le16
     - l\_i\_file\_acl\_high
     - Upper 16-bits of the extended attribute block (historically, the file
       ACL location). See the Extended Attributes section below.
   * - 0x4
     - \_\_le16
     - l\_i\_uid\_high
     - Upper 16-bits of the Owner UID.
   * - 0x6
     - \_\_le16
     - l\_i\_gid\_high
     - Upper 16-bits of the GID.
   * - 0x8
     - \_\_le16
     - l\_i\_checksum\_lo
     - Lower 16-bits of the inode checksum.
   * - 0xA
     - \_\_le16
     - l\_i\_reserved
     - Unused.

Hurd:

.. list-table::
   :widths: 1 1 1 77
   :header-rows: 1

   * - Offset
     - Size
     - Name
     - Description
   * - 0x0
     - \_\_le16
     - h\_i\_reserved1
     - ??
   * - 0x2
     - \_\_u16
     - h\_i\_mode\_high
     - Upper 16-bits of the file mode.
   * - 0x4
     - \_\_le16
     - h\_i\_uid\_high
     - Upper 16-bits of the Owner UID.
   * - 0x6
     - \_\_le16
     - h\_i\_gid\_high
     - Upper 16-bits of the GID.
   * - 0x8
     - \_\_u32
     - h\_i\_author
     - Author code?

Masix:

.. list-table::
   :widths: 1 1 1 77
   :header-rows: 1

   * - Offset
     - Size
     - Name
     - Description
   * - 0x0
     - \_\_le16
     - h\_i\_reserved1
     - ??
   * - 0x2
     - \_\_u16
     - m\_i\_file\_acl\_high
     - Upper 16-bits of the extended attribute block (historically, the file
       ACL location).
   * - 0x4
     - \_\_u32
     - m\_i\_reserved2[2]
     - ??

Inode Size
~~~~~~~~~~

In ext2 and ext3, the inode structure size was fixed at 128 bytes
(``EXT2_GOOD_OLD_INODE_SIZE``) and each inode had a disk record size of
128 bytes. Starting with ext4, it is possible to allocate a larger
on-disk inode at format time for all inodes in the filesystem to provide
space beyond the end of the original ext2 inode. The on-disk inode
record size is recorded in the superblock as ``s_inode_size``. The
number of bytes actually used by struct ext4\_inode beyond the original
128-byte ext2 inode is recorded in the ``i_extra_isize`` field for each
inode, which allows struct ext4\_inode to grow for a new kernel without
having to upgrade all of the on-disk inodes. Access to fields beyond
EXT2\_GOOD\_OLD\_INODE\_SIZE should be verified to be within
``i_extra_isize``. By default, ext4 inode records are 256 bytes, and (as
of October 2013) the inode structure is 156 bytes
(``i_extra_isize = 28``). The extra space between the end of the inode
structure and the end of the inode record can be used to store extended
attributes. Each inode record can be as large as the filesystem block
size, though this is not terribly efficient.
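
The bounds check described above can be sketched as follows. The helper
names are hypothetical; field offsets and sizes are taken from the
``struct ext4_inode`` table earlier in this document.

```python
EXT2_GOOD_OLD_INODE_SIZE = 128


def field_is_present(field_offset, field_size, i_extra_isize):
    """True if a field past the first 128 bytes ends within i_extra_isize."""
    end_of_field = field_offset + field_size
    return end_of_field <= EXT2_GOOD_OLD_INODE_SIZE + i_extra_isize


def xattr_space(s_inode_size, i_extra_isize):
    """Bytes left in the inode record after the inode structure itself."""
    return s_inode_size - EXT2_GOOD_OLD_INODE_SIZE - i_extra_isize
```

With ``i_extra_isize = 28``, ``i_crtime`` (offset 0x90, 4 bytes) passes
the check, while ``i_projid`` (offset 0x9C, 4 bytes) does not and must not
be read from such an inode. A 256-byte record with ``i_extra_isize = 28``
leaves 100 bytes for in-inode extended attributes.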

Finding an Inode
~~~~~~~~~~~~~~~~

Each block group contains ``sb->s_inodes_per_group`` inodes. Because
inode 0 is defined not to exist, this formula can be used to find the
block group that an inode lives in:
``bg = (inode_num - 1) / sb->s_inodes_per_group``. The particular inode
can be found within the block group's inode table at
``index = (inode_num - 1) % sb->s_inodes_per_group``. To get the byte
address within the inode table, use
``offset = index * sb->s_inode_size``.
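
The three formulas above can be combined into one lookup helper. This is
a minimal sketch: the function name is hypothetical and the superblock
values are passed in as plain integers rather than read from disk.

```python
def locate_inode(inode_num, inodes_per_group, inode_size):
    """Return (block group, index in group, byte offset in inode table)."""
    if inode_num < 1:
        raise ValueError("inode numbers start at 1; there is no inode 0")
    bg, index = divmod(inode_num - 1, inodes_per_group)
    offset = index * inode_size
    return bg, index, offset
```

For example, with 8192 inodes per group and 256-byte inode records, inode
12 lives in block group 0 at index 11, which is byte offset 2816 of that
group's inode table; inode 8193 is the first inode of block group 1.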

Inode Timestamps
~~~~~~~~~~~~~~~~

Four timestamps are recorded in the lower 128 bytes of the inode
structure -- inode change time (ctime), access time (atime), data
modification time (mtime), and deletion time (dtime). The four fields
are 32-bit signed integers that represent seconds since the Unix epoch
(1970-01-01 00:00:00 GMT), which means that the fields will overflow in
January 2038. For inodes that are not linked from any directory but are
still open (orphan inodes), the dtime field is overloaded for use with
the orphan list. The superblock field ``s_last_orphan`` points to the
first inode in the orphan list; dtime is then the number of the next
orphaned inode, or zero if there are no more orphans.

If the inode structure size ``sb->s_inode_size`` is larger than 128
bytes and the ``i_extra_isize`` field is large enough to encompass the
respective ``i_[cma]time_extra`` field, the ctime, atime, and mtime
inode fields are widened to 64 bits. Within this “extra” 32-bit field,
the lower two bits are used to extend the 32-bit seconds field to be 34
bits wide; the upper 30 bits are used to provide nanosecond timestamp
accuracy. Therefore, timestamps should not overflow until May 2446.
dtime was not widened. There is also a fifth timestamp to record inode
creation time (crtime); this field is 64 bits wide and decoded in the
same manner as 64-bit [cma]time. Neither crtime nor dtime is accessible
through the regular stat() interface, though debugfs will report them.

We use the 32-bit signed time value plus (2^32 \* (extra epoch bits)).
In other words:

.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - Extra epoch bits
     - MSB of 32-bit time
     - Adjustment for signed 32-bit to 64-bit tv\_sec
     - Decoded 64-bit tv\_sec
     - valid time range
   * - 0 0
     - 1
     - 0
     - ``-0x80000000 - -0x00000001``
     - 1901-12-13 to 1969-12-31
   * - 0 0
     - 0
     - 0
     - ``0x000000000 - 0x07fffffff``
     - 1970-01-01 to 2038-01-19
   * - 0 1
     - 1
     - 0x100000000
     - ``0x080000000 - 0x0ffffffff``
     - 2038-01-19 to 2106-02-07
   * - 0 1
     - 0
     - 0x100000000
     - ``0x100000000 - 0x17fffffff``
     - 2106-02-07 to 2174-02-25
   * - 1 0
     - 1
     - 0x200000000
     - ``0x180000000 - 0x1ffffffff``
     - 2174-02-25 to 2242-03-16
   * - 1 0
     - 0
     - 0x200000000
     - ``0x200000000 - 0x27fffffff``
     - 2242-03-16 to 2310-04-04
   * - 1 1
     - 1
     - 0x300000000
     - ``0x280000000 - 0x2ffffffff``
     - 2310-04-04 to 2378-04-22
   * - 1 1
     - 0
     - 0x300000000
     - ``0x300000000 - 0x37fffffff``
     - 2378-04-22 to 2446-05-10
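
The decoding rule in the table above can be sketched in a few lines. The
function name is hypothetical, and both arguments are assumed to be the
raw unsigned 32-bit values as read from disk (for example ``i_mtime`` and
``i_mtime_extra``). The sketch follows the documented encoding only; it
does not reproduce the kernel bugs mentioned in this section.

```python
def decode_timestamp(seconds_lo, extra):
    """Return (64-bit tv_sec, nanoseconds) for a seconds/extra field pair."""
    # Reinterpret the raw 32-bit seconds field as a signed integer.
    if seconds_lo & 0x80000000:
        signed_seconds = seconds_lo - (1 << 32)
    else:
        signed_seconds = seconds_lo
    epoch_bits = extra & 0x3   # lower two bits extend the seconds field
    nanoseconds = extra >> 2   # upper 30 bits hold the nanoseconds
    return signed_seconds + (epoch_bits << 32), nanoseconds
```

A raw seconds value of ``0x80000000`` decodes to ``-0x80000000`` (December
1901) when the epoch bits are ``0 0``, and to ``0x80000000`` (January 2038)
when they are ``0 1``, matching the first and third rows of the table.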

This is a somewhat odd encoding since there are effectively seven times
as many positive values as negative values. There have also been
long-standing bugs decoding and encoding dates beyond 2038, which don't
seem to be fixed as of kernel 3.12 and e2fsprogs 1.42.8. 64-bit kernels
incorrectly use the extra epoch bits 1,1 for dates between 1901 and
1970. At some point the kernel will be fixed and e2fsck will fix this
situation, assuming that it is run before 2310.