|
@@ -0,0 +1,426 @@
|
|
|
+.. SPDX-License-Identifier: GPL-2.0
|
|
|
+
|
|
|
+Directory Entries
|
|
|
+-----------------
|
|
|
+
|
|
|
+In an ext4 filesystem, a directory is more or less a flat file that maps
|
|
|
+an arbitrary byte string (usually ASCII) to an inode number on the
|
|
|
+filesystem. There can be many directory entries across the filesystem
|
|
|
+that reference the same inode number--these are known as hard links, and
|
|
|
+that is why hard links cannot reference files on other filesystems. As
|
|
|
+such, directory entries are found by reading the data block(s)
|
|
|
+associated with a directory file for the particular directory entry that
|
|
|
+is desired.
|
|
|
+
|
|
|
+Linear (Classic) Directories
|
|
|
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
+
|
|
|
+By default, each directory lists its entries in an “almost-linear”
|
|
|
+array. I write “almost” because it's not a linear array in the memory
|
|
|
+sense because directory entries are not split across filesystem blocks.
|
|
|
+Therefore, it is more accurate to say that a directory is a series of
|
|
|
+data blocks and that each block contains a linear array of directory
|
|
|
+entries. The end of each per-block array is signified by reaching the
|
|
|
+end of the block; the last entry in the block has a record length that
|
|
|
+takes it all the way to the end of the block. The end of the entire
|
|
|
+directory is of course signified by reaching the end of the file. Unused
|
|
|
+directory entries are signified by inode = 0. By default the filesystem
|
|
|
+uses ``struct ext4_dir_entry_2`` for directory entries unless the
|
|
|
+“filetype” feature flag is not set, in which case it uses
|
|
|
+``struct ext4_dir_entry``.
|
|
|
+
|
|
|
+The original directory entry format is ``struct ext4_dir_entry``, which
|
|
|
+is at most 263 bytes long, though on disk you'll need to reference
|
|
|
+``dirent.rec_len`` to know for sure.
|
|
|
+
|
|
|
+.. list-table::
|
|
|
+ :widths: 1 1 1 77
|
|
|
+ :header-rows: 1
|
|
|
+
|
|
|
+ * - Offset
|
|
|
+ - Size
|
|
|
+ - Name
|
|
|
+ - Description
|
|
|
+ * - 0x0
|
|
|
+ - \_\_le32
|
|
|
+ - inode
|
|
|
+ - Number of the inode that this directory entry points to.
|
|
|
+ * - 0x4
|
|
|
+ - \_\_le16
|
|
|
+ - rec\_len
|
|
|
+ - Length of this directory entry. Must be a multiple of 4.
|
|
|
+ * - 0x6
|
|
|
+ - \_\_le16
|
|
|
+ - name\_len
|
|
|
+ - Length of the file name.
|
|
|
+ * - 0x8
|
|
|
+ - char
|
|
|
+ - name[EXT4\_NAME\_LEN]
|
|
|
+ - File name.
|
|
|
+
|
|
|
+Since file names cannot be longer than 255 bytes, the new directory
|
|
|
+entry format shortens the rec\_len field and uses the space for a file
|
|
|
+type flag, probably to avoid having to load every inode during directory
|
|
|
+tree traversal. This format is ``ext4_dir_entry_2``, which is at most
|
|
|
+263 bytes long, though on disk you'll need to reference
|
|
|
+``dirent.rec_len`` to know for sure.
|
|
|
+
|
|
|
+.. list-table::
|
|
|
+ :widths: 1 1 1 77
|
|
|
+ :header-rows: 1
|
|
|
+
|
|
|
+ * - Offset
|
|
|
+ - Size
|
|
|
+ - Name
|
|
|
+ - Description
|
|
|
+ * - 0x0
|
|
|
+ - \_\_le32
|
|
|
+ - inode
|
|
|
+ - Number of the inode that this directory entry points to.
|
|
|
+ * - 0x4
|
|
|
+ - \_\_le16
|
|
|
+ - rec\_len
|
|
|
+ - Length of this directory entry.
|
|
|
+ * - 0x6
|
|
|
+ - \_\_u8
|
|
|
+ - name\_len
|
|
|
+ - Length of the file name.
|
|
|
+ * - 0x7
|
|
|
+ - \_\_u8
|
|
|
+ - file\_type
|
|
|
+ - File type code, see ftype_ table below.
|
|
|
+ * - 0x8
|
|
|
+ - char
|
|
|
+ - name[EXT4\_NAME\_LEN]
|
|
|
+ - File name.
|
|
|
+
|
|
|
+.. _ftype:
|
|
|
+
|
|
|
+The directory file type is one of the following values:
|
|
|
+
|
|
|
+.. list-table::
|
|
|
+ :widths: 1 79
|
|
|
+ :header-rows: 1
|
|
|
+
|
|
|
+ * - Value
|
|
|
+ - Description
|
|
|
+ * - 0x0
|
|
|
+ - Unknown.
|
|
|
+ * - 0x1
|
|
|
+ - Regular file.
|
|
|
+ * - 0x2
|
|
|
+ - Directory.
|
|
|
+ * - 0x3
|
|
|
+ - Character device file.
|
|
|
+ * - 0x4
|
|
|
+ - Block device file.
|
|
|
+ * - 0x5
|
|
|
+ - FIFO.
|
|
|
+ * - 0x6
|
|
|
+ - Socket.
|
|
|
+ * - 0x7
|
|
|
+ - Symbolic link.
|
|
|
+
|
|
|
+In order to add checksums to these classic directory blocks, a phony
|
|
|
+``struct ext4_dir_entry`` is placed at the end of each leaf block to
|
|
|
+hold the checksum. The directory entry is 12 bytes long. The inode
|
|
|
+number and name\_len fields are set to zero to fool old software into
|
|
|
+ignoring an apparently empty directory entry, and the checksum is stored
|
|
|
+in the place where the name normally goes. The structure is
|
|
|
+``struct ext4_dir_entry_tail``:
|
|
|
+
|
|
|
+.. list-table::
|
|
|
+ :widths: 1 1 1 77
|
|
|
+ :header-rows: 1
|
|
|
+
|
|
|
+ * - Offset
|
|
|
+ - Size
|
|
|
+ - Name
|
|
|
+ - Description
|
|
|
+ * - 0x0
|
|
|
+ - \_\_le32
|
|
|
+ - det\_reserved\_zero1
|
|
|
+ - Inode number, which must be zero.
|
|
|
+ * - 0x4
|
|
|
+ - \_\_le16
|
|
|
+ - det\_rec\_len
|
|
|
+ - Length of this directory entry, which must be 12.
|
|
|
+ * - 0x6
|
|
|
+ - \_\_u8
|
|
|
+ - det\_reserved\_zero2
|
|
|
+ - Length of the file name, which must be zero.
|
|
|
+ * - 0x7
|
|
|
+ - \_\_u8
|
|
|
+ - det\_reserved\_ft
|
|
|
+ - File type, which must be 0xDE.
|
|
|
+ * - 0x8
|
|
|
+ - \_\_le32
|
|
|
+ - det\_checksum
|
|
|
+ - Directory leaf block checksum.
|
|
|
+
|
|
|
+The leaf directory block checksum is calculated against the FS UUID, the
|
|
|
+directory's inode number, the directory's inode generation number, and
|
|
|
+the entire directory entry block up to (but not including) the fake
|
|
|
+directory entry.
|
|
|
+
|
|
|
+Hash Tree Directories
|
|
|
+~~~~~~~~~~~~~~~~~~~~~
|
|
|
+
|
|
|
+A linear array of directory entries isn't great for performance, so a
|
|
|
+new feature was added to ext3 to provide a faster (but peculiar)
|
|
|
+balanced tree keyed off a hash of the directory entry name. If the
|
|
|
+EXT4\_INDEX\_FL (0x1000) flag is set in the inode, this directory uses a
|
|
|
+hashed btree (htree) to organize and find directory entries. For
|
|
|
+backwards read-only compatibility with ext2, this tree is actually
|
|
|
+hidden inside the directory file, masquerading as “empty” directory data
|
|
|
+blocks! It was stated previously that the end of the linear directory
|
|
|
+entry table was signified with an entry pointing to inode 0; this is
|
|
|
+(ab)used to fool the old linear-scan algorithm into thinking that the
|
|
|
+rest of the directory block is empty so that it moves on.
|
|
|
+
|
|
|
+The root of the tree always lives in the first data block of the
|
|
|
+directory. By ext2 custom, the '.' and '..' entries must appear at the
|
|
|
+beginning of this first block, so they are put here as two
|
|
|
+``struct ext4_dir_entry_2``\ s and not stored in the tree. The rest of
|
|
|
+the root node contains metadata about the tree and finally a hash->block
|
|
|
+map to find nodes that are lower in the htree. If
|
|
|
+``dx_root.info.indirect_levels`` is non-zero then the htree has two
|
|
|
+levels; the data block pointed to by the root node's map is an interior
|
|
|
+node, which is indexed by a minor hash. Interior nodes in this tree
|
|
|
+contains a zeroed out ``struct ext4_dir_entry_2`` followed by a
|
|
|
+minor\_hash->block map to find leafe nodes. Leaf nodes contain a linear
|
|
|
+array of all ``struct ext4_dir_entry_2``; all of these entries
|
|
|
+(presumably) hash to the same value. If there is an overflow, the
|
|
|
+entries simply overflow into the next leaf node, and the
|
|
|
+least-significant bit of the hash (in the interior node map) that gets
|
|
|
+us to this next leaf node is set.
|
|
|
+
|
|
|
+To traverse the directory as a htree, the code calculates the hash of
|
|
|
+the desired file name and uses it to find the corresponding block
|
|
|
+number. If the tree is flat, the block is a linear array of directory
|
|
|
+entries that can be searched; otherwise, the minor hash of the file name
|
|
|
+is computed and used against this second block to find the corresponding
|
|
|
+third block number. That third block number will be a linear array of
|
|
|
+directory entries.
|
|
|
+
|
|
|
+To traverse the directory as a linear array (such as the old code does),
|
|
|
+the code simply reads every data block in the directory. The blocks used
|
|
|
+for the htree will appear to have no entries (aside from '.' and '..')
|
|
|
+and so only the leaf nodes will appear to have any interesting content.
|
|
|
+
|
|
|
+The root of the htree is in ``struct dx_root``, which is the full length
|
|
|
+of a data block:
|
|
|
+
|
|
|
+.. list-table::
|
|
|
+ :widths: 1 1 1 77
|
|
|
+ :header-rows: 1
|
|
|
+
|
|
|
+ * - Offset
|
|
|
+ - Type
|
|
|
+ - Name
|
|
|
+ - Description
|
|
|
+ * - 0x0
|
|
|
+ - \_\_le32
|
|
|
+ - dot.inode
|
|
|
+ - inode number of this directory.
|
|
|
+ * - 0x4
|
|
|
+ - \_\_le16
|
|
|
+ - dot.rec\_len
|
|
|
+ - Length of this record, 12.
|
|
|
+ * - 0x6
|
|
|
+ - u8
|
|
|
+ - dot.name\_len
|
|
|
+ - Length of the name, 1.
|
|
|
+ * - 0x7
|
|
|
+ - u8
|
|
|
+ - dot.file\_type
|
|
|
+ - File type of this entry, 0x2 (directory) (if the feature flag is set).
|
|
|
+ * - 0x8
|
|
|
+ - char
|
|
|
+ - dot.name[4]
|
|
|
+ - “.\\0\\0\\0”
|
|
|
+ * - 0xC
|
|
|
+ - \_\_le32
|
|
|
+ - dotdot.inode
|
|
|
+ - inode number of parent directory.
|
|
|
+ * - 0x10
|
|
|
+ - \_\_le16
|
|
|
+ - dotdot.rec\_len
|
|
|
+ - block\_size - 12. The record length is long enough to cover all htree
|
|
|
+ data.
|
|
|
+ * - 0x12
|
|
|
+ - u8
|
|
|
+ - dotdot.name\_len
|
|
|
+ - Length of the name, 2.
|
|
|
+ * - 0x13
|
|
|
+ - u8
|
|
|
+ - dotdot.file\_type
|
|
|
+ - File type of this entry, 0x2 (directory) (if the feature flag is set).
|
|
|
+ * - 0x14
|
|
|
+ - char
|
|
|
+ - dotdot\_name[4]
|
|
|
+ - “..\\0\\0”
|
|
|
+ * - 0x18
|
|
|
+ - \_\_le32
|
|
|
+ - struct dx\_root\_info.reserved\_zero
|
|
|
+ - Zero.
|
|
|
+ * - 0x1C
|
|
|
+ - u8
|
|
|
+ - struct dx\_root\_info.hash\_version
|
|
|
+ - Hash type, see dirhash_ table below.
|
|
|
+ * - 0x1D
|
|
|
+ - u8
|
|
|
+ - struct dx\_root\_info.info\_length
|
|
|
+ - Length of the tree information, 0x8.
|
|
|
+ * - 0x1E
|
|
|
+ - u8
|
|
|
+ - struct dx\_root\_info.indirect\_levels
|
|
|
+ - Depth of the htree. Cannot be larger than 3 if the INCOMPAT\_LARGEDIR
|
|
|
+ feature is set; cannot be larger than 2 otherwise.
|
|
|
+ * - 0x1F
|
|
|
+ - u8
|
|
|
+ - struct dx\_root\_info.unused\_flags
|
|
|
+ -
|
|
|
+ * - 0x20
|
|
|
+ - \_\_le16
|
|
|
+ - limit
|
|
|
+ - Maximum number of dx\_entries that can follow this header, plus 1 for
|
|
|
+ the header itself.
|
|
|
+ * - 0x22
|
|
|
+ - \_\_le16
|
|
|
+ - count
|
|
|
+ - Actual number of dx\_entries that follow this header, plus 1 for the
|
|
|
+ header itself.
|
|
|
+ * - 0x24
|
|
|
+ - \_\_le32
|
|
|
+ - block
|
|
|
+ - The block number (within the directory file) that goes with hash=0.
|
|
|
+ * - 0x28
|
|
|
+ - struct dx\_entry
|
|
|
+ - entries[0]
|
|
|
+ - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
|
|
|
+
|
|
|
+.. _dirhash:
|
|
|
+
|
|
|
+The directory hash is one of the following values:
|
|
|
+
|
|
|
+.. list-table::
|
|
|
+ :widths: 1 79
|
|
|
+ :header-rows: 1
|
|
|
+
|
|
|
+ * - Value
|
|
|
+ - Description
|
|
|
+ * - 0x0
|
|
|
+ - Legacy.
|
|
|
+ * - 0x1
|
|
|
+ - Half MD4.
|
|
|
+ * - 0x2
|
|
|
+ - Tea.
|
|
|
+ * - 0x3
|
|
|
+ - Legacy, unsigned.
|
|
|
+ * - 0x4
|
|
|
+ - Half MD4, unsigned.
|
|
|
+ * - 0x5
|
|
|
+ - Tea, unsigned.
|
|
|
+
|
|
|
+Interior nodes of an htree are recorded as ``struct dx_node``, which is
|
|
|
+also the full length of a data block:
|
|
|
+
|
|
|
+.. list-table::
|
|
|
+ :widths: 1 1 1 77
|
|
|
+ :header-rows: 1
|
|
|
+
|
|
|
+ * - Offset
|
|
|
+ - Type
|
|
|
+ - Name
|
|
|
+ - Description
|
|
|
+ * - 0x0
|
|
|
+ - \_\_le32
|
|
|
+ - fake.inode
|
|
|
+ - Zero, to make it look like this entry is not in use.
|
|
|
+ * - 0x4
|
|
|
+ - \_\_le16
|
|
|
+ - fake.rec\_len
|
|
|
+ - The size of the block, in order to hide all of the dx\_node data.
|
|
|
+ * - 0x6
|
|
|
+ - u8
|
|
|
+ - name\_len
|
|
|
+ - Zero. There is no name for this “unused” directory entry.
|
|
|
+ * - 0x7
|
|
|
+ - u8
|
|
|
+ - file\_type
|
|
|
+ - Zero. There is no file type for this “unused” directory entry.
|
|
|
+ * - 0x8
|
|
|
+ - \_\_le16
|
|
|
+ - limit
|
|
|
+ - Maximum number of dx\_entries that can follow this header, plus 1 for
|
|
|
+ the header itself.
|
|
|
+ * - 0xA
|
|
|
+ - \_\_le16
|
|
|
+ - count
|
|
|
+ - Actual number of dx\_entries that follow this header, plus 1 for the
|
|
|
+ header itself.
|
|
|
+ * - 0xE
|
|
|
+ - \_\_le32
|
|
|
+ - block
|
|
|
+ - The block number (within the directory file) that goes with the lowest
|
|
|
+ hash value of this block. This value is stored in the parent block.
|
|
|
+ * - 0x12
|
|
|
+ - struct dx\_entry
|
|
|
+ - entries[0]
|
|
|
+ - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
|
|
|
+
|
|
|
+The hash maps that exist in both ``struct dx_root`` and
|
|
|
+``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes
|
|
|
+long:
|
|
|
+
|
|
|
+.. list-table::
|
|
|
+ :widths: 1 1 1 77
|
|
|
+ :header-rows: 1
|
|
|
+
|
|
|
+ * - Offset
|
|
|
+ - Type
|
|
|
+ - Name
|
|
|
+ - Description
|
|
|
+ * - 0x0
|
|
|
+ - \_\_le32
|
|
|
+ - hash
|
|
|
+ - Hash code.
|
|
|
+ * - 0x4
|
|
|
+ - \_\_le32
|
|
|
+ - block
|
|
|
+ - Block number (within the directory file, not filesystem blocks) of the
|
|
|
+ next node in the htree.
|
|
|
+
|
|
|
+(If you think this is all quite clever and peculiar, so does the
|
|
|
+author.)
|
|
|
+
|
|
|
+If metadata checksums are enabled, the last 8 bytes of the directory
|
|
|
+block (precisely the length of one dx\_entry) are used to store a
|
|
|
+``struct dx_tail``, which contains the checksum. The ``limit`` and
|
|
|
+``count`` entries in the dx\_root/dx\_node structures are adjusted as
|
|
|
+necessary to fit the dx\_tail into the block. If there is no space for
|
|
|
+the dx\_tail, the user is notified to run e2fsck -D to rebuild the
|
|
|
+directory index (which will ensure that there's space for the checksum.
|
|
|
+The dx\_tail structure is 8 bytes long and looks like this:
|
|
|
+
|
|
|
+.. list-table::
|
|
|
+ :widths: 1 1 1 77
|
|
|
+ :header-rows: 1
|
|
|
+
|
|
|
+ * - Offset
|
|
|
+ - Type
|
|
|
+ - Name
|
|
|
+ - Description
|
|
|
+ * - 0x0
|
|
|
+ - u32
|
|
|
+ - dt\_reserved
|
|
|
+ - Zero.
|
|
|
+ * - 0x4
|
|
|
+ - \_\_le32
|
|
|
+ - dt\_checksum
|
|
|
+ - Checksum of the htree directory block.
|
|
|
+
|
|
|
+The checksum is calculated against the FS UUID, the htree index header
|
|
|
+(dx\_root or dx\_node), all of the htree indices (dx\_entry) that are in
|
|
|
+use, and the tail block (dx\_tail).
|