@@ -1,3 +1,5 @@
+.. _admin_guide_memory_hotplug:
+
 ==============
 Memory Hotplug
 ==============
@@ -9,39 +11,19 @@ This document is about memory hotplug including how-to-use and current status.
 Because Memory Hotplug is still under development, contents of this text will
 be changed often.
 
-.. CONTENTS
-
- 1. Introduction
- 1.1 purpose of memory hotplug
- 1.2. Phases of memory hotplug
- 1.3. Unit of Memory online/offline operation
- 2. Kernel Configuration
- 3. sysfs files for memory hotplug
- 4. Physical memory hot-add phase
- 4.1 Hardware(Firmware) Support
- 4.2 Notify memory hot-add event by hand
- 5. Logical Memory hot-add phase
- 5.1. State of memory
- 5.2. How to online memory
- 6. Logical memory remove
- 6.1 Memory offline and ZONE_MOVABLE
- 6.2. How to offline memory
- 7. Physical memory remove
- 8. Memory hotplug event notifier
- 9. Future Work List
-
+.. contents:: :local:
 
 .. note::
 
 (1) x86_64's has special implementation for memory hotplug.
 This text does not describe it.
- (2) This text assumes that sysfs is mounted at /sys.
+ (2) This text assumes that sysfs is mounted at ``/sys``.
 
 
 Introduction
 ============
 
-purpose of memory hotplug
+Purpose of memory hotplug
 -------------------------
 
 Memory Hotplug allows users to increase/decrease the amount of memory.
@@ -57,7 +39,6 @@ hardware which supports memory power management.
 
 Linux memory hotplug is designed for both purpose.
 
-
 Phases of memory hotplug
 ------------------------
 
@@ -92,7 +73,6 @@ phase by hand.
 (However, if you writes udev's hotplug scripts for memory hotplug, these
 phases can be execute in seamless way.)
 
-
 Unit of Memory online/offline operation
 ---------------------------------------
 
@@ -107,10 +87,9 @@ unit upon which memory online/offline operations are to be performed. The
 default size of a memory block is the same as memory section size unless an
 architecture specifies otherwise. (see :ref:`memory_hotplug_sysfs_files`.)
 
-To determine the size (in bytes) of a memory block please read this file:
-
-/sys/devices/system/memory/block_size_bytes
+To determine the size (in bytes) of a memory block please read this file::
 
+ /sys/devices/system/memory/block_size_bytes
 
 Kernel Configuration
 ====================
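Note on the hunk above: the text says to read ``block_size_bytes`` but does not say how the value is encoded. The sketch below is a minimal illustration only, assuming the file holds a single hexadecimal number (with or without a ``0x`` prefix). The helper names ``parse_block_size`` and ``read_block_size`` are hypothetical and not part of any kernel tooling.

```python
from pathlib import Path

# Path taken from the documentation text above.
BLOCK_SIZE_PATH = "/sys/devices/system/memory/block_size_bytes"


def parse_block_size(text: str) -> int:
    """Convert the contents of block_size_bytes to a byte count.

    Assumption: the value is hexadecimal. int(..., 16) accepts the
    digits with or without a leading "0x".
    """
    return int(text.strip(), 16)


def read_block_size(path: str = BLOCK_SIZE_PATH) -> int:
    """Read and parse the memory block size from sysfs (hypothetical helper)."""
    return parse_block_size(Path(path).read_text())
```

On a machine with memory hotplug enabled, ``read_block_size()`` would return the block size in bytes. ``parse_block_size`` is kept separate so the parsing can be checked without sysfs.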
@@ -119,22 +98,22 @@ To use memory hotplug feature, kernel must be compiled with following
 config options.
 
 - For all memory hotplug:
- - Memory model -> Sparse Memory (CONFIG_SPARSEMEM)
- - Allow for memory hot-add (CONFIG_MEMORY_HOTPLUG)
+ - Memory model -> Sparse Memory (``CONFIG_SPARSEMEM``)
+ - Allow for memory hot-add (``CONFIG_MEMORY_HOTPLUG``)
 
 - To enable memory removal, the following are also necessary:
- - Allow for memory hot remove (CONFIG_MEMORY_HOTREMOVE)
- - Page Migration (CONFIG_MIGRATION)
+ - Allow for memory hot remove (``CONFIG_MEMORY_HOTREMOVE``)
+ - Page Migration (``CONFIG_MIGRATION``)
 
 - For ACPI memory hotplug, the following are also necessary:
- - Memory hotplug (under ACPI Support menu) (CONFIG_ACPI_HOTPLUG_MEMORY)
+ - Memory hotplug (under ACPI Support menu) (``CONFIG_ACPI_HOTPLUG_MEMORY``)
 - This option can be kernel module.
 
 - As a related configuration, if your box has a feature of NUMA-node hotplug
 via ACPI, then this option is necessary too.
 
 - ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu)
- (CONFIG_ACPI_CONTAINER).
+ (``CONFIG_ACPI_CONTAINER``).
 
 This option can be kernel module too.
 
@@ -145,10 +124,11 @@ sysfs files for memory hotplug
 ==============================
 
 All memory blocks have their device information in sysfs. Each memory block
-is described under /sys/devices/system/memory as:
+is described under ``/sys/devices/system/memory`` as::
 
 /sys/devices/system/memory/memoryXXX
- (XXX is the memory block id.)
+
+where XXX is the memory block id.
 
 For the memory block covered by the sysfs directory. It is expected that all
 memory sections in this range are present and no memory holes exist in the
@@ -157,7 +137,7 @@ the existence of one should not affect the hotplug capabilities of the memory
 block.
 
 For example, assume 1GiB memory block size. A device for a memory starting at
-0x100000000 is /sys/device/system/memory/memory4::
+0x100000000 is ``/sys/device/system/memory/memory4``::
 
 (0x100000000 / 1Gib = 4)
 
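Note on the hunk above: the worked example (0x100000000 divided by a 1 GiB block size gives block 4) can be written as integer division. This is a minimal sketch of that arithmetic, not kernel code, and the function names ``memory_block_id`` and ``memory_block_range`` are hypothetical.

```python
def memory_block_id(phys_addr: int, block_size: int) -> int:
    """Return the id of the memory block that contains phys_addr."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return phys_addr // block_size


def memory_block_range(block_id: int, block_size: int) -> tuple[int, int]:
    """Return the half-open physical address range [start, end) of a block."""
    start = block_id * block_size
    return start, start + block_size


GIB = 1 << 30
block = memory_block_id(0x100000000, GIB)
start, end = memory_block_range(block, GIB)
print(f"memory{block}: [{start:#x} ... {end:#x})")
```

With a 1 GiB block size this reproduces the document's own figures: block id 4, covering [0x100000000 ... 0x140000000).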
@@ -165,11 +145,11 @@ This device covers address range [0x100000000 ... 0x140000000)
 
 Under each memory block, you can see 5 files:
 
-- /sys/devices/system/memory/memoryXXX/phys_index
-- /sys/devices/system/memory/memoryXXX/phys_device
-- /sys/devices/system/memory/memoryXXX/state
-- /sys/devices/system/memory/memoryXXX/removable
-- /sys/devices/system/memory/memoryXXX/valid_zones
+- ``/sys/devices/system/memory/memoryXXX/phys_index``
+- ``/sys/devices/system/memory/memoryXXX/phys_device``
+- ``/sys/devices/system/memory/memoryXXX/state``
+- ``/sys/devices/system/memory/memoryXXX/removable``
+- ``/sys/devices/system/memory/memoryXXX/valid_zones``
 
 =================== ============================================================
 ``phys_index``      read-only and contains memory block id, same as XXX.
@@ -207,13 +187,15 @@ Under each memory block, you can see 5 files:
 These directories/files appear after physical memory hotplug phase.
 
 If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed
-via symbolic links located in the /sys/devices/system/node/node* directories.
+via symbolic links located in the ``/sys/devices/system/node/node*`` directories.
+
+For example::
 
-For example:
-/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+ /sys/devices/system/node/node0/memory9 -> ../../memory/memory9
 
-A backlink will also be created:
-/sys/devices/system/memory/memory9/node0 -> ../../node/node0
+A backlink will also be created::
+
+ /sys/devices/system/memory/memory9/node0 -> ../../node/node0
 
 .. _memory_hotplug_physical_mem:
 
@@ -240,7 +222,6 @@ If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004",
 calls hotplug code for all of objects which are defined in it.
 If memory device is found, memory hotplug code will be called.
 
-
 Notify memory hot-add event by hand
 -----------------------------------
 
@@ -251,8 +232,9 @@ CONFIG_ARCH_MEMORY_PROBE and can be configured on powerpc, sh, and x86
 if hotplug is supported, although for x86 this should be handled by ACPI
 notification.
 
-Probe interface is located at
-/sys/devices/system/memory/probe
+Probe interface is located at::
+
+ /sys/devices/system/memory/probe
 
 You can tell the physical address of new memory to the kernel by::
 
@@ -263,7 +245,6 @@ memory_block_size] memory range is hot-added. In this case, hotplug script is
 not called (in current implementation). You'll have to online memory by
 yourself. Please see :ref:`memory_hotplug_how_to_online_memory`.
 
-
 Logical Memory hot-add phase
 ============================
 
@@ -301,7 +282,7 @@ This sets a global policy and impacts all memory blocks that will subsequently
 be hotplugged. Currently offline blocks keep their state. It is possible, under
 certain circumstances, that some memory blocks will be added but will fail to
 online. User space tools can check their "state" files
-(/sys/devices/system/memory/memoryXXX/state) and try to online them manually.
+(``/sys/devices/system/memory/memoryXXX/state``) and try to online them manually.
 
 If the automatic onlining wasn't requested, failed, or some memory block was
 offlined it is possible to change the individual block's state by writing to the
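Note on the hunk above: the text says user space tools can check the per-block ``state`` files to find blocks that failed to online. The read-only sketch below shows one way such a check might look. It assumes that an offline block reports the literal string ``offline`` in its ``state`` file. The function name ``offline_blocks`` and its ``root`` parameter are hypothetical, and the sketch never writes to sysfs.

```python
import re
from pathlib import Path


def offline_blocks(root: str = "/sys/devices/system/memory") -> list[str]:
    """List memoryXXX directories under root whose state file reads "offline".

    Assumption: each block directory is named "memory<id>" and contains a
    "state" file. Entries are returned sorted by numeric block id.
    """
    found = []
    for entry in Path(root).iterdir():
        match = re.fullmatch(r"memory(\d+)", entry.name)
        if match is None:
            continue  # skip non-block entries such as "probe"
        try:
            state = (entry / "state").read_text().strip()
        except OSError:
            continue  # block vanished or state is unreadable
        if state == "offline":
            found.append((int(match.group(1)), entry.name))
    return [name for _, name in sorted(found)]
```

A tool could pass the returned names to whatever onlining step it uses. Because the function only reads, it is safe to run on a live system.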
@@ -334,8 +315,6 @@ available memory will be increased.
 
 This may be changed in future.
 
-
-
 Logical memory remove
 =====================
 
@@ -413,88 +392,6 @@ Need more implementation yet....
 - Notification completion of remove works by OS to firmware.
 - Guard from remove if not yet.
 
-Memory hotplug event notifier
-=============================
-
-Hotplugging events are sent to a notification queue.
-
-There are six types of notification defined in include/linux/memory.h:
-
-MEM_GOING_ONLINE
- Generated before new memory becomes available in order to be able to
- prepare subsystems to handle memory. The page allocator is still unable
- to allocate from the new memory.
-
-MEM_CANCEL_ONLINE
- Generated if MEMORY_GOING_ONLINE fails.
-
-MEM_ONLINE
- Generated when memory has successfully brought online. The callback may
- allocate pages from the new memory.
-
-MEM_GOING_OFFLINE
- Generated to begin the process of offlining memory. Allocations are no
- longer possible from the memory but some of the memory to be offlined
- is still in use. The callback can be used to free memory known to a
- subsystem from the indicated memory block.
-
-MEM_CANCEL_OFFLINE
- Generated if MEMORY_GOING_OFFLINE fails. Memory is available again from
- the memory block that we attempted to offline.
-
-MEM_OFFLINE
- Generated after offlining memory is complete.
-
-A callback routine can be registered by calling::
-
- hotplug_memory_notifier(callback_func, priority)
-
-Callback functions with higher values of priority are called before callback
-functions with lower values.
-
-A callback function must have the following prototype::
-
- int callback_func(
- struct notifier_block *self, unsigned long action, void *arg);
-
-The first argument of the callback function (self) is a pointer to the block
-of the notifier chain that points to the callback function itself.
-The second argument (action) is one of the event types described above.
-The third argument (arg) passes a pointer of struct memory_notify::
-
- struct memory_notify {
- unsigned long start_pfn;
- unsigned long nr_pages;
- int status_change_nid_normal;
- int status_change_nid_high;
- int status_change_nid;
- }
-
-- start_pfn is start_pfn of online/offline memory.
-- nr_pages is # of pages of online/offline memory.
-- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
- is (will be) set/clear, if this is -1, then nodemask status is not changed.
-- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
- is (will be) set/clear, if this is -1, then nodemask status is not changed.
-- status_change_nid is set node id when N_MEMORY of nodemask is (will be)
- set/clear. It means a new(memoryless) node gets new memory by online and a
- node loses all memory. If this is -1, then nodemask status is not changed.
-
- If status_changed_nid* >= 0, callback should create/discard structures for the
- node if necessary.
-
-The callback routine shall return one of the values
-NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
-defined in include/linux/notifier.h
-
-NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
-
-NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
-MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
-further processing of the notification queue.
-
-NOTIFY_STOP stops further processing of the notification queue.
-
 Future Work
 ===========
 