@@ -53,10 +53,14 @@ v1 is available under Documentation/cgroup-v1/.
 5-3-2. Writeback
 5-4. PID
 5-4-1. PID Interface Files
- 5-5. RDMA
- 5-5-1. RDMA Interface Files
- 5-6. Misc
- 5-6-1. perf_event
+ 5-5. Device
+ 5-6. RDMA
+ 5-6-1. RDMA Interface Files
+ 5-7. Misc
+ 5-7-1. perf_event
+ 5-N. Non-normative information
+ 5-N-1. CPU controller root cgroup process behaviour
+ 5-N-2. IO controller root cgroup process behaviour
 6. Namespace
 6-1. Basics
 6-2. The Root and Views
@@ -279,7 +283,7 @@ thread mode, the following conditions must be met.
 exempt from this requirement.

 Topology-wise, a cgroup can be in an invalid state. Please consider
-the following toplogy::
+the following topology::

 A (threaded domain) - B (threaded) - C (domain, just created)

@@ -420,7 +424,9 @@ The root cgroup is exempt from this restriction. Root contains
 processes and anonymous resource consumption which can't be associated
 with any other cgroups and requires special treatment from most
 controllers. How resource consumption in the root cgroup is governed
-is up to each controller.
+is up to each controller (for more information on this topic please
+refer to the Non-normative information section in the Controllers
+chapter).

 Note that the restriction doesn't get in the way if there is no
 enabled controller in the cgroup's "cgroup.subtree_control". This is
@@ -1063,10 +1069,10 @@ PAGE_SIZE multiple when read back.
 reached the limit and allocation was about to fail.

 Depending on context result could be invocation of OOM
- killer and retrying allocation or failing alloction.
+ killer and retrying allocation or failing allocation.

 Failed allocation in its turn could be returned into
- userspace as -ENOMEM or siletly ignored in cases like
+ userspace as -ENOMEM or silently ignored in cases like
 disk readahead. For now OOM in memory cgroup kills
 tasks iff shortage has happened inside page fault.

@@ -1191,7 +1197,7 @@ PAGE_SIZE multiple when read back.
 cgroups. The default is "max".

 Swap usage hard limit. If a cgroup's swap usage reaches this
- limit, anonymous meomry of the cgroup will not be swapped out.
+ limit, anonymous memory of the cgroup will not be swapped out.


 Usage Guidelines
@@ -1429,6 +1435,30 @@ through fork() or clone(). These will return -EAGAIN if the creation
 of a new process would cause a cgroup policy to be violated.


+Device controller
+-----------------
+
+Device controller manages access to device files. It includes both
+creation of new device files (using mknod), and access to the
+existing device files.
+
+Cgroup v2 device controller has no interface files and is implemented
+on top of cgroup BPF. To control access to device files, a user may
+create bpf programs of the BPF_CGROUP_DEVICE type and attach them
+to cgroups. On an attempt to access a device file, corresponding
+BPF programs will be executed, and depending on the return value
+the attempt will succeed or fail with -EPERM.
+
+A BPF_CGROUP_DEVICE program takes a pointer to the bpf_cgroup_dev_ctx
+structure, which describes the device access attempt: access type
+(mknod/read/write) and device (type, major and minor numbers).
+If the program returns 0, the attempt fails with -EPERM, otherwise
+it succeeds.
+
+An example of BPF_CGROUP_DEVICE program may be found in the kernel
+source tree in the tools/testing/selftests/bpf/dev_cgroup.c file.
+
+
 RDMA
 ----

@@ -1481,6 +1511,35 @@ always be filtered by cgroup v2 path. The controller can still be
 moved to a legacy hierarchy after v2 hierarchy is populated.


+Non-normative information
+-------------------------
+
+This section contains information that isn't considered to be a part of
+the stable kernel API and so is subject to change.
+
+
+CPU controller root cgroup process behaviour
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When distributing CPU cycles in the root cgroup each thread in this
+cgroup is treated as if it was hosted in a separate child cgroup of the
+root cgroup. This child cgroup weight is dependent on its thread nice
+level.
+
+For details of this mapping see sched_prio_to_weight array in
+kernel/sched/core.c file (values from this array should be scaled
+appropriately so the neutral - nice 0 - value is 100 instead of 1024).
+
+
+IO controller root cgroup process behaviour
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Root cgroup processes are hosted in an implicit leaf child node.
+When distributing IO resources this implicit child node is taken into
+account as if it was a normal child cgroup of the root cgroup with a
+weight value of 200.
+
+
 Namespace
 =========
