summaryrefslogtreecommitdiff
path: root/Documentation/cgroups
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/cgroups')
-rw-r--r--Documentation/cgroups/blkio-controller.txt31
-rw-r--r--Documentation/cgroups/cgroups.txt60
-rw-r--r--Documentation/cgroups/cpuacct.txt21
-rw-r--r--Documentation/cgroups/cpusets.txt28
-rw-r--r--Documentation/cgroups/devices.txt6
-rw-r--r--Documentation/cgroups/freezer-subsystem.txt20
-rw-r--r--Documentation/cgroups/memory.txt58
7 files changed, 129 insertions, 95 deletions
diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 465351d4cf85..cd45c8ea7463 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -28,16 +28,19 @@ cgroups. Here is what you can do.
- Enable group scheduling in CFQ
CONFIG_CFQ_GROUP_IOSCHED=y
-- Compile and boot into kernel and mount IO controller (blkio).
+- Compile and boot into kernel and mount IO controller (blkio); see
+ cgroups.txt, Why are cgroups needed?.
- mount -t cgroup -o blkio none /cgroup
+ mount -t tmpfs cgroup_root /sys/fs/cgroup
+ mkdir /sys/fs/cgroup/blkio
+ mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
- Create two cgroups
- mkdir -p /cgroup/test1/ /cgroup/test2
+ mkdir -p /sys/fs/cgroup/blkio/test1/ /sys/fs/cgroup/blkio/test2
- Set weights of group test1 and test2
- echo 1000 > /cgroup/test1/blkio.weight
- echo 500 > /cgroup/test2/blkio.weight
+ echo 1000 > /sys/fs/cgroup/blkio/test1/blkio.weight
+ echo 500 > /sys/fs/cgroup/blkio/test2/blkio.weight
- Create two same size files (say 512MB each) on same disk (file1, file2) and
launch two dd threads in different cgroup to read those files.
@@ -46,12 +49,12 @@ cgroups. Here is what you can do.
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/sdb/zerofile1 of=/dev/null &
- echo $! > /cgroup/test1/tasks
- cat /cgroup/test1/tasks
+ echo $! > /sys/fs/cgroup/blkio/test1/tasks
+ cat /sys/fs/cgroup/blkio/test1/tasks
dd if=/mnt/sdb/zerofile2 of=/dev/null &
- echo $! > /cgroup/test2/tasks
- cat /cgroup/test2/tasks
+ echo $! > /sys/fs/cgroup/blkio/test2/tasks
+ cat /sys/fs/cgroup/blkio/test2/tasks
- At macro level, first dd should finish first. To get more precise data, keep
on looking at (with the help of script), at blkio.disk_time and
@@ -68,13 +71,13 @@ Throttling/Upper Limit policy
- Enable throttling in block layer
CONFIG_BLK_DEV_THROTTLING=y
-- Mount blkio controller
- mount -t cgroup -o blkio none /cgroup/blkio
+- Mount blkio controller (see cgroups.txt, Why are cgroups needed?)
+ mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
- Specify a bandwidth rate on particular device for root group. The format
for policy is "<major>:<minor> <byes_per_second>".
- echo "8:16 1048576" > /cgroup/blkio/blkio.read_bps_device
+ echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.read_bps_device
Above will put a limit of 1MB/second on reads happening for root group
on device having major/minor number 8:16.
@@ -108,7 +111,7 @@ Hierarchical Cgroups
CFQ and throttling will practically treat all groups at same level.
pivot
- / | \ \
+ / / \ \
root test1 test2 test3
Down the line we can implement hierarchical accounting/control support
@@ -149,7 +152,7 @@ Proportional weight policy files
Following is the format.
- #echo dev_maj:dev_minor weight > /path/to/cgroup/blkio.weight_device
+ # echo dev_maj:dev_minor weight > blkio.weight_device
Configure weight=300 on /dev/sdb (8:16) in this cgroup
# echo 8:16 300 > blkio.weight_device
# cat blkio.weight_device
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 0ed99f08f1f3..cd67e90003c0 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -138,11 +138,11 @@ With the ability to classify tasks differently for different resources
the admin can easily set up a script which receives exec notifications
and depending on who is launching the browser he can
- # echo browser_pid > /mnt/<restype>/<userclass>/tasks
+ # echo browser_pid > /sys/fs/cgroup/<restype>/<userclass>/tasks
With only a single hierarchy, he now would potentially have to create
a separate cgroup for every browser launched and associate it with
-approp network and other resource class. This may lead to
+appropriate network and other resource class. This may lead to
proliferation of such cgroups.
Also lets say that the administrator would like to give enhanced network
@@ -153,9 +153,9 @@ apps enhanced CPU power,
With ability to write pids directly to resource classes, it's just a
matter of :
- # echo pid > /mnt/network/<new_class>/tasks
+ # echo pid > /sys/fs/cgroup/network/<new_class>/tasks
(after some time)
- # echo pid > /mnt/network/<orig_class>/tasks
+ # echo pid > /sys/fs/cgroup/network/<orig_class>/tasks
Without this ability, he would have to split the cgroup into
multiple separate ones and then associate the new cgroups with the
@@ -310,21 +310,24 @@ subsystem, this is the case for the cpuset.
To start a new job that is to be contained within a cgroup, using
the "cpuset" cgroup subsystem, the steps are something like:
- 1) mkdir /dev/cgroup
- 2) mount -t cgroup -ocpuset cpuset /dev/cgroup
- 3) Create the new cgroup by doing mkdir's and write's (or echo's) in
- the /dev/cgroup virtual file system.
- 4) Start a task that will be the "founding father" of the new job.
- 5) Attach that task to the new cgroup by writing its pid to the
- /dev/cgroup tasks file for that cgroup.
- 6) fork, exec or clone the job tasks from this founding father task.
+ 1) mount -t tmpfs cgroup_root /sys/fs/cgroup
+ 2) mkdir /sys/fs/cgroup/cpuset
+ 3) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
+ 4) Create the new cgroup by doing mkdir's and write's (or echo's) in
+ the /sys/fs/cgroup virtual file system.
+ 5) Start a task that will be the "founding father" of the new job.
+ 6) Attach that task to the new cgroup by writing its pid to the
+ /sys/fs/cgroup/cpuset/tasks file for that cgroup.
+ 7) fork, exec or clone the job tasks from this founding father task.
For example, the following sequence of commands will setup a cgroup
named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
and then start a subshell 'sh' in that cgroup:
- mount -t cgroup cpuset -ocpuset /dev/cgroup
- cd /dev/cgroup
+ mount -t tmpfs cgroup_root /sys/fs/cgroup
+ mkdir /sys/fs/cgroup/cpuset
+ mount -t cgroup cpuset -ocpuset /sys/fs/cgroup/cpuset
+ cd /sys/fs/cgroup/cpuset
mkdir Charlie
cd Charlie
/bin/echo 2-3 > cpuset.cpus
@@ -345,7 +348,7 @@ Creating, modifying, using the cgroups can be done through the cgroup
virtual filesystem.
To mount a cgroup hierarchy with all available subsystems, type:
-# mount -t cgroup xxx /dev/cgroup
+# mount -t cgroup xxx /sys/fs/cgroup
The "xxx" is not interpreted by the cgroup code, but will appear in
/proc/mounts so may be any useful identifying string that you like.
@@ -354,23 +357,32 @@ Note: Some subsystems do not work without some user input first. For instance,
if cpusets are enabled the user will have to populate the cpus and mems files
for each new cgroup created before that group can be used.
+As explained in section `1.2 Why are cgroups needed?' you should create
+different hierarchies of cgroups for each single resource or group of
+resources you want to control. Therefore, you should mount a tmpfs on
+/sys/fs/cgroup and create directories for each cgroup resource or resource
+group.
+
+# mount -t tmpfs cgroup_root /sys/fs/cgroup
+# mkdir /sys/fs/cgroup/rg1
+
To mount a cgroup hierarchy with just the cpuset and memory
subsystems, type:
-# mount -t cgroup -o cpuset,memory hier1 /dev/cgroup
+# mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1
To change the set of subsystems bound to a mounted hierarchy, just
remount with different options:
-# mount -o remount,cpuset,blkio hier1 /dev/cgroup
+# mount -o remount,cpuset,blkio hier1 /sys/fs/cgroup/rg1
Now memory is removed from the hierarchy and blkio is added.
Note this will add blkio to the hierarchy but won't remove memory or
cpuset, because the new options are appended to the old ones:
-# mount -o remount,blkio /dev/cgroup
+# mount -o remount,blkio /sys/fs/cgroup/rg1
To Specify a hierarchy's release_agent:
# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
- xxx /dev/cgroup
+ xxx /sys/fs/cgroup/rg1
Note that specifying 'release_agent' more than once will return failure.
@@ -379,17 +391,17 @@ when the hierarchy consists of a single (root) cgroup. Supporting
the ability to arbitrarily bind/unbind subsystems from an existing
cgroup hierarchy is intended to be implemented in the future.
-Then under /dev/cgroup you can find a tree that corresponds to the
-tree of the cgroups in the system. For instance, /dev/cgroup
+Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the
+tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1
is the cgroup that holds the whole system.
If you want to change the value of release_agent:
-# echo "/sbin/new_release_agent" > /dev/cgroup/release_agent
+# echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent
It can also be changed via remount.
-If you want to create a new cgroup under /dev/cgroup:
-# cd /dev/cgroup
+If you want to create a new cgroup under /sys/fs/cgroup/rg1:
+# cd /sys/fs/cgroup/rg1
# mkdir my_cgroup
Now you want to do something with this cgroup.
diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt
index 8b930946c52a..9ad85df4b983 100644
--- a/Documentation/cgroups/cpuacct.txt
+++ b/Documentation/cgroups/cpuacct.txt
@@ -10,26 +10,25 @@ directly present in its group.
Accounting groups can be created by first mounting the cgroup filesystem.
-# mkdir /cgroups
-# mount -t cgroup -ocpuacct none /cgroups
-
-With the above step, the initial or the parent accounting group
-becomes visible at /cgroups. At bootup, this group includes all the
-tasks in the system. /cgroups/tasks lists the tasks in this cgroup.
-/cgroups/cpuacct.usage gives the CPU time (in nanoseconds) obtained by
-this group which is essentially the CPU time obtained by all the tasks
+# mount -t cgroup -ocpuacct none /sys/fs/cgroup
+
+With the above step, the initial or the parent accounting group becomes
+visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
+the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
+/sys/fs/cgroup/cpuacct.usage gives the CPU time (in nanoseconds) obtained
+by this group which is essentially the CPU time obtained by all the tasks
in the system.
-New accounting groups can be created under the parent group /cgroups.
+New accounting groups can be created under the parent group /sys/fs/cgroup.
-# cd /cgroups
+# cd /sys/fs/cgroup
# mkdir g1
# echo $$ > g1
The above steps create a new group g1 and move the current shell
process (bash) into it. CPU time consumed by this bash and its children
can be obtained from g1/cpuacct.usage and the same is accumulated in
-/cgroups/cpuacct.usage also.
+/sys/fs/cgroup/cpuacct.usage also.
cpuacct.stat file lists a few statistics which further divide the
CPU time obtained by the cgroup into user and system times. Currently
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index 98a30829af7a..5b0d78e55ccc 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -661,21 +661,21 @@ than stress the kernel.
To start a new job that is to be contained within a cpuset, the steps are:
- 1) mkdir /dev/cpuset
- 2) mount -t cgroup -ocpuset cpuset /dev/cpuset
+ 1) mkdir /sys/fs/cgroup/cpuset
+ 2) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
3) Create the new cpuset by doing mkdir's and write's (or echo's) in
- the /dev/cpuset virtual file system.
+ the /sys/fs/cgroup/cpuset virtual file system.
4) Start a task that will be the "founding father" of the new job.
5) Attach that task to the new cpuset by writing its pid to the
- /dev/cpuset tasks file for that cpuset.
+ /sys/fs/cgroup/cpuset tasks file for that cpuset.
6) fork, exec or clone the job tasks from this founding father task.
For example, the following sequence of commands will setup a cpuset
named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
and then start a subshell 'sh' in that cpuset:
- mount -t cgroup -ocpuset cpuset /dev/cpuset
- cd /dev/cpuset
+ mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
+ cd /sys/fs/cgroup/cpuset
mkdir Charlie
cd Charlie
/bin/echo 2-3 > cpuset.cpus
@@ -710,14 +710,14 @@ Creating, modifying, using the cpusets can be done through the cpuset
virtual filesystem.
To mount it, type:
-# mount -t cgroup -o cpuset cpuset /dev/cpuset
+# mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset
-Then under /dev/cpuset you can find a tree that corresponds to the
-tree of the cpusets in the system. For instance, /dev/cpuset
+Then under /sys/fs/cgroup/cpuset you can find a tree that corresponds to the
+tree of the cpusets in the system. For instance, /sys/fs/cgroup/cpuset
is the cpuset that holds the whole system.
-If you want to create a new cpuset under /dev/cpuset:
-# cd /dev/cpuset
+If you want to create a new cpuset under /sys/fs/cgroup/cpuset:
+# cd /sys/fs/cgroup/cpuset
# mkdir my_cpuset
Now you want to do something with this cpuset.
@@ -765,12 +765,12 @@ wrapper around the cgroup filesystem.
The command
-mount -t cpuset X /dev/cpuset
+mount -t cpuset X /sys/fs/cgroup/cpuset
is equivalent to
-mount -t cgroup -ocpuset,noprefix X /dev/cpuset
-echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent
+mount -t cgroup -ocpuset,noprefix X /sys/fs/cgroup/cpuset
+echo "/sbin/cpuset_release_agent" > /sys/fs/cgroup/cpuset/release_agent
2.2 Adding/removing cpus
------------------------
diff --git a/Documentation/cgroups/devices.txt b/Documentation/cgroups/devices.txt
index 57ca4c89fe5c..16624a7f8222 100644
--- a/Documentation/cgroups/devices.txt
+++ b/Documentation/cgroups/devices.txt
@@ -22,16 +22,16 @@ removed from the child(ren).
An entry is added using devices.allow, and removed using
devices.deny. For instance
- echo 'c 1:3 mr' > /cgroups/1/devices.allow
+ echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow
allows cgroup 1 to read and mknod the device usually known as
/dev/null. Doing
- echo a > /cgroups/1/devices.deny
+ echo a > /sys/fs/cgroup/1/devices.deny
will remove the default 'a *:* rwm' entry. Doing
- echo a > /cgroups/1/devices.allow
+ echo a > /sys/fs/cgroup/1/devices.allow
will add the 'a *:* rwm' entry to the whitelist.
diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
index 41f37fea1276..c21d77742a07 100644
--- a/Documentation/cgroups/freezer-subsystem.txt
+++ b/Documentation/cgroups/freezer-subsystem.txt
@@ -59,28 +59,28 @@ is non-freezable.
* Examples of usage :
- # mkdir /containers
- # mount -t cgroup -ofreezer freezer /containers
- # mkdir /containers/0
- # echo $some_pid > /containers/0/tasks
+ # mkdir /sys/fs/cgroup/freezer
+ # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer
+ # mkdir /sys/fs/cgroup/freezer/0
+ # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks
to get status of the freezer subsystem :
- # cat /containers/0/freezer.state
+ # cat /sys/fs/cgroup/freezer/0/freezer.state
THAWED
to freeze all tasks in the container :
- # echo FROZEN > /containers/0/freezer.state
- # cat /containers/0/freezer.state
+ # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state
+ # cat /sys/fs/cgroup/freezer/0/freezer.state
FREEZING
- # cat /containers/0/freezer.state
+ # cat /sys/fs/cgroup/freezer/0/freezer.state
FROZEN
to unfreeze all tasks in the container :
- # echo THAWED > /containers/0/freezer.state
- # cat /containers/0/freezer.state
+ # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state
+ # cat /sys/fs/cgroup/freezer/0/freezer.state
THAWED
This is the basic mechanism which should do the right thing for user space task
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 7c163477fcd8..06eb6d957c83 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -1,8 +1,8 @@
Memory Resource Controller
-NOTE: The Memory Resource Controller has been generically been referred
- to as the memory controller in this document. Do not confuse memory
- controller used here with the memory controller that is used in hardware.
+NOTE: The Memory Resource Controller has generically been referred to as the
+ memory controller in this document. Do not confuse memory controller
+ used here with the memory controller that is used in hardware.
(For editors)
In this document:
@@ -70,6 +70,7 @@ Brief summary of control files.
(See sysctl's vm.swappiness)
memory.move_charge_at_immigrate # set/show controls of moving charges
memory.oom_control # set/show oom controls.
+ memory.numa_stat # show the number of memory usage per numa node
1. History
@@ -181,7 +182,7 @@ behind this approach is that a cgroup that aggressively uses a shared
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).
-Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used..
+Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.
When you do swapoff and make swapped-out pages of shmem(tmpfs) to
be backed into memory in force, charges for pages are accounted against the
caller of swapoff rather than the users of shmem.
@@ -213,7 +214,7 @@ affecting global LRU, memory+swap limit is better than just limiting swap from
OS point of view.
* What happens when a cgroup hits memory.memsw.limit_in_bytes
-When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out
+When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
in this cgroup. Then, swap-out will not be done by cgroup routine and file
caches are dropped. But as mentioned above, global LRU can do swapout memory
from it for sanity of the system's memory management state. You can't forbid
@@ -263,16 +264,17 @@ b. Enable CONFIG_RESOURCE_COUNTERS
c. Enable CONFIG_CGROUP_MEM_RES_CTLR
d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
-1. Prepare the cgroups
-# mkdir -p /cgroups
-# mount -t cgroup none /cgroups -o memory
+1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
+# mount -t tmpfs none /sys/fs/cgroup
+# mkdir /sys/fs/cgroup/memory
+# mount -t cgroup none /sys/fs/cgroup/memory -o memory
2. Make the new group and move bash into it
-# mkdir /cgroups/0
-# echo $$ > /cgroups/0/tasks
+# mkdir /sys/fs/cgroup/memory/0
+# echo $$ > /sys/fs/cgroup/memory/0/tasks
Since now we're in the 0 cgroup, we can alter the memory limit:
-# echo 4M > /cgroups/0/memory.limit_in_bytes
+# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)
@@ -280,11 +282,11 @@ mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)
NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
NOTE: We cannot set limits on the root cgroup any more.
-# cat /cgroups/0/memory.limit_in_bytes
+# cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
4194304
We can check the usage:
-# cat /cgroups/0/memory.usage_in_bytes
+# cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes
1216512
A successful write to this file does not guarantee a successful set of
@@ -464,6 +466,24 @@ value for efficient access. (Of course, when necessary, it's synchronized.)
If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
value in memory.stat(see 5.2).
+5.6 numa_stat
+
+This is similar to numa_maps but operates on a per-memcg basis. This is
+useful for providing visibility into the numa locality information within
+an memcg since the pages are allowed to be allocated from any physical
+node. One of the usecases is evaluating application performance by
+combining this information with the application's cpu allocation.
+
+We export "total", "file", "anon" and "unevictable" pages per-node for
+each memcg. The ouput format of memory.numa_stat is:
+
+total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ...
+file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ...
+anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
+unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
+
+And we have total = file + anon + unevictable.
+
6. Hierarchy support
The memory controller supports a deep hierarchy and hierarchical accounting.
@@ -471,13 +491,13 @@ The hierarchy is created by creating the appropriate cgroups in the
cgroup filesystem. Consider for example, the following cgroup filesystem
hierarchy
- root
+ root
/ | \
- / | \
- a b c
- | \
- | \
- d e
+ / | \
+ a b c
+ | \
+ | \
+ d e
In the diagram above, with hierarchical accounting enabled, all memory
usage of e, is accounted to its ancestors up until the root (i.e, c and root),