From ef9fe980c6fcc1821ab955b74b242d2d6585fa75 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 9 Nov 2012 09:12:30 -0800 Subject: cgroup_freezer: implement proper hierarchy support Up until now, cgroup_freezer didn't implement hierarchy properly. cgroups could be arranged in hierarchy but it didn't make any difference in how each cgroup_freezer behaved. They all operated separately. This patch implements proper hierarchy support. If a cgroup is frozen, all its descendants are frozen. A cgroup is thawed iff it and all its ancestors are THAWED. freezer.self_freezing shows the current freezing state for the cgroup itself. freezer.parent_freezing shows whether the cgroup is freezing because any of its ancestors is freezing. freezer_post_create() locks the parent and new cgroup and inherits the parent's state and freezer_change_state() applies new state top-down using cgroup_for_each_descendant_pre() which guarantees that no child can escape its parent's state. update_if_frozen() uses cgroup_for_each_descendant_post() to propagate frozen states bottom-up. Synchronization could be coarser and easier by using a single mutex to protect all hierarchy operations. Finer grained approach was used because it wasn't too difficult for cgroup_freezer and I think it's beneficial to have an example implementation and cgroup_freezer is rather simple and can serve a good one. As this makes cgroup_freezer properly hierarchical, freezer_subsys.broken_hierarchy marking is removed. Note that this patch changes userland visible behavior - freezing a cgroup now freezes all its descendants too. This behavior change is intended and has been warned via .broken_hierarchy. v2: Michal spotted a bug in freezer_change_state() - descendants were inheriting from the wrong ancestor. Fixed. v3: Documentation/cgroups/freezer-subsystem.txt updated. Signed-off-by: Tejun Heo Reviewed-by: Michal Hocko --- Documentation/cgroups/freezer-subsystem.txt | 63 +++++++++++++++++++---------- 1 file changed, 42 insertions(+), 21 deletions(-) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt index 7e62de1e59ff..c96a72cbb30a 100644 --- a/Documentation/cgroups/freezer-subsystem.txt +++ b/Documentation/cgroups/freezer-subsystem.txt @@ -49,13 +49,49 @@ prevent the freeze/unfreeze cycle from becoming visible to the tasks being frozen. This allows the bash example above and gdb to run as expected. -The freezer subsystem in the container filesystem defines a file named -freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the -cgroup. Subsequently writing "THAWED" will unfreeze the tasks in the cgroup. -Reading will return the current state. +The cgroup freezer is hierarchical. Freezing a cgroup freezes all +tasks beloning to the cgroup and all its descendant cgroups. Each +cgroup has its own state (self-state) and the state inherited from the +parent (parent-state). Iff both states are THAWED, the cgroup is +THAWED. -Note freezer.state doesn't exist in root cgroup, which means root cgroup -is non-freezable. +The following cgroupfs files are created by cgroup freezer. + +* freezer.state: Read-write. + + When read, returns the effective state of the cgroup - "THAWED", + "FREEZING" or "FROZEN". This is the combined self and parent-states. + If any is freezing, the cgroup is freezing (FREEZING or FROZEN). + + FREEZING cgroup transitions into FROZEN state when all tasks + belonging to the cgroup and its descendants become frozen. Note that + a cgroup reverts to FREEZING from FROZEN after a new task is added + to the cgroup or one of its descendant cgroups until the new task is + frozen. + + When written, sets the self-state of the cgroup. Two values are + allowed - "FROZEN" and "THAWED". If FROZEN is written, the cgroup, + if not already freezing, enters FREEZING state along with all its + descendant cgroups. + + If THAWED is written, the self-state of the cgroup is changed to + THAWED. Note that the effective state may not change to THAWED if + the parent-state is still freezing. If a cgroup's effective state + becomes THAWED, all its descendants which are freezing because of + the cgroup also leave the freezing state. + +* freezer.self_freezing: Read only. + + Shows the self-state. 0 if the self-state is THAWED; otherwise, 1. + This value is 1 iff the last write to freezer.state was "FROZEN". + +* freezer.parent_freezing: Read only. + + Shows the parent-state. 0 if none of the cgroup's ancestors is + frozen; otherwise, 1. + +The root cgroup is non-freezable and the above interface files don't +exist. * Examples of usage : @@ -85,18 +121,3 @@ to unfreeze all tasks in the container : This is the basic mechanism which should do the right thing for user space task in a simple scenario. - -It's important to note that freezing can be incomplete. In that case we return -EBUSY. This means that some tasks in the cgroup are busy doing something that -prevents us from completely freezing the cgroup at this time. After EBUSY, -the cgroup will remain partially frozen -- reflected by freezer.state reporting -"FREEZING" when read. The state will remain "FREEZING" until one of these -things happens: - - 1) Userspace cancels the freezing operation by writing "THAWED" to - the freezer.state file - 2) Userspace retries the freezing operation by writing "FROZEN" to - the freezer.state file (writing "FREEZING" is not legal - and returns EINVAL) - 3) The tasks that blocked the cgroup from entering the "FROZEN" - state disappear from the cgroup's set of tasks. -- cgit v1.2.3 From 92fb97487a7e41b222c1417cabd1d1ab7cc3a48c Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 19 Nov 2012 08:13:38 -0800 Subject: cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free() Rename cgroup_subsys css lifetime related callbacks to better describe what their roles are. Also, update documentation. Signed-off-by: Tejun Heo Acked-by: Li Zefan --- Documentation/cgroups/cgroups.txt | 49 +++++++++++++++++++++++---------------- 1 file changed, 29 insertions(+), 20 deletions(-) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 9e04196c4d78..b06eea217403 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -553,16 +553,16 @@ call to cgroup_unload_subsys(). It should also set its_subsys.module = THIS_MODULE in its .c file. Each subsystem may export the following methods. The only mandatory -methods are create/destroy. Any others that are null are presumed to +methods are css_alloc/free. Any others that are null are presumed to be successful no-ops. -struct cgroup_subsys_state *create(struct cgroup *cgrp) +struct cgroup_subsys_state *css_alloc(struct cgroup *cgrp) (cgroup_mutex held by caller) -Called to create a subsystem state object for a cgroup. The +Called to allocate a subsystem state object for a cgroup. The subsystem should allocate its subsystem state object for the passed cgroup, returning a pointer to the new object on success or a -negative error code. On success, the subsystem pointer should point to +ERR_PTR() value. On success, the subsystem pointer should point to a structure of type cgroup_subsys_state (typically embedded in a larger subsystem-specific object), which will be initialized by the cgroup system. Note that this will be called at initialization to @@ -571,24 +571,33 @@ identified by the passed cgroup object having a NULL parent (since it's the root of the hierarchy) and may be an appropriate place for initialization code. -void destroy(struct cgroup *cgrp) +int css_online(struct cgroup *cgrp) (cgroup_mutex held by caller) -The cgroup system is about to destroy the passed cgroup; the subsystem -should do any necessary cleanup and free its subsystem state -object. By the time this method is called, the cgroup has already been -unlinked from the file system and from the child list of its parent; -cgroup->parent is still valid. (Note - can also be called for a -newly-created cgroup if an error occurs after this subsystem's -create() method has been called for the new cgroup). - -int pre_destroy(struct cgroup *cgrp); - -Called before checking the reference count on each subsystem. This may -be useful for subsystems which have some extra references even if -there are not tasks in the cgroup. If pre_destroy() returns error code, -rmdir() will fail with it. From this behavior, pre_destroy() can be -called multiple times against a cgroup. +Called after @cgrp successfully completed all allocations and made +visible to cgroup_for_each_child/descendant_*() iterators. The +subsystem may choose to fail creation by returning -errno. This +callback can be used to implement reliable state sharing and +propagation along the hierarchy. See the comment on +cgroup_for_each_descendant_pre() for details. + +void css_offline(struct cgroup *cgrp); + +This is the counterpart of css_online() and called iff css_online() +has succeeded on @cgrp. This signifies the beginning of the end of +@cgrp. @cgrp is being removed and the subsystem should start dropping +all references it's holding on @cgrp. When all references are dropped, +cgroup removal will proceed to the next step - css_free(). After this +callback, @cgrp should be considered dead to the subsystem. + +void css_free(struct cgroup *cgrp) +(cgroup_mutex held by caller) + +The cgroup system is about to free @cgrp; the subsystem should free +its subsystem state object. By the time this method is called, @cgrp +is completely unused; @cgrp->parent is still valid. (Note - can also +be called for a newly-created cgroup if an error occurs after this +subsystem's create() method has been called for the new cgroup). int can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) (cgroup_mutex held by caller) -- cgit v1.2.3 From 2260e7fc1f18ad815324605c1ce7d5c6fd9b19a2 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 19 Nov 2012 08:13:38 -0800 Subject: cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/ clone_children is only meaningful for cpuset and will stay that way. Rename the flag to reflect that and update documentation. Also, drop clone_children() wrapper in cgroup.c. The thin wrapper is used only a few times and one of them will go away soon. Signed-off-by: Tejun Heo Acked-by: Serge E. Hallyn Acked-by: Li Zefan Cc: Glauber Costa --- Documentation/cgroups/cgroups.txt | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index b06eea217403..24cdf76bd20c 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -299,11 +299,9 @@ a cgroup hierarchy's release_agent path is empty. 1.5 What does clone_children do ? --------------------------------- -If the clone_children flag is enabled (1) in a cgroup, then all -cgroups created beneath will call the post_clone callbacks for each -subsystem of the newly created cgroup. Usually when this callback is -implemented for a subsystem, it copies the values of the parent -subsystem, this is the case for the cpuset. +This flag only affects the cpuset controller. If the clone_children +flag is enabled (1) in a cgroup, a new cpuset cgroup will copy its +configuration from the parent during initialization. 1.6 How do I use cgroups ? -------------------------- -- cgit v1.2.3 From 033fa1c5f5f73833598a0beb022c0048fb769dad Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 19 Nov 2012 08:13:39 -0800 Subject: cgroup, cpuset: remove cgroup_subsys->post_clone() Currently CGRP_CPUSET_CLONE_CHILDREN triggers ->post_clone(). Now that clone_children is cpuset specific, there's no reason to have this rather odd option activation mechanism in cgroup core. cpuset can check the flag from its ->css_allocate() and take the necessary action. Move cpuset_post_clone() logic to the end of cpuset_css_alloc() and remove cgroup_subsys->post_clone(). Loosely based on Glauber's "generalize post_clone into post_create" patch. Signed-off-by: Tejun Heo Original-patch-by: Glauber Costa Original-patch: <1351686554-22592-2-git-send-email-glommer@parallels.com> Acked-by: Serge E. Hallyn Acked-by: Li Zefan Cc: Glauber Costa --- Documentation/cgroups/cgroups.txt | 8 -------- 1 file changed, 8 deletions(-) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 24cdf76bd20c..bcf1a00b06a1 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -642,14 +642,6 @@ void exit(struct task_struct *task) Called during task exit. -void post_clone(struct cgroup *cgrp) -(cgroup_mutex held by caller) - -Called during cgroup_create() to do any parameter -initialization which might be required before a task could attach. For -example, in cpusets, no task may attach before 'cpus' and 'mems' are set -up. - void bind(struct cgroup *root) (cgroup_mutex held by caller) -- cgit v1.2.3 From 811d8d6ff59cbc7d618dfa2cd339ba6c3691a7eb Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 22 Nov 2012 07:32:47 -0800 Subject: netprio_cgroup: allow nesting and inherit config on cgroup creation Inherit netprio configuration from ->css_online(), allow nesting and remove .broken_hierarchy marking. This makes netprio_cgroup's behavior match netcls_cgroup's. Note that this patch changes userland-visible behavior. Nesting is allowed and the first level cgroups below the root cgroup behave differently - they inherit priorities from the root cgroup on creation instead of starting with 0. This is unfortunate but not doing so is much crazier. Signed-off-by: Tejun Heo Tested-and-Acked-by: Daniel Wagner Acked-by: David S. Miller --- Documentation/cgroups/net_prio.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/net_prio.txt b/Documentation/cgroups/net_prio.txt index 01b322635591..a82cbd28ea8a 100644 --- a/Documentation/cgroups/net_prio.txt +++ b/Documentation/cgroups/net_prio.txt @@ -51,3 +51,5 @@ One usage for the net_prio cgroup is with mqprio qdisc allowing application traffic to be steered to hardware/driver based traffic classes. These mappings can then be managed by administrators or other networking protocols such as DCBX. + +A new net_prio cgroup inherits the parent's configuration. -- cgit v1.2.3 From 15ef4ffaa797034d5ff82844daf8f595d7c6d53c Mon Sep 17 00:00:00 2001 From: Namjae Jeon Date: Sat, 8 Dec 2012 15:02:36 +0900 Subject: cgroup: update Documentation/cgroups/00-INDEX There are new files added to cgroup documentation. Let's update the index file to include the new files. Signed-off-by: Namjae Jeon Signed-off-by: Amit Sahrawat Signed-off-by: Tejun Heo --- Documentation/cgroups/00-INDEX | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation/cgroups') diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX index 3f58fa3d6d00..f78b90a35ad0 100644 --- a/Documentation/cgroups/00-INDEX +++ b/Documentation/cgroups/00-INDEX @@ -1,7 +1,11 @@ 00-INDEX - this file +blkio-controller.txt + - Description for Block IO Controller, implementation and usage details. cgroups.txt - Control Groups definition, implementation details, examples and API. +cgroup_event_listener.c + - A user program for cgroup listener. cpuacct.txt - CPU Accounting Controller; account CPU usage for groups of tasks. cpusets.txt @@ -10,9 +14,13 @@ devices.txt - Device Whitelist Controller; description, interface and security. freezer-subsystem.txt - checkpointing; rationale to not use signals, interface. +hugetlb.txt + - HugeTLB Controller implementation and usage details. memcg_test.txt - Memory Resource Controller; implementation details. memory.txt - Memory Resource Controller; design, accounting, interface, testing. +net_prio.txt + - Network priority cgroups details and usages. resource_counter.txt - Resource Counter API. -- cgit v1.2.3