From 4b91355e9dc9ac1eb3d69e56de093899ff2677ef Mon Sep 17 00:00:00 2001 From: KAMEZAWA Hiroyuki Date: Tue, 29 May 2012 15:06:51 -0700 Subject: memcg: fix/change behavior of shared anon at moving task This patch changes memcg's behavior at task_move(). At task_move(), the kernel scans a task's page table and move the changes for mapped pages from source cgroup to target cgroup. There has been a bug at handling shared anonymous pages for a long time. Before patch: - The spec says 'shared anonymous pages are not moved.' - The implementation was 'shared anonymoys pages may be moved'. If page_mapcount <=2, shared anonymous pages's charge were moved. After patch: - The spec says 'all anonymous pages are moved'. - The implementation is 'all anonymous pages are moved'. Considering usage of memcg, this will not affect user's experience. 'shared anonymous' pages only exists between a tree of processes which don't do exec(). Moving one of process without exec() seems not sane. For example, libcgroup will not be affected by this change. (Anyway, no one noticed the implementation for a long time...) Below is a discussion log: - current spec/implementation are complex - Now, shared file caches are moved - It adds unclear check as page_mapcount(). To do correct check, we should check swap users, etc. - No one notice this implementation behavior. So, no one get benefit from the design. - In general, once task is moved to a cgroup for running, it will not be moved.... - Finally, we have control knob as memory.move_charge_at_immigrate. Here is a patch to allow moving shared pages, completely. This makes memcg simpler and fix current broken code. Suggested-by: Hugh Dickins Signed-off-by: KAMEZAWA Hiroyuki Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Naoya Horiguchi Cc: Glauber Costa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/memory.txt | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index e479007f1a75..6a066a270fc5 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -184,12 +184,14 @@ behind this approach is that a cgroup that aggressively uses a shared page will eventually get charged for it (once it is uncharged from the cgroup that brought it in -- this will happen on memory pressure). +But see section 8.2: when moving a task to another cgroup, its pages may +be recharged to the new cgroup, if move_charge_at_immigrate has been chosen. + Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used. When you do swapoff and make swapped-out pages of shmem(tmpfs) to be backed into memory in force, charges for pages are accounted against the caller of swapoff rather than the users of shmem. - 2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP) Swap Extension allows you to record charge for swap. A swapped-in page is @@ -615,8 +617,7 @@ memory cgroup. bit | what type of charges would be moved ? -----+------------------------------------------------------------------------ 0 | A charge of an anonymous page(or swap of it) used by the target task. - | Those pages and swaps must be used only by the target task. You must - | enable Swap Extension(see 2.4) to enable move of swap charges. + | You must enable Swap Extension(see 2.4) to enable move of swap charges. -----+------------------------------------------------------------------------ 1 | A charge of file pages(normal file, tmpfs file(e.g. ipc shared memory) | and swaps of tmpfs file) mmapped by the target task. Unlike the case of @@ -629,8 +630,6 @@ memory cgroup. 8.3 TODO -- Implement madvise(2) to let users decide the vma to be moved or not to be - moved. - All of moving charge operations are done under cgroup_mutex. It's not good behavior to hold the mutex too long, so we may need some trick. -- cgit v1.2.3