summaryrefslogtreecommitdiff
path: root/mm/percpu.c
AgeCommit message (Collapse)Author
2009-07-04percpu: simplify pcpu_setup_first_chunk()Tejun Heo
Now that all first chunk allocator helpers allocate and map the first chunk themselves, there's no need to have optional default alloc/map in pcpu_setup_first_chunk(). Drop @populate_pte_fn and only leave @dyn_size optional and make all other params mandatory. This makes it much easier to follow what pcpu_setup_first_chunk() is doing and what actual differences tweaking each parameter results in. [ Impact: drop unused code path ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-07-04x86,percpu: generalize lpage first chunk allocatorTejun Heo
Generalize and move x86 setup_pcpu_lpage() into pcpu_lpage_first_chunk(). setup_pcpu_lpage() now is a simple wrapper around the generalized version. Other than taking size parameters and using arch supplied callbacks to allocate/free/map memory, pcpu_lpage_first_chunk() is identical to the original implementation. This simplifies arch code and will help converting more archs to dynamic percpu allocator. While at it, factor out pcpu_calc_fc_sizes() which is common to pcpu_embed_first_chunk() and pcpu_lpage_first_chunk(). [ Impact: code reorganization and generalization ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-07-04percpu: make 4k first chunk allocator map memoryTejun Heo
At first, percpu first chunk was always setup page-by-page by the generic code. To add other allocators, different parts of the generic initialization was made optional. Now we have three allocators - embed, remap and 4k. embed and remap fully handle allocation and mapping of the first chunk while 4k still depends on generic code for those. This makes the generic alloc/map paths specifci to 4k and makes the code unnecessary complicated with optional generic behaviors. This patch makes the 4k allocator to allocate and map memory directly instead of depending on the generic code. The only outside visible change is that now dynamic area in the first chunk is allocated up-front instead of on-demand. This doesn't make any meaningful difference as the area is minimal (usually less than a page, just enough to fill the alignment) on 4k allocator. Plus, dynamic area in the first chunk usually gets fully used anyway. This will allow simplification of pcpu_setpu_first_chunk() and removal of chunk->page array. [ Impact: no outside visible change other than up-front allocation of dyn area ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-07-04x86,percpu: generalize 4k first chunk allocatorTejun Heo
Generalize and move x86 setup_pcpu_4k() into pcpu_4k_first_chunk(). setup_pcpu_4k() now is a simple wrapper around the generalized version. Other than taking size parameters and using arch supplied callbacks to allocate/free memory, pcpu_4k_first_chunk() is identical to the original implementation. This simplifies arch code and will help converting more archs to dynamic percpu allocator. While at it, s/pcpu_populate_pte_fn_t/pcpu_fc_populate_pte_fn_t/ for consistency. [ Impact: code reorganization and generalization ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-07-04percpu: drop @unit_size from embed first chunk allocatorTejun Heo
The only extra feature @unit_size provides is making dead space at the end of the first chunk which doesn't have any valid usecase. Drop the parameter. This will increase consistency with generalized 4k allocator. James Bottomley spotted missing conversion for the default setup_per_cpu_areas() which caused build breakage on all arcsh which use it. [ Impact: drop unused code path ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Ingo Molnar <mingo@elte.hu>
2009-07-04x86: make pcpu_chunk_addr_search() matching stricterTejun Heo
The @addr passed into pcpu_chunk_addr_search() is unit0 based address and thus should be matched inside unit0 area. Currently, when it uses chunk size when determining whether the address falls in the first chunk. Addresses in unitN where N>0 shouldn't be passed in anyway, so this doesn't cause any malfunction but fix it for consistency. [ Impact: mostly cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-06-24percpu: use dynamic percpu allocator as the default percpu allocatorTejun Heo
This patch makes most !CONFIG_HAVE_SETUP_PER_CPU_AREA archs use dynamic percpu allocator. The first chunk is allocated using embedding helper and 8k is reserved for modules. This ensures that the new allocator behaves almost identically to the original allocator as long as static percpu variables are concerned, so it shouldn't introduce much breakage. s390 and alpha use custom SHIFT_PERCPU_PTR() to work around addressing range limit the addressing model imposes. Unfortunately, this breaks if the address is specified using a variable, so for now, the two archs aren't converted. The following architectures are affected by this change. * sh * arm * cris * mips * sparc(32) * blackfin * avr32 * parisc (broken, under investigation) * m32r * powerpc(32) As this change makes the dynamic allocator the default one, CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is replaced with its invert - CONFIG_HAVE_LEGACY_PER_CPU_AREA, which is added to yet-to-be converted archs. These archs implement their own setup_per_cpu_areas() and the conversion is not trivial. * powerpc(64) * sparc(64) * ia64 * alpha * s390 Boot and batch alloc/free tests on x86_32 with debug code (x86_32 doesn't use default first chunk initialization). Compile tested on sparc(32), powerpc(32), arm and alpha. Kyle McMartin reported that this change breaks parisc. The problem is still under investigation and he is okay with pushing this patch forward and fixing parisc later. [ Impact: use dynamic allocator for most archs w/o custom percpu setup ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Mikael Starvik <starvik@axis.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Bryan Wu <cooloney@kernel.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22x86: implement percpu_alloc kernel parameterTejun Heo
According to Andi, it isn't clear whether lpage allocator is worth the trouble as there are many processors where PMD TLB is far scarcer than PTE TLB. The advantage or disadvantage probably depends on the actual size of percpu area and specific processor. As performance degradation due to TLB pressure tends to be highly workload specific and subtle, it is difficult to decide which way to go without more data. This patch implements percpu_alloc kernel parameter to allow selecting which first chunk allocator to use to ease debugging and testing. While at it, make sure all the failure paths report why something failed to help determining why certain allocator isn't working. Also, kill the "Great future plan" comment which had already been realized quite some time ago. [ Impact: allow explicit percpu first chunk allocator selection ] Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22percpu: fix too lazy vunmap cache flushingTejun Heo
In pcpu_unmap(), flushing virtual cache on vunmap can't be delayed as the page is going to be returned to the page allocator. Only TLB flushing can be put off such that vmalloc code can handle it lazily. Fix it. [ Impact: fix subtle virtual cache flush bug ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Ingo Molnar <mingo@elte.hu>
2009-04-08percpu: remove rbtree and use page->index insteadChristoph Lameter
Impact: use page->index for addr to chunk mapping instead of dedicated rbtree The rbtree is used to determine the chunk from the virtual address. However, we can already determine the page struct from a virtual address and there are several unused fields in page struct used by vmalloc. Use the index field to store a pointer to the chunk. Then there is no need anymore for an rbtree. tj: * s/(set|get)_chunk/pcpu_\1_page_chunk/ * Drop inline from the above two functions and moved them upwards so that they are with other simple helpers. * Initial pages might not (actually most of the time don't) live in the vmalloc area. With the previous patch to manually reverse-map both first chunks, this is no longer an issue. Removed pcpu_set_chunk() call on initial pages. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: rusty@rustcorp.com.au Cc: Paul Mundt <lethal@linux-sh.org> Cc: rmk@arm.linux.org.uk Cc: starvik@axis.com Cc: ralf@linux-mips.org Cc: davem@davemloft.net Cc: cooloney@kernel.org Cc: kyle@mcmartin.ca Cc: matthew@wil.cx Cc: grundler@parisc-linux.org Cc: takata@linux-m32r.org Cc: benh@kernel.crashing.org Cc: rth@twiddle.net Cc: ink@jurassic.park.msu.ru Cc: heiko.carstens@de.ibm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> LKML-Reference: <49D43D58.4050102@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08percpu: don't put the first chunk in reverse-map rbtreeTejun Heo
Impact: both first chunks don't use rbtree, no functional change There can be two first chunks - reserved and dynamic with the former one being optional. Dynamic first chunk was linked on reverse-mapping rbtree while the reserved one was mapped manually using the start address and reserved offset limit. This patch makes both first chunks to be looked up manually without using the rbtree. This is to help getting rid of the rbtree. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: rusty@rustcorp.com.au Cc: Paul Mundt <lethal@linux-sh.org> Cc: rmk@arm.linux.org.uk Cc: starvik@axis.com Cc: ralf@linux-mips.org Cc: davem@davemloft.net Cc: cooloney@kernel.org Cc: kyle@mcmartin.ca Cc: matthew@wil.cx Cc: grundler@parisc-linux.org Cc: takata@linux-m32r.org Cc: benh@kernel.crashing.org Cc: rth@twiddle.net Cc: ink@jurassic.park.msu.ru Cc: heiko.carstens@de.ibm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Christoph Lameter <cl@linux.com> LKML-Reference: <49D43CEA.3040609@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-10percpu: generalize embedding first chunk setup helperTejun Heo
Impact: code reorganization Separate out embedding first chunk setup helper from x86 embedding first chunk allocator and put it in mm/percpu.c. This will be used by the default percpu first chunk allocator and possibly by other archs. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-10percpu: more flexibility for @dyn_size of pcpu_setup_first_chunk()Tejun Heo
Impact: cleanup, more flexibility for first chunk init Non-negative @dyn_size used to be allowed iff @unit_size wasn't auto. This restriction stemmed from implementation detail and made things a bit less intuitive. This patch allows @dyn_size to be specified regardless of @unit_size and swaps the positions of @dyn_size and @unit_size so that the parameter order makes more sense (static, reserved and dyn sizes followed by enclosing unit_size). While at it, add @unit_size >= PCPU_MIN_UNIT_SIZE sanity check. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-10percpu: make x86 addr <-> pcpu ptr conversion macros genericTejun Heo
Impact: generic addr <-> pcpu ptr conversion macros There's nothing arch specific about x86 __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr(). With proper __per_cpu_load and __per_cpu_start defined, they'll do the right thing regardless of actual layout. Move these macros from arch/x86/include/asm/percpu.h to mm/percpu.c and allow archs to override it as necessary. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-07percpu: finer grained locking to break deadlock and allow atomic freeTejun Heo
Impact: fix deadlock and allow atomic free Percpu allocation always uses GFP_KERNEL and whole alloc/free paths were protected by single mutex. All percpu allocations have been from GFP_KERNEL-safe context and the original allocator had this assumption too. However, by protecting both alloc and free paths with the same mutex, the new allocator creates free -> alloc -> GFP_KERNEL dependency which the original allocator didn't have. This can lead to deadlock if free is called from FS or IO paths. Also, in general, allocators are expected to allow free to be called from atomic context. This patch implements finer grained locking to break the deadlock and allow atomic free. For details, please read the "Synchronization rules" comment. While at it, also add CONTEXT: to function comments to describe which context they expect to be called from and what they do to it. This problem was reported by Thomas Gleixner and Peter Zijlstra. http://thread.gmane.org/gmane.linux.kernel/802384 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Peter Zijlstra <peterz@infradead.org>
2009-03-07percpu: move fully free chunk reclamation into a workTejun Heo
Impact: code reorganization for later changes Do fully free chunk reclamation using a work. This change is to prepare for locking changes. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-07percpu: move chunk area map extension out of area allocationTejun Heo
Impact: code reorganization for later changes Separate out chunk area map extension into a separate function - pcpu_extend_area_map() - and call it directly from pcpu_alloc() such that pcpu_alloc_area() is guaranteed to have enough area map slots on invocation. With this change, pcpu_alloc_area() does only area allocation and the only failure mode is when the chunk doens't have enough room, so there's no need to distinguish it from memory allocation failures. Make it return -1 on such cases instead of hacky -ENOSPC. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-07percpu: replace pcpu_realloc() with pcpu_mem_alloc() and pcpu_mem_free()Tejun Heo
Impact: code reorganization for later changes With static map handling moved to pcpu_split_block(), pcpu_realloc() only clutters the code and it's also unsuitable for scheduled locking changes. Implement and use pcpu_mem_alloc/free() instead. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-06percpu, module: implement reserved allocation and use it for module percpu ↵Tejun Heo
variables Impact: add reserved allocation functionality and use it for module percpu variables This patch implements reserved allocation from the first chunk. When setting up the first chunk, arch can ask to set aside certain number of bytes right after the core static area which is available only through a separate reserved allocator. This will be used primarily for module static percpu variables on architectures with limited relocation range to ensure that the module perpcu symbols are inside the relocatable range. If reserved area is requested, the first chunk becomes reserved and isn't available for regular allocation. If the first chunk also includes piggy-back dynamic allocation area, a separate chunk mapping the same region is created to serve dynamic allocation. The first one is called static first chunk and the second dynamic first chunk. Although they share the page map, their different area map initializations guarantee they serve disjoint areas according to their purposes. If arch doesn't setup reserved area, reserved allocation is handled like any other allocation. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-06percpu: add an indirection ptr for chunk page map accessTejun Heo
Impact: allow sharing page map, no functional difference yet Make chunk->page access indirect by adding a pointer and renaming the actual array to page_ar. This will be used by future changes. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-06percpu: use negative for auto for pcpu_setup_first_chunk() argumentsTejun Heo
Impact: argument semantic cleanup In pcpu_setup_first_chunk(), zero @unit_size and @dyn_size meant auto-sizing. It's okay for @unit_size as 0 doesn't make sense but 0 dynamic reserve size is valid. Alos, if arch @dyn_size is calculated from other parameters, it might end up passing in 0 @dyn_size and malfunction when the size is automatically adjusted. This patch makes both @unit_size and @dyn_size ssize_t and use -1 for auto sizing. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-06percpu: improve first chunk initial area map handlingTejun Heo
Impact: no functional change When the first chunk is created, its initial area map is not allocated because kmalloc isn't online yet. The map is allocated and initialized on the first allocation request on the chunk. This works fine but the scattering of initialization logic between the init function and allocation path is a bit confusing. This patch makes the first chunk initialize and use minimal statically allocated map from pcpu_setpu_first_chunk(). The map resizing path still needs to handle this specially but it's more straight-forward and gives more latitude to the init path. This will ease future changes. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-06percpu: cosmetic renames in pcpu_setup_first_chunk()Tejun Heo
Impact: cosmetic, preparation for future changes Make the following renames in pcpur_setup_first_chunk() in preparation for future changes. * s/free_size/dyn_size/ * s/static_vm/first_vm/ * s/static_chunk/schunk/ Signed-off-by: Tejun Heo <tj@kernel.org>
2009-03-01percpu: kill compile warning in pcpu_populate_chunk()Tejun Heo
Impact: remove compile warning Mark local variable map_end in pcpu_populate_chunk() with uninitialized_var(). The variable is always used in tandem with map_start and guaranteed to be initialized before use but gcc doesn't understand that. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Ingo Molnar <mingo@elte.hu>
2009-02-24percpu: add __read_mostly to variables which are mostly read onlyTejun Heo
Most global variables in percpu allocator are initialized during boot and read only from that point on. Add __read_mostly as per Rusty's suggestion. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-02-24percpu: give more latitude to arch specific first chunk initializationTejun Heo
Impact: more latitude for first percpu chunk allocation The first percpu chunk serves the kernel static percpu area and may or may not contain extra room for further dynamic allocation. Initialization of the first chunk needs to be done before normal memory allocation service is up, so it has its own init path - pcpu_setup_static(). It seems archs need more latitude while initializing the first chunk for example to take advantage of large page mapping. This patch makes the following changes to allow this. * Define PERCPU_DYNAMIC_RESERVE to give arch hint about how much space to reserve in the first chunk for further dynamic allocation. * Rename pcpu_setup_static() to pcpu_setup_first_chunk(). * Make pcpu_setup_first_chunk() much more flexible by fetching page pointer by callback and adding optional @unit_size, @free_size and @base_addr arguments which allow archs to selectively part of chunk initialization to their likings. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-24percpu: remove unit_size power-of-2 restrictionTejun Heo
Impact: allow unit_size to be arbitrary multiple of PAGE_SIZE In dynamic percpu allocator, there is no reason the unit size should be power of two. Remove the restriction. As non-power-of-two unit size means that empty chunks fall into the same slot index as lightly occupied chunks which is bad for reclaming. Reserve an extra slot for empty chunks. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-24vmalloc: add @align to vm_area_register_early()Tejun Heo
Impact: allow larger alignment for early vmalloc area allocation Some early vmalloc users might want larger alignment, for example, for custom large page mapping. Add @align to vm_area_register_early(). While at it, drop docbook comment on non-existent @size. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
2009-02-24percpu: fix pcpu_chunk_struct_sizeTejun Heo
Impact: fix short allocation leading to memory corruption While dropping rvalue wrapping macros around global parameters, pcpu_chunk_struct_size was set incorrectly resulting in shorter page pointer array. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-21percpu: clean up size usageTejun Heo
Andrew was concerned about the unit of variables named or have suffix size. Every usage in percpu allocator is in bytes but make it super clear by adding comments. While at it, make pcpu_depopulate_chunk() take int @off and @size like everyone else. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org>
2009-02-20percpu: implement new dynamic percpu allocatorTejun Heo
Impact: new scalable dynamic percpu allocator which allows dynamic percpu areas to be accessed the same way as static ones Implement scalable dynamic percpu allocator which can be used for both static and dynamic percpu areas. This will allow static and dynamic areas to share faster direct access methods. This feature is optional and enabled only when CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is defined by arch. Please read comment on top of mm/percpu.c for details. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org>