summaryrefslogtreecommitdiff
path: root/arch/x86/mm/init_64.c
AgeCommit message (Collapse)Author
2008-04-28hotplug-memory: make online_page() commonJeremy Fitzhardinge
All architectures use an effectively identical definition of online_page(), so just make it common code. x86-64, ia64, powerpc and sh are actually identical; x86-32 is slightly different. x86-32's differences arise because it puts its hotplug pages in the highmem zone. We can handle this in the generic code by inspecting the page to see if its in highmem, and update the totalhigh_pages count appropriately. This leaves init_32.c:free_new_highpage with a single caller, so I folded it into add_one_highpage_init. I also removed an incorrect comment referring to the NUMA case; any NUMA details have already been dealt with by the time online_page() is called. [akpm@linux-foundation.org: fix indenting] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com> Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-26x86_64/mm: check and print vmemmap allocation continuousYinghai Lu
On big systems with lots of memory, don't print out too much during bootup, and make it easy to find if it is continuous. on 256G 8 sockets system will get [ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0 [ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs [ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0 [ffffe20038700000-ffffe200387fffff] potential offnode page_structs [ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1 [ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2 [ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2 [ffffe20054700000-ffffe200547fffff] potential offnode page_structs [ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2 [ffffe20070700000-ffffe200707fffff] potential offnode page_structs [ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3 [ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4 [ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4 [ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs [ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4 [ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs [ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5 [ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6 [ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6 [ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs [ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6 [ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7 instead of a very long print out... Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26x86_64: make reserve_bootmem_generic() use new reserve_bootmem()Yinghai Lu
"mm: make reserve_bootmem can crossed the nodes" provides new reserve_bootmem(), let reserve_bootmem_generic() use that. reserve_bootmem_generic() is used to reserve initramdisk, so this way we can make sure even when bootloader or kexec load ranges cross the node memory boundaries, reserve_bootmem still works. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat: generic: add ioremap_wc() interface wrapper /dev/mem: make promisc the default pat: cleanups x86: PAT use reserve free memtype in mmap of /dev/mem x86: PAT phys_mem_access_prot_allowed for dev/mem mmap x86: PAT avoid aliasing in /dev/mem read/write devmem: add range_is_allowed() check to mmap of /dev/mem x86: introduce /dev/mem restrictions with a config option
2008-04-25x86: remove set_fixmap() warningIngo Molnar
set_fixmap()+clear_fixmap() is safe. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25x86: make __set_fixmap() non-initIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24x86: introduce /dev/mem restrictions with a config optionArjan van de Ven
This patch introduces a restriction on /dev/mem: Only non-memory can be read or written unless the newly introduced config option is set. The X server needs access to /dev/mem for the PCI space, but it doesn't need access to memory; both the file permissions and SELinux permissions of /dev/mem just make X effectively super-super powerful. With the exception of the BIOS area, there's just no valid app that uses /dev/mem on actual memory. Other popular users of /dev/mem are rootkits and the like. (note: mmap access of memory via /dev/mem was already not allowed since a really long time) People who want to use /dev/mem for kernel debugging can enable the config option. The restrictions of this patch have been in the Fedora and RHEL kernels for at least 4 years without any problems. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: move definition to pci-dma.cGlauber Costa
Move dma_ops structure definition to pci-dma.c, where it belongs. Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17x86: account overlapped mappings in max_pfn_mappedAndi Kleen
When end_pfn is not aligned to 2MB (or 1GB) then the kernel might map more memory than end_pfn. Account this in max_pfn_mapped. Signed-off-by: Andi Kleen <ak@suse.de> Cc: andreas.herrmann3@amd.com Cc: mingo@elte.hu Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17x86: replace the now useless max_pfn_mapped defineThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17x86: fix memtest print outYinghai Lu
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17x86: memtest bootparamYinghai Lu
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17x86: early memtest to find bad ramYinghai Lu
do simple memtest after init_memory_mapping use find_e820_area_size to find all ram range that is not reserved. and do some simple bits test to find some bad ram. if find some bad ram, use reserve_early to exclude that range. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17x86: Remove redundant display of free swap space in show_mem()Johannes Weiner
Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17x86: enhance DEBUG_RODATA support for hotplug and kprobesMathieu Desnoyers
Standardize DEBUG_RODATA, removing special cases for hotplug and kprobes. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Andi Kleen <andi@firstfloor.org> Cc: pageexec@freemail.hu Cc: akpm@linux-foundation.org CC: Andi Kleen <andi@firstfloor.org> CC: pageexec@freemail.hu CC: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17x86: do kernel direct mapping at boot using GB pagesAndi Kleen
The AMD Fam10h CPUs support new Gigabyte page table entry for mapping 1GB at a time. Use this for the kernel direct mapping. Only done for 64bit because i386 does not support GB page tables. This only applies to the data portion of the direct mapping; the kernel text mapping stays with 2MB pages because the AMD Fam10h microarchitecture does not support GB ITLBs and AMD recommends against using GB mappings for code. Can be disabled with disable_gbpages on the kernel command line [ tglx@linutronix.de: simplify enable code ] [ Yinghai Lu <yinghai.lu@sun.com>: boot fix on 256 GB RAM ] Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17x86: add gbpages switchesIngo Molnar
These new controls toggle experimental support for a new CPU feature, the straightforward extension of largepages from the pmd level to the pud level, which allows 1GB (kernel) TLBs instead of 2MB TLBs. Turn it off by default, as this code has not been tested well enough yet. Use the CONFIG_DIRECT_GBPAGES=y .config option or gbpages on the boot line can be used to enable it. If enabled in the .config then nogbpages boot option disables it. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-26x86: fix spontaneous reboot with allyesconfig bzImageIngo Molnar
recently the 64-bit allyesconfig bzImage kernel started spontaneously rebooting during early bootup. after a few fun hours spent with early init debugging, it turns out that we've got this rather annoying limit on the size of the kernel image: #define KERNEL_TEXT_SIZE (40*1024*1024) which limit my vmlinux just happened to pass: text data bss dec hex filename 29703744 4222751 8646224 42572719 2899baf vmlinux 40 MB is 42572719 bytes, so my vmlinux was just 1.5% above this limit :-/ So it happily crashed right in head_64.S, which - as we all know - is the most debuggable code in the whole architecture ;-) So increase the limit to allow an up to 128MB kernel image to be mapped. (should anyone be that crazy or lazy) We have a full 4K of pagetable (level2_kernel_pgt) allocated for these mappings already, so there's no RAM overhead and the limit was rather pointless and arbitrary. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-26x86: remove double-checking empty zero pages debugYinghai Lu
so far no one complained about that. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18x86: zap invalid and unused pmds in early bootThomas Gleixner
The early boot code maps KERNEL_TEXT_SIZE (currently 40MB) starting from __START_KERNEL_map. The kernel itself only needs _text to _end mapped in the high alias. On relocatible kernels the ASM setup code adjusts the compile time created high mappings to the relocation. This creates invalid pmd entries for negative offsets: 0xffffffff80000000 -> pmd entry: ffffffffff2001e3 It points outside of the physical address space and is marked present. This starts at the virtual address __START_KERNEL_map and goes up to the point where the first valid physical address (0x0) is mapped. Zap the mappings before _text and after _end right away in early boot. This removes also the invalid entries. Furthermore it simplifies the range check for high aliases. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-14x86: include proper prototypes for rodata_testHarvey Harrison
extern should not appear in C files. Also, the definitions do not match the prototype currently, not sure what way you want to go with this, I've switched the prototype to return int, but I can see going to the void return as well. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09x86: introduce page pool in cpaThomas Gleixner
DEBUG_PAGEALLOC was not possible on 64-bit due to its early-bootup hardcoded reliance on PSE pages, and the unrobustness of the runtime splitup of large pages. The splitup ended in recursive calls to alloc_pages() when a page for a pte split was requested. Avoid the recursion with a preallocated page pool, which is used to split up large mappings and gets refilled in the return path of kernel_map_pages after the split has been done. The size of the page pool is adjusted to the available memory. This part just implements the page pool and the initialization w/o using it yet. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09x86: avoid unused variable warning in mm/init_64.cThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-07Introduce flags for reserve_bootmem()Bernhard Walle
This patchset adds a flags variable to reserve_bootmem() and uses the BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions between crashkernel area and already used memory. This patch: Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE. If that flag is set, the function returns with -EBUSY if the memory already has been reserved in the past. This is to avoid conflicts. Because that code runs before SMP initialisation, there's no race condition inside reserve_bootmem_core(). [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix powerpc build] Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: <linux-arch@vger.kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06x86: mark the .rodata section also NXArjan van de Ven
The .rodata section shouldn't just be read-only, but also non-executable. This is free since we've broken up the 2MB page already anyway. also update test_nx to check for this. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04x86: switch direct mapping setup over to set_pteAndi Kleen
Use set_pte() for setting up the 2MB pages in the direct mapping. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04x86: remove now unused clear_kernel_mappingAndi Kleen
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04x86: rename LARGE_PAGE_SIZE to PMD_PAGE_SIZEAndi Kleen
Fix up all users. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-01x86_64: make bootmap_start page align v6Yinghai Lu
boot oopses when a system has 64 or 128 GB of RAM installed: Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711() BUG: unable to handle kernel NULL pointer dereference at 000000000000005f IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f PGD 0 Oops: 0000 [1] SMP CPU 0 Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6 RIP: 0010:[<ffffffff802bfe55>] [<ffffffff802bfe55>] proc_register+0xe7/0x10f RSP: 0000:ffff810824c57e60 EFLAGS: 00010246 RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08 RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460 RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80 R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee FS: 0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000) Stack: ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000 00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff 0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5 Call Trace: [<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a [<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34 [<ffffffff80bc34a5>] ? sctp_init+0xef/0x711 [<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1 [<ffffffff8020ccf8>] ? child_rip+0xa/0x12 [<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1 [<ffffffff8020ccee>] ? child_rip+0x0/0x12 Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0 RIP [<ffffffff802bfe55>] proc_register+0xe7/0x10f RSP <ffff810824c57e60> CR2: 000000000000005f ---[ end trace 02c2d78def82877a ]--- Kernel panic - not syncing: Attempted to kill init! it turns out some variables near end of bss are corrupted already. in System.map we have ffffffff80d40420 b rsi_table ffffffff80d40620 B krb5_seq_lock ffffffff80d40628 b i.20437 ffffffff80d40630 b xprt_rdma_inline_write_padding ffffffff80d40638 b sunrpc_table_header ffffffff80d40640 b zero ffffffff80d40644 b min_memreg ffffffff80d40648 b rpcrdma_tk_lock_g ffffffff80d40650 B sctp_assocs_id_lock ffffffff80d40658 B proc_net_sctp ffffffff80d40660 B sctp_assocs_id ffffffff80d40680 B sysctl_sctp_mem ffffffff80d40690 B sysctl_sctp_rmem ffffffff80d406a0 B sysctl_sctp_wmem ffffffff80d406b0 b sctp_ctl_socket ffffffff80d406b8 b sctp_pf_inet6_specific ffffffff80d406c0 b sctp_pf_inet_specific ffffffff80d406c8 b sctp_af_v4_specific ffffffff80d406d0 b sctp_af_v6_specific ffffffff80d406d8 b sctp_rand.33270 ffffffff80d406dc b sctp_memory_pressure ffffffff80d406e0 b sctp_sockets_allocated ffffffff80d406e4 b sctp_memory_allocated ffffffff80d406e8 b sctp_sysctl_header ffffffff80d406f0 b zero ffffffff80d406f4 A __bss_stop ffffffff80d406f4 A _end and setup_node_bootmem() will use that page 0xd40000 for bootmap Bootmem setup node 0 0000000000000000-0000000828000000 NODE_DATA [000000000008a485 - 0000000000091484] bootmap [0000000000d406f4 - 0000000000e456f3] pages 105 Bootmem setup node 1 0000000828000000-0000001028000000 NODE_DATA [0000000828000000 - 0000000828006fff] bootmap [0000000828007000 - 0000000828106fff] pages 100 Bootmem setup node 2 0000001028000000-0000001828000000 NODE_DATA [0000001028000000 - 0000001028006fff] bootmap [0000001028007000 - 0000001028106fff] pages 100 Bootmem setup node 3 0000001828000000-0000002028000000 NODE_DATA [0000001828000000 - 0000001828006fff] bootmap [0000001828007000 - 0000001828106fff] pages 100 setup_node_bootmem() makes NODE_DATA cacheline aligned, and bootmap is page-aligned. the patch updates find_e820_area() to make sure we can meet the alignment constraints. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01x86_64: add debug name for early_resYinghai Lu
helps debugging problems in this rather murky area of code. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: fix overlap between pagetable with bss sectionYinghai Lu
one early crash on one 8 node 256g machine: Command line: console=uart8250,io,0x3f8,115200n8 initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug initcall_debug apic=debug acpi.debug_level=0x0000000f pci=routeirq ip=dhcp load_ramdisk=1 ramdisk_size=131072 BOOT_IMAGE=kernel.org/bzImage_2.6.25_k8.1 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009bc00 (usable) BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 00000000dffe0000 (usable) BIOS-e820: 00000000dffe0000 - 00000000dffee000 (ACPI data) BIOS-e820: 00000000dffee000 - 00000000dffff050 (ACPI NVS) BIOS-e820: 00000000dffff050 - 00000000e0000000 (reserved) BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved) BIOS-e820: 0000000100000000 - 0000004020000000 (usable) Early serial console at I/O port 0x3f8 (options '115200n8') console [uart0] enabled end_pfn_map = 67239936 Kernel panic - not syncing: Duplicated early reservation d40000-e42000 Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #3 Call Trace: [<ffffffff80221545>] lapic_get_maxlvt+0x0/0x10 [<ffffffff80221657>] clear_local_APIC+0x5/0xcf [<ffffffff80221726>] disable_local_APIC+0x5/0x17 [<ffffffff8021fe16>] smp_send_stop+0x46/0x4c [<ffffffff80235293>] panic+0x94/0x13e [<ffffffff80bc3b03>] sctp_eps_proc_init+0x12/0x34 [<ffffffff80b9f1c5>] reserve_early+0x30/0x6c [<ffffffff80803925>] init_memory_mapping+0x2cd/0x2dc [<ffffffff80b9dc01>] setup_arch+0x21f/0x44e [<ffffffff80b978be>] start_kernel+0x6f/0x2c7 [<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3 it turns out there is overlap between pgtable and bss... in System.map we have ffffffff80d40420 b rsi_table ffffffff80d40620 B krb5_seq_lock ffffffff80d40628 b i.20437 ffffffff80d40630 b xprt_rdma_inline_write_padding ffffffff80d40638 b sunrpc_table_header ffffffff80d40640 b zero ffffffff80d40644 b min_memreg ffffffff80d40648 b rpcrdma_tk_lock_g ffffffff80d40650 B sctp_assocs_id_lock ffffffff80d40658 B proc_net_sctp ffffffff80d40660 B sctp_assocs_id ffffffff80d40680 B sysctl_sctp_mem ffffffff80d40690 B sysctl_sctp_rmem ffffffff80d406a0 B sysctl_sctp_wmem ffffffff80d406b0 b sctp_ctl_socket ffffffff80d406b8 b sctp_pf_inet6_specific ffffffff80d406c0 b sctp_pf_inet_specific ffffffff80d406c8 b sctp_af_v4_specific ffffffff80d406d0 b sctp_af_v6_specific ffffffff80d406d8 b sctp_rand.33270 ffffffff80d406dc b sctp_memory_pressure ffffffff80d406e0 b sctp_sockets_allocated ffffffff80d406e4 b sctp_memory_allocated ffffffff80d406e8 b sctp_sysctl_header ffffffff80d406f0 b zero ffffffff80d406f4 A __bss_stop ffffffff80d406f4 A _end need to round up table_start to PAGE_SIZE. also make the panic more informative. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: arch/x86/mm/init_64.c printk fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: unify ioremapThomas Gleixner
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: cpa: fix the self-testIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: init memory debuggingIngo Molnar
debug incorrect/late access to init memory, by permanently unmapping the init memory ranges. Depends on CONFIG_DEBUG_PAGEALLOC=y. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: move misplaced rodata check callArjan van de Ven
It looks like a mismerge put the rodata self-check in the wrong spot; move it to the right place after marking the .rodata section read only. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: add testcases for RODATA and NX protections/attributesArjan van de Ven
Latest update; I now have 4 NX tests, but 2 fail so they're #if 0'd. I also cleaned up the NX test code quite a bit, and got rid of the ugly exception table sorting stuff. From: Arjan van de Ven <arjan@linux.intel.com> This patch adds testcases for the CONFIG_DEBUG_RODATA configuration option as well as the NX CPU feature/mappings. Both testcases can move to tests/ once that patch gets merged into mainline. (I'm half considering moving the rodata test into mm/init.c but I'll wait with that until init.c is unified) As part of this I had to fix a not-quite-right alignment in the vmlinux.lds.h for the RODATA sections, which lead to 1 page less being marked read only. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: make sure initmem is writableArjan van de Ven
When we free initmem, various rodata and CPA checks may have left memory read only.. this patch ensures that the memory is writable before we free it. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: cpa: move flush to cpaThomas Gleixner
The set_memory_* and set_pages_* family of API's currently requires the callers to do a global tlb flush after the function call; forgetting this is a very nasty deathtrap. This patch moves the global tlb flush into each of the callers Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: cpa: set_memory_notpresent()Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: convert CPA users to the new set_page_ APIArjan van de Ven
This patch converts various users of change_page_attr() to the new, more intent driven set_page_*/set_memory_* API set. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: fix early_ioremap() on 64-bitAndi Kleen
Fix early_ioremap() on x86-64 I had ACPI failures on several machines since a few days. Symptom was NUMA nodes not getting detected or worse cores not getting detected. They all came from ACPI not being able to read various of its tables. I finally bisected it down to Jeremy's "put _PAGE_GLOBAL into PAGE_KERNEL" change. With that the fix was fairly obvious. The problem was that early_ioremap() didn't use a "_all" flush that would affect the global PTEs too. So with global bits getting used everywhere now an early_ioremap would not actually flush a mapping if something else was mapped previously on that slot (which can happen with early_iounmap inbetween) This patch changes all flushes in init_64.c to be __flush_tlb_all() and fixes the problem here. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30c_p_a(): do a simple self test at bootAndi Kleen
When CONFIG_DEBUG_RODATA is enabled undo the ro mapping and redo it again. This gives some simple testing for change_page_attr(). Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: replace hard coded reservations in 64-bit early boot code with dynamic ↵Andi Kleen
table On x86-64 there are several memory allocations before bootmem. To avoid them stomping on each other they used to be all hard coded in bad_area(). Replace this with an array that is filled as needed. This cleans up the code considerably and allows to expand its use. Cc: peterz@infradead.org Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: debug: double-check the empty zero pageIngo Molnar
temporary debugging - remove before this hits v2.6.25. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: not clear empty_zero_page againYinghai Lu
empty_zero_page is in .bss section, and it is cleared in clear_bss by x86_64_start_kernel(). So don't clear that again in mem_init Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: kill mk_pte_hugeJeremy Fitzhardinge
It only has a single use, which can be trivially replaced. Signed-off-by: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: some whitespace cleanups in paging codeJoerg Roedel
This patch does some whitespace cleanups in the paging code to fix some checkpatch.pl warnings of my formerly merged cleanup patches. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: use __PAGE_KERNEL* instead of _KERNPG_TABLEJoerg Roedel
This minor cleanup replaces _KERNPG_TABLE with the __PAGE_KERNEL* for 2MB PTEs in the 64-bit memory initialization code. The __PAGE_KERNEL* defines are more appropriate for PTEs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: 64-bit, make sparsemem vmemmap the only memory modelChristoph Lameter
Use sparsemem as the only memory model for UP, SMP and NUMA. Measurements indicate that DISCONTIGMEM has a higher overhead than sparsemem. And FLATMEMs benefits are minimal. So I think its best to simply standardize on sparsemem. Results of page allocator tests (test can be had via git from slab git tree branch tests) Measurements in cycle counts. 1000 allocations were performed and then the average cycle count was calculated. Order FlatMem Discontig SparseMem 0 639 665 641 1 567 647 593 2 679 774 692 3 763 967 781 4 961 1501 962 5 1356 2344 1392 6 2224 3982 2336 7 4869 7225 5074 8 12500 14048 12732 9 27926 28223 28165 10 58578 58714 58682 (Note that FlatMem is an SMP config and the rest NUMA configurations) Memory use: SMP Sparsemem ------------- Kernel size: text data bss dec hex filename 3849268 397739 1264856 5511863 541ab7 vmlinux total used free shared buffers cached Mem: 8242252 41164 8201088 0 352 11512 -/+ buffers/cache: 29300 8212952 Swap: 9775512 0 9775512 SMP Flatmem ----------- Kernel size: text data bss dec hex filename 3844612 397739 1264536 5506887 540747 vmlinux So 4.5k growth in text size vs. FLATMEM. total used free shared buffers cached Mem: 8244052 40544 8203508 0 352 11484 -/+ buffers/cache: 28708 8215344 2k growth in overall memory use after boot. NUMA discontig: text data bss dec hex filename 3888124 470659 1276504 5635287 55fcd7 vmlinux total used free shared buffers cached Mem: 8256256 56908 8199348 0 352 11496 -/+ buffers/cache: 45060 8211196 Swap: 9775512 0 9775512 NUMA sparse: text data bss dec hex filename 3896428 470659 1276824 5643911 561e87 vmlinux 8k text growth. Given that we fully inline virt_to_page and friends now that is rather good. total used free shared buffers cached Mem: 8264720 57240 8207480 0 352 11516 -/+ buffers/cache: 45372 8219348 Swap: 9775512 0 9775512 The total available memory is increased by 8k. This patch makes sparsemem the default and removes discontig and flatmem support from x86. [ akpm@linux-foundation.org: allnoconfig build fix ] Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>