summaryrefslogtreecommitdiff
path: root/arch/s390/mm/pgtable.c
AgeCommit message (Collapse)Author
2017-09-07Merge tag 'kvm-s390-next-4.14-2' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: Fixes and features for 4.14 - merge of topic branch tlb-flushing from the s390 tree to get the no-dat base features - merge of kvm/master to avoid conflicts with additional sthyi fixes - wire up the no-dat enhancements in KVM - multiple epoch facility (z14 feature) - Configuration z/Architecture Mode - more sthyi fixes - gdb server range checking fix - small code cleanups
2017-08-29KVM: s390: expose no-DAT to guest and migration supportClaudio Imbrenda
The STFLE bit 147 indicates whether the ESSA no-DAT operation code is valid, the bit is not normally provided to the host; the host is instead provided with an SCLP bit that indicates whether guests can support the feature. This patch: * enables the STFLE bit in the guest if the corresponding SCLP bit is present in the host. * adds support for migrating the no-DAT bit in the PGSTEs * fixes the software interpretation of the ESSA instruction that is used when migrating, both for the new operation code and for the old "set stable", as per specifications. Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-07-25s390/mm,kvm: use nodat PGSTE tag to optimize TLB flushingMartin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-25s390/mm: add guest ASCE TLB flush optimizationMartin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-25s390/mm: add no-dat TLB flush optimizationMartin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-13s390/mm: set change and reference bit on lazy key enablementChristian Borntraeger
When we enable storage keys for a guest lazily, we reset the ACC and F values. That is correct assuming that these are 0 on a clear reset and the guest obviously has not used any key setting instruction. We also zero out the change and reference bit. This is not correct as the architecture prefers over-indication instead of under-indication for the keyless->keyed transition. This patch fixes the behaviour and always sets guest change and guest reference for all guest storage keys on the keyless -> keyed switch. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12s390/mm: implement 5 level pages tablesMartin Schwidefsky
Add the logic to upgrade the page table for a 64-bit process to five levels. This increases the TASK_SIZE from 8PB to 16EB-4K. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-04-20s390/kvm: Add PGSTE manipulation functionsClaudio Imbrenda
Add PGSTE manipulation functions: * set_pgste_bits sets specific bits in a PGSTE * get_pgste returns the whole PGSTE * pgste_perform_essa manipulates a PGSTE to set specific storage states * ESSA_[SG]ET_* macros used to indicate the action for manipulate_pgste Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Reviewed-by: Janosch Frank <frankja@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-03-02KVM: s390: Fix guest migration for huge guests resulting in panicJanosch Frank
While we can technically not run huge page guests right now, we can setup a guest with huge pages. Trying to migrate it will trigger a VM_BUG_ON and, if the kernel is not configured to panic on a BUG, it will happily try to work on non-existing page table entries. With this patch, we always return "dirty" if we encounter a large page when migrating. This at least fixes the immediate problem until we have proper handling for both kind of pages. Fixes: 15f36eb ("KVM: s390: Add proper dirty bitmap support to S390 kvm.") Cc: <stable@vger.kernel.org> # 3.16+ Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "4.11 is going to be a relatively large release for KVM, with a little over 200 commits and noteworthy changes for most architectures. ARM: - GICv3 save/restore - cache flushing fixes - working MSI injection for GICv3 ITS - physical timer emulation MIPS: - various improvements under the hood - support for SMP guests - a large rewrite of MMU emulation. KVM MIPS can now use MMU notifiers to support copy-on-write, KSM, idle page tracking, swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is also supported, so that writes to some memory regions can be treated as MMIO. The new MMU also paves the way for hardware virtualization support. PPC: - support for POWER9 using the radix-tree MMU for host and guest - resizable hashed page table - bugfixes. s390: - expose more features to the guest - more SIMD extensions - instruction execution protection - ESOP2 x86: - improved hashing in the MMU - faster PageLRU tracking for Intel CPUs without EPT A/D bits - some refactoring of nested VMX entry/exit code, preparing for live migration support of nested hypervisors - expose yet another AVX512 CPUID bit - host-to-guest PTP support - refactoring of interrupt injection, with some optimizations thrown in and some duct tape removed. - remove lazy FPU handling - optimizations of user-mode exits - optimizations of vcpu_is_preempted() for KVM guests generic: - alternative signaling mechanism that doesn't pound on tsk->sighand->siglock" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits) x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 x86/paravirt: Change vcp_is_preempted() arg type to long KVM: VMX: use correct vmcs_read/write for guest segment selector/base x86/kvm/vmx: Defer TR reload after VM exit x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss x86/kvm/vmx: Simplify segment_base() x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels x86/kvm/vmx: Don't fetch the TSS base from the GDT x86/asm: Define the kernel TSS limit in a macro kvm: fix page struct leak in handle_vmon KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now KVM: Return an error code only as a constant in kvm_get_dirty_log() KVM: Return an error code only as a constant in kvm_get_dirty_log_protect() KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl() KVM: x86: remove code for lazy FPU handling KVM: race-free exit from KVM_RUN without POSIX signals KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message KVM: PPC: Book3S PR: Ratelimit copy data failure error messages KVM: Support vCPU-based gfn->hva cache KVM: use separate generations for each address space ...
2017-02-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - New entropy generation for the pseudo random number generator. - Early boot printk output via sclp to help debug crashes on boot. This needs to be enabled with a kernel parameter. - Add proper no-execute support with a bit in the page table entry. - Bug fixes and cleanups. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (65 commits) s390/syscall: fix single stepped system calls s390/zcrypt: make ap_bus explicitly non-modular s390/zcrypt: Removed unneeded debug feature directory creation. s390: add missing "do {} while (0)" loop constructs to multiline macros s390/mm: add cond_resched call to kernel page table dumper s390: get rid of MACHINE_HAS_PFMF and MACHINE_HAS_HPAGE s390/mm: make memory_block_size_bytes available for !MEMORY_HOTPLUG s390: replace ACCESS_ONCE with READ_ONCE s390: Audit and remove any remaining unnecessary uses of module.h s390: mm: Audit and remove any unnecessary uses of module.h s390: kernel: Audit and remove any unnecessary uses of module.h s390/kdump: Use "LINUX" ELF note name instead of "CORE" s390: add no-execute support s390: report new vector facilities s390: use correct input data address for setup_randomness s390/sclp: get rid of common response code handling s390/sclp: don't add new lines to each printed string s390/sclp: make early sclp code readable s390/sclp: disable early sclp code as soon as the base sclp driver is active s390/sclp: move early printk code to drivers ...
2017-02-08s390: add no-execute supportMartin Schwidefsky
Bit 0x100 of a page table, segment table of region table entry can be used to disallow code execution for the virtual addresses associated with the entry. There is one tricky bit, the system call to return from a signal is part of the signal frame written to the user stack. With a non-executable stack this would stop working. To avoid breaking things the protection fault handler checks the opcode that caused the fault for 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) and injects a system call. This is preferable to the alternative solution with a stub function in the vdso because it works for vdso=off and statically linked binaries as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-30KVM: s390: Fix RRBE return code not being CCJanosch Frank
reset_guest_reference_bit needs to return the CC, so we can set it in the guest PSW when emulating RRBE. Right now it only returns 0. Let's fix that. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-01-24s390/mm: Fix cmma unused transfer from pgste into pteChristian Borntraeger
The last pgtable rework silently disabled the CMMA unused state by setting a local pte variable (a parameter) instead of propagating it back into the caller. Fix it. Fixes: ebde765c0e85 ("s390/mm: uninline ptep_xxx functions from pgtable.h") Cc: stable@vger.kernel.org # v4.6+ Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-24s390/mm: merge local / non-local IDTE helperMartin Schwidefsky
Merge the __p[m|u]xdp_idte and __p[m|u]dp_idte_local functions into a single __p[m|u]dp_idte function with an additional parameter. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-24s390/mm: merge local / non-local IPTE helperMartin Schwidefsky
Merge the __ptep_ipte and __ptep_ipte_local functions into a single __ptep_ipte function with an additional parameter. The __pte_ipte_range function is still extra as the while loops makes it hard to merge. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: - ARM: GICv3 ITS emulation and various fixes. Removal of the old VGIC implementation. - s390: support for trapping software breakpoints, nested virtualization (vSIE), the STHYI opcode, initial extensions for CPU model support. - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups, preliminary to this and the upcoming support for hardware virtualization extensions. - x86: support for execute-only mappings in nested EPT; reduced vmexit latency for TSC deadline timer (by about 30%) on Intel hosts; support for more than 255 vCPUs. - PPC: bugfixes. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits) KVM: PPC: Introduce KVM_CAP_PPC_HTM MIPS: Select HAVE_KVM for MIPS64_R{2,6} MIPS: KVM: Reset CP0_PageMask during host TLB flush MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX() MIPS: KVM: Sign extend MFC0/RDHWR results MIPS: KVM: Fix 64-bit big endian dynamic translation MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase MIPS: KVM: Use 64-bit CP0_EBase when appropriate MIPS: KVM: Set CP0_Status.KX on MIPS64 MIPS: KVM: Make entry code MIPS64 friendly MIPS: KVM: Use kmap instead of CKSEG0ADDR() MIPS: KVM: Use virt_to_phys() to get commpage PFN MIPS: Fix definition of KSEGX() for 64-bit KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD kvm: x86: nVMX: maintain internal copy of current VMCS KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures KVM: arm64: vgic-its: Simplify MAPI error handling KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers KVM: arm64: vgic-its: Turn device_id validation into generic ID validation ...
2016-07-06s390/mm: add support for 2GB hugepagesGerald Schaefer
This adds support for 2GB hugetlbfs pages on s390. Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-20s390/mm: shadow pages with real guest requested protectionDavid Hildenbrand
We really want to avoid manually handling protection for nested virtualization. By shadowing pages with the protection the guest asked us for, the SIE can handle most protection-related actions for us (e.g. special handling for MVPG) and we can directly forward protection exceptions to the guest. PTEs will now always be shadowed with the correct _PAGE_PROTECT flag. Unshadowing will take care of any guest changes to the parent PTE and any host changes to the host PTE. If the host PTE doesn't have the fitting access rights or is not available, we have to fix it up. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20s390/mm: add shadow gmap supportMartin Schwidefsky
For a nested KVM guest the outer KVM host needs to create shadow page tables for the nested guest. This patch adds the basic support to the guest address space (gmap) code. For each guest address space the inner KVM host creates, the first outer KVM host needs to create shadow page tables. The address space is identified by the ASCE loaded into the control register 1 at the time the inner SIE instruction for the second nested KVM guest is executed. The outer KVM host creates the shadow tables starting with the table identified by the ASCE on a on-demand basis. The outer KVM host will get repeated faults for all the shadow tables needed to run the second KVM guest. While a shadow page table for the second KVM guest is active the access to the origin region, segment and page tables needs to be restricted for the first KVM guest. For region and segment and page tables the first KVM guest may read the memory, but write attempt has to lead to an unshadow. This is done using the page invalid and read-only bits in the page table of the first KVM guest. If the first guest re-accesses one of the origin pages of a shadow, it gets a fault and the affected parts of the shadow page table hierarchy needs to be removed again. PGSTE tables don't have to be shadowed, as all interpretation assist can't deal with the invalid bits in the shadow pte being set differently than the original ones provided by the first KVM guest. Many bug fixes and improvements by David Hildenbrand. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20s390/mm: extended gmap pte notifierMartin Schwidefsky
The current gmap pte notifier forces a pte into to a read-write state. If the pte is invalidated the gmap notifier is called to inform KVM that the mapping will go away. Extend this approach to allow read-write, read-only and no-access as possible target states and call the pte notifier for any change to the pte. This mechanism is used to temporarily set specific access rights for a pte without doing the heavy work of a true mprotect call. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-13s390/mm: simplify the TLB flushing codeMartin Schwidefsky
ptep_flush_lazy and pmdp_flush_lazy use mm->context.attach_count to decide between a lazy TLB flush vs an immediate TLB flush. The field contains two 16-bit counters, the number of CPUs that have the mm attached and can create TLB entries for it and the number of CPUs in the middle of a page table update. The __tlb_flush_asce, ptep_flush_direct and pmdp_flush_direct functions use the attach counter and a mask check with mm_cpumask(mm) to decide between a local flush local of the current CPU and a global flush. For all these functions the decision between lazy vs immediate and local vs global TLB flush can be based on CPU masks. There are two masks: the mm->context.cpu_attach_mask with the CPUs that are actively using the mm, and the mm_cpumask(mm) with the CPUs that have used the mm since the last full flush. The decision between lazy vs immediate flush is based on the mm->context.cpu_attach_mask, to decide between local vs global flush the mm_cpumask(mm) is used. With this patch all checks will use the CPU masks, the old counter mm->context.attach_count with its two 16-bit values is turned into a single counter mm->context.flush_count that keeps track of the number of CPUs with incomplete page table updates. The sole user of this counter is finish_arch_post_lock_switch() which waits for the end of all page table updates. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/mm: fix vunmap vs finish_arch_post_lock_switchMartin Schwidefsky
The vunmap_pte_range() function calls ptep_get_and_clear() without any locking. ptep_get_and_clear() uses ptep_xchg_lazy()/ptep_flush_direct() for the page table update. ptep_flush_direct requires that preemption is disabled, but without any locking this is not the case. If the kernel preempts the task while the attach_counter is increased an endless loop in finish_arch_post_lock_switch() will occur the next time the task is scheduled. Add explicit preempt_disable()/preempt_enable() calls to the relevant functions in arch/s390/mm/pgtable.c. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13KVM: s390/mm: Fix CMMA reset during rebootChristian Borntraeger
commit 1e133ab296f ("s390/mm: split arch/s390/mm/pgtable.c") factored out the page table handling code from __gmap_zap and __s390_reset_cmma into ptep_zap_unused and added a simple flag that tells which one of the function (reset or not) is to be made. This also changed the behaviour, as it also zaps unused page table entries on reset. Turns out that this is wrong as s390_reset_cmma uses the page walker, which DOES NOT take the ptl lock. The most simple fix is to not do the zapping part on reset (which uses the walker) Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: 1e133ab296f ("s390/mm: split arch/s390/mm/pgtable.c") Cc: stable@vger.kernel.org # 4.6+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-10KVM: s390: handle missing storage-key facilityDavid Hildenbrand
Without the storage-key facility, SIE won't interpret SSKE, ISKE and RRBE for us. So let's add proper interception handlers that will be called if lazy sske cannot be enabled. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: pfmf: support conditional-sske facilityDavid Hildenbrand
We already indicate that facility but don't implement it in our pfmf interception handler. Let's add a new storage key handling function for conditionally setting the guest storage key. As we will reuse this function later on, let's directly implement returning the old key via parameter and indicating if any change happened via rc. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390/mm: return key via pointer in get_guest_storage_keyDavid Hildenbrand
Let's just split returning the key and reporting errors. This makes calling code easier and avoids bugs as happened already. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390/mm: simplify get_guest_storage_keyDavid Hildenbrand
We can safe a few LOC and make that function easier to understand by rewriting existing code. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390/mm: set and get guest storage key mmap lockingMartin Schwidefsky
Move the mmap semaphore locking out of set_guest_storage_key and get_guest_storage_key. This makes the two functions more like the other ptep_xxx operations and allows to avoid repeated semaphore operations if multiple keys are read or written. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390/mm: don't drop errors in get_guest_storage_keyDavid Hildenbrand
Commit 1e133ab296f3 ("s390/mm: split arch/s390/mm/pgtable.c") changed the return value of get_guest_storage_key to an unsigned char, resulting in -EFAULT getting interpreted as a valid storage key. Cc: stable@vger.kernel.org # 4.6+ Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08s390/mm: split arch/s390/mm/pgtable.cMartin Schwidefsky
The pgtable.c file is quite big, before it grows any larger split it into pgtable.c, pgalloc.c and gmap.c. In addition move the gmap related header definitions into the new gmap.h header and all of the pgste helpers from pgtable.h to pgtable.c. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08s390/mm: uninline pmdp_xxx functions from pgtable.hMartin Schwidefsky
The pmdp_xxx function are smaller than their ptep_xxx counterparts but to keep things symmetrical unline them as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08s390/mm: uninline ptep_xxx functions from pgtable.hMartin Schwidefsky
The code in the various ptep_xxx functions has grown quite large, consolidate them to four out-of-line functions: ptep_xchg_direct to exchange a pte with another with immediate flushing ptep_xchg_lazy to exchange a pte with another in a batched update ptep_modify_prot_start to begin a protection flags update ptep_modify_prot_commit to commit a protection flags update Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02s390/kvm: simplify set_guest_storage_keyMartin Schwidefsky
Git commit ab3f285f227fec62868037e9b1b1fd18294a83b8 "KVM: s390/mm: try a cow on read only pages for key ops" added a fixup_user_fault to set_guest_storage_key force a copy on write if the page is mapped read-only. This is supposed to fix the problem of differing storage keys for shared mappings, e.g. the empty_zero_page. But if the storage key is set before the pte is mapped the storage key update is done on the pgste. A later fault will happily map the shared page with the key from the pgste. Eventually git commit 2faee8ff9dc6f4bfe46f6d2d110add858140fb20 "s390/mm: prevent and break zero page mappings in case of storage keys" fixed this problem for the empty_zero_page. The commit makes sure that guests enabled for storage keys will not use the empty_zero_page at all. As the call to fixup_user_fault in set_guest_storage_key depends on the order of the storage key operation vs. the fault that maps the pte it does not really fix anything. Just remove it. Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-19s390/mm: use TASK_MAX_SIZE where applicableDominik Dingel
To improve readability we can use TASK_MAX_SIZE when we just check for the upper limit. All places explicitly dealing with 3 vs 4 level pgtables were left unchanged. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-By: Sascha Silbe <silbe@linux.vnet.ibm.com>
2016-01-15s390/mm: enable fixup_user_fault retryingDominik Dingel
By passing a non-null flag we allow fixup_user_fault to retry, which enables userfaultfd. As during these retries we might drop the mmap_sem we need to check if that happened and redo the complete chain of actions. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric B Munson <emunson@akamai.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15mm: bring in additional flag for fixup_user_fault to signal unlockDominik Dingel
During Jason's work with postcopy migration support for s390 a problem regarding gmap faults was discovered. The gmap code will call fixup_user_fault which will end up always in handle_mm_fault. Till now we never cared about retries, but as the userfaultfd code kind of relies on it. this needs some fix. This patchset does not take care of the futex code. I will now look closer at this. This patch (of 2): With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the faulting we ever unlocked mmap_sem. This patch brings in the logic to handle retries as well as it cleans up the current documentation. fixup_user_fault was not having the same semantics as filemap_fault. It never indicated if a retry happened and so a caller wasn't able to handle that case. So we now changed the behaviour to always retry a locked mmap_sem. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric B Munson <emunson@akamai.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15s390, thp: remove infrastructure for handling splitting PMDsKirill A. Shutemov
With new refcounting we don't need to mark PMDs splitting. Let's drop code to handle this. pmdp_splitting_flush() is not needed too: on splitting PMD we will do pmdp_clear_flush() + set_pte_at(). pmdp_clear_flush() will do IPI as needed for fast_gup. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14mm, shmem: add internal shmem resident memory accountingJerome Marchand
Currently looking at /proc/<pid>/status or statm, there is no way to distinguish shmem pages from pages mapped to a regular file (shmem pages are mapped to /dev/zero), even though their implication in actual memory use is quite different. The internal accounting currently counts shmem pages together with regular files. As a preparation to extend the userspace interfaces, this patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for shmem pages separately from MM_FILEPAGES. The next patch will expose it to userspace - this patch doesn't change the exported values yet, by adding up MM_SHMEMPAGES to MM_FILEPAGES at places where MM_FILEPAGES was used before. The only user-visible change after this patch is the OOM killer message that separates the reported "shmem-rss" from "file-rss". [vbabka@suse.cz: forward-porting, tweak changelog] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-15KVM: s390: fix mismatch between user and in-kernel guest limitDominik Dingel
While the userspace interface requests the maximum size the gmap code expects to get a maximum address. This error resulted in bigger page tables than necessary for some guest sizes, e.g. a 2GB guest used 3 levels instead of 2. At the same time we introduce KVM_S390_NO_MEM_LIMIT, which allows in a bright future that a guest spans the complete 64 bit address space. We also switch to TASK_MAX_SIZE for the initial memory size, this is a cosmetic change as the previous size also resulted in a 4 level pagetable creation. Reported-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-08-19s390/mm: simplify page table alloc/free codeMartin Schwidefsky
With the removal of the dynamic reallocation of page tables for KVM (see git commit 0b46e0a3ec0d7a04af6a091354f1b5e1b952d70a) the page table allocation / freeing code can be simplified. The page table free code can now use the alloc_pgste bit in the mm context to decide if a page table is 2K or 4K, there is no mix of different sized page tables anymore. This eliminates the need to use "page->_mapcount == 0" to check for 4K page table. Use the lower two bits in page->_mapcount to indicate which 2K fragments of the 4K page are in use. As 31-bit support is gone, remove the two defines ALLOC_ORDER and FRAG_MASK and use the constants directly where appropriate. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-17revert "s390/mm: make hugepages_supported a boot time decision"Dominik Dingel
Heiko noticed that the current check for hugepage support on s390 is a little bit too harsh as systems which do not support will crash. The reason is that pageblock_order can now get negative when we set HPAGE_SHIFT to 0. To avoid all this and to avoid opening another can of worms with enabling HUGETLB_PAGE_SIZE_VARIABLE I think it would be best to simply allow architectures to define their own hugepages_supported(). Revert bea41197ead3 ("s390/mm: make hugepages_supported a boot time decision") in preparation. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-17revert "s390/mm: change HPAGE_SHIFT type to int"Dominik Dingel
Heiko noticed that the current check for hugepage support on s390 is a little bit too harsh as systems which do not support will crash. The reason is that pageblock_order can now get negative when we set HPAGE_SHIFT to 0. To avoid all this and to avoid opening another can of worms with enabling HUGETLB_PAGE_SIZE_VARIABLE I think it would be best to simply allow architectures to define their own hugepages_supported(). This patch (of 4): revert commit cf54e2fce51c ("s390/mm: change HPAGE_SHIFT type to int") in preparation. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25s390/mm: change HPAGE_SHIFT type to intDominik Dingel
With making HPAGE_SHIFT an unsigned integer we also accidentally changed pageblock_order. In order to avoid compiler warnings we make HPAGE_SHFIT an int again. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25s390/mm: make hugepages_supported a boot time decisionDominik Dingel
There is a potential bug with KVM and hugetlbfs if the hardware does not support hugepages (EDAT1). We fix this by making EDAT1 a hard requirement for hugepages and therefore removing and simplifying code. As s390, with the sw-emulated hugepages, was the only user of arch_prepare/release_hugepage I also removed theses calls from common and other architecture code. This patch (of 5): By dropping support for hugepages on machines which do not have the hardware feature EDAT1, we fix a potential s390 KVM bug. The bug would happen if a guest is backed by hugetlbfs (not supported currently), but does not get pagetables with PGSTE. This would lead to random memory overwrites. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-23s390/kvm: remove delayed reallocation of page tables for KVMMartin Schwidefsky
Replacing a 2K page table with a 4K page table while a VMA is active for the affected memory region is fundamentally broken. Rip out the page table reallocation code and replace it with a simple system control 'vm.allocate_pgste'. If the system control is set the page tables for all processes are allocated as full 4K pages, even for processes that do not need it. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25s390: remove 31 bit supportHeiko Carstens
Remove the 31 bit support in order to reduce maintenance cost and effectively remove dead code. Since a couple of years there is no distribution left that comes with a 31 bit kernel. The 31 bit kernel also has been broken since more than a year before anybody noticed. In addition I added a removal warning to the kernel shown at ipl for 5 minutes: a960062e5826 ("s390: add 31 bit warning message") which let everybody know about the plan to remove 31 bit code. We didn't get any response. Given that the last 31 bit only machine was introduced in 1999 let's remove the code. Anybody with 31 bit user space code can still use the compat mode. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-02-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - The remaining patches for the z13 machine support: kernel build option for z13, the cache synonym avoidance, SMT support, compare-and-delay for spinloops and the CES5S crypto adapater. - The ftrace support for function tracing with the gcc hotpatch option. This touches common code Makefiles, Steven is ok with the changes. - The hypfs file system gets an extension to access diagnose 0x0c data in user space for performance analysis for Linux running under z/VM. - The iucv hvc console gets wildcard spport for the user id filtering. - The cacheinfo code is converted to use the generic infrastructure. - Cleanup and bug fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits) s390/process: free vx save area when releasing tasks s390/hypfs: Eliminate hypfs interval s390/hypfs: Add diagnose 0c support s390/cacheinfo: don't use smp_processor_id() in preemptible context s390/zcrypt: fixed domain scanning problem (again) s390/smp: increase maximum value of NR_CPUS to 512 s390/jump label: use different nop instruction s390/jump label: add sanity checks s390/mm: correct missing space when reporting user process faults s390/dasd: cleanup profiling s390/dasd: add locking for global_profile access s390/ftrace: hotpatch support for function tracing ftrace: let notrace function attribute disable hotpatching if necessary ftrace: allow architectures to specify ftrace compile options s390: reintroduce diag 44 calls for cpu_relax() s390/zcrypt: Add support for new crypto express (CEX5S) adapter. s390/zcrypt: Number of supported ap domains is not retrievable. s390/spinlock: add compare-and-delay to lock wait loops s390/tape: remove redundant if statement s390/hvc_iucv: add simple wildcard matches to the iucv allow filter ...
2015-01-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "Two small performance tweaks, the plumbing for the execveat system call and a couple of bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/uprobes: fix user space PER events s390/bpf: Fix JMP_JGE_X (A > X) and JMP_JGT_X (A >= X) s390/bpf: Fix ALU_NEG (A = -A) s390/mm: avoid using pmd_to_page for !USE_SPLIT_PMD_PTLOCKS s390/timex: fix get_tod_clock_ext() inline assembly s390: wire up execveat syscall s390/kernel: use stnsm 255 instead of stosm 0 s390/vtime: Get rid of redundant WARN_ON s390/zcrypt: kernel oops at insmod of the z90crypt device driver
2015-01-08s390/pgtable: add unsigned long castsHeiko Carstens
Get rid of warnings like this one: warning: constant 0xffe0000000000000 is so big it is unsigned long Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>