summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/mmu.h
AgeCommit message (Collapse)Author
2014-09-03KVM: mmio: cleanup kvm_set_mmio_spte_maskTiejun Chen
Just reuse rsvd_bits() inside kvm_set_mmio_spte_mask() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-23KVM: MMU: flush tlb out of mmu lock when write-protect the sptesXiao Guangrong
Now we can flush all the TLBs out of the mmu lock without TLB corruption when write-proect the sptes, it is because: - we have marked large sptes readonly instead of dropping them that means we just change the spte from writable to readonly so that we only need to care the case of changing spte from present to present (changing the spte from present to nonpresent will flush all the TLBs immediately), in other words, the only case we need to care is mmu_spte_update() - in mmu_spte_update(), we haved checked SPTE_HOST_WRITEABLE | PTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK anymore Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14KVM: Add SMAP support when setting CR4Feng Wu
This patch adds SMAP handling logic when setting CR4 for guests Thanks a lot to Paolo Bonzini for his suggestion to use the branchless way to detect SMAP violation. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-10-03KVM: mmu: change useless int return types to voidPaolo Bonzini
kvm_mmu initialization is mostly filling in function pointers, there is no way for it to fail. Clean up unused return values. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-07nEPT: MMU context for nested EPTNadav Har'El
KVM's existing shadow MMU code already supports nested TDP. To use it, we need to set up a new "MMU context" for nested EPT, and create a few callbacks for it (nested_ept_*()). This context should also use the EPT versions of the page table access functions (defined in the previous patch). Then, we need to switch back and forth between this nested context and the regular MMU context when switching between L1 and L2 (when L1 runs this L2 with EPT). Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27KVM: MMU: fast invalidate all mmio sptesXiao Guangrong
This patch tries to introduce a very simple and scale way to invalidate all mmio sptes - it need not walk any shadow pages and hold mmu-lock KVM maintains a global mmio valid generation-number which is stored in kvm->memslots.generation and every mmio spte stores the current global generation-number into his available bits when it is created When KVM need zap all mmio sptes, it just simply increase the global generation-number. When guests do mmio access, KVM intercepts a MMIO #PF then it walks the shadow page table and get the mmio spte. If the generation-number on the spte does not equal the global generation-number, it will go to the normal #PF handler to update the mmio spte Since 19 bits are used to store generation-number on mmio spte, we zap all mmio sptes when the number is round Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27KVM: MMU: make return value of mmio page fault handler more readableXiao Guangrong
Define some meaningful names instead of raw code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-05KVM: MMU: fast invalidate all pagesXiao Guangrong
The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to walk and zap all shadow pages one by one, also it need to zap all guest page's rmap and all shadow page's parent spte list. Particularly, things become worse if guest uses more memory or vcpus. It is not good for scalability In this patch, we introduce a faster way to invalidate all shadow pages. KVM maintains a global mmu invalid generation-number which is stored in kvm->arch.mmu_valid_gen and every shadow page stores the current global generation-number into sp->mmu_valid_gen when it is created When KVM need zap all shadow pages sptes, it just simply increase the global generation-number then reload root shadow pages on all vcpus. Vcpu will create a new shadow page table according to current kvm's generation-number. It ensures the old pages are not used any more. Then the obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen) are zapped by using lock-break technique Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-03-21KVM: MMU: Rename kvm_mmu_free_some_pages() to make_mmu_pages_available()Takuya Yoshikawa
The current name "kvm_mmu_free_some_pages" should be used for something that actually frees some shadow pages, as we expect from the name, but what the function is doing is to make some, KVM_MIN_FREE_MMU_PAGES, shadow pages available: it does nothing when there are enough. This patch changes the name to reflect this meaning better; while doing this renaming, the code in the wrapper function is inlined into the main body since the whole function will be inlined into the only caller now. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-13KVM: MMU: make kvm_mmu_available_pages robust against n_used_mmu_pages > ↵Marcelo Tosatti
n_max_mmu_pages As noticed by Ulrich Obergfell <uobergfe@redhat.com>, the mmu counters are for beancounting purposes only - so n_used_mmu_pages and n_max_mmu_pages could be relaxed (example: before f0f5933a1626c8df7b), resulting in n_used_mmu_pages > n_max_mmu_pages. Make code robust against n_used_mmu_pages > n_max_mmu_pages. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-09-20KVM: MMU: Optimize is_last_gpte()Avi Kivity
Instead of branchy code depending on level, gpte.ps, and mmu configuration, prepare everything in a bitmap during mode changes and look it up during runtime. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Optimize pte permission checksAvi Kivity
walk_addr_generic() permission checks are a maze of branchy code, which is performed four times per lookup. It depends on the type of access, efer.nxe, cr0.wp, cr4.smep, and in the near future, cr4.smap. Optimize this away by precalculating all variants and storing them in a bitmap. The bitmap is recalculated when rarely-changing variables change (cr0, cr4) and is indexed by the often-changing variables (page fault error code, pte access permissions). The permission check is moved to the end of the loop, otherwise an SMEP fault could be reported as a false positive, when PDE.U=1 but PTE.U=0. Noted by Xiao Guangrong. The result is short, branch-free code. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Push clean gpte write protection out of gpte_access()Avi Kivity
gpte_access() computes the access permissions of a guest pte and also write-protects clean gptes. This is wrong when we are servicing a write fault (since we'll be setting the dirty bit momentarily) but correct when instantiating a speculative spte, or when servicing a read fault (since we'll want to trap a following write in order to set the dirty bit). It doesn't seem to hurt in practice, but in order to make the code readable, push the write protection out of gpte_access() and into a new protect_clean_gpte() which is called explicitly when needed. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: mmio page fault supportXiao Guangrong
The idea is from Avi: | We could cache the result of a miss in an spte by using a reserved bit, and | checking the page fault error code (or seeing if we get an ept violation or | ept misconfiguration), so if we get repeated mmio on a page, we don't need to | search the slot list/tree. | (https://lkml.org/lkml/2011/2/22/221) When the page fault is caused by mmio, we cache the info in the shadow page table, and also set the reserved bits in the shadow page table, so if the mmio is caused again, we can quickly identify it and emulate it directly Searching mmio gfn in memslots is heavy since we need to walk all memeslots, it can be reduced by this feature, and also avoid walking guest page table for soft mmu. [jan: fix operator precedence issue] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: cache mmio info on page fault pathXiao Guangrong
If the page fault is caused by mmio, we can cache the mmio info, later, we do not need to walk guest page table and quickly know it is a mmio fault while we emulate the mmio instruction Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24KVM: MMU: Don't track nested fault info in error-codeJoerg Roedel
This patch moves the detection whether a page-fault was nested or not out of the error code and moves it into a separate variable in the fault struct. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24KVM: MMU: Introduce init_kvm_nested_mmu()Joerg Roedel
This patch introduces the init_kvm_nested_mmu() function which is used to re-initialize the nested mmu when the l2 guest changes its paging mode. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24KVM: MMU: Introduce kvm_init_shadow_mmu helper functionJoerg Roedel
Some logic of the init_kvm_softmmu function is required to build the Nested Nested Paging context. So factor the required logic into a seperate function and export it. Also make the whole init path suitable for more than one mmu context. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24KVM: replace x86 kvm n_free_mmu_pages with n_used_mmu_pagesDave Hansen
Doing this makes the code much more readable. That's borne out by the fact that this patch removes code. "used" also happens to be the number that we need to return back to the slab code when our shrinker gets called. Keeping this value as opposed to free makes the next patch simpler. So, 'struct kvm' is kzalloc()'d. 'struct kvm_arch' is a structure member (and not a pointer) of 'struct kvm'. That means they start out zeroed. I _think_ they get initialized properly by kvm_mmu_change_mmu_pages(). But, that only happens via kvm ioctls. Another benefit of storing 'used' intead of 'free' is that the values are consistent from the moment the structure is allocated: no negative "used" value. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24KVM: abstract kvm x86 mmu->n_free_mmu_pagesDave Hansen
"free" is a poor name for this value. In this context, it means, "the number of mmu pages which this kvm instance should be able to allocate." But "free" implies much more that the objects are there and ready for use. "available" is a much better description, especially when you see how it is calculated. In this patch, we abstract its use into a function. We'll soon replace the function's contents by calculating the value in a different way. All of the reads of n_free_mmu_pages are taken care of in this patch. The modification sites will be handled in a patch later in the series. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: x86 emulator: fix memory access during x86 emulationGleb Natapov
Currently when x86 emulator needs to access memory, page walk is done with broadest permission possible, so if emulated instruction was executed by userspace process it can still access kernel memory. Fix that by providing correct memory access to page walker during emulation. Signed-off-by: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: Move cr0/cr4/efer related helpers to x86.hAvi Kivity
They have more general scope than the mmu. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: Replace read accesses of vcpu->arch.cr0 by an accessorAvi Kivity
Since we'd like to allow the guest to own a few bits of cr0 at times, we need to know when we access those bits. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01KVM: x86: Moving PT_*_LEVEL to mmu.hSheng Yang
We can use them in x86.c and vmx.c now... Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01KVM: Add accessor for reading cr4 (or some bits of cr4)Avi Kivity
Some bits of cr4 can be owned by the guest on vmx, so when we read them, we copy them to the vcpu structure. In preparation for making the set of guest-owned bits dynamic, use helpers to access these bits so we don't need to know where the bit resides. No changes to svm since all bits are host-owned there. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: add kvm_mmu_get_spte_hierarchy helperMarcelo Tosatti
Required by EPT misconfiguration handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pteAvi Kivity
Since the guest and host ptes can have wildly different format, adjust the pte accessor names to indicate on which type of pte they operate on. No functional changes. Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10KVM: Use rsvd_bits_mask in load_pdptrs()Dong, Eddie
Also remove bit 5-6 from rsvd_bits_mask per latest SDM. Signed-off-by: Eddie Dong <Eddie.Dong@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24KVM: is_long_mode() should check for EFER.LMAAmit Shah
is_long_mode currently checks the LongModeEnable bit in EFER instead of the LongModeActive bit. This is wrong, but we survived this till now since it wasn't triggered. This breaks guests that go from long mode to compatibility mode. This is noticed on a solaris guest and fixes bug #1842160 Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20KVM: MMU: Fix false flooding when a pte points to page tableAvi Kivity
The KVM MMU tries to detect when a speculative pte update is not actually used by demand fault, by checking the accessed bit of the shadow pte. If the shadow pte has not been accessed, we deem that page table flooded and remove the shadow page table, allowing further pte updates to proceed without emulation. However, if the pte itself points at a page table and only used for write operations, the accessed bit will never be set since all access will happen through the emulator. This is exactly what happens with kscand on old (2.4.x) HIGHMEM kernels. The kernel points a kmap_atomic() pte at a page table, and then proceeds with read-modify-write operations to look at the dirty and accessed bits. We get a false flood trigger on the kmap ptes, which results in the mmu spending all its time setting up and tearing down shadows. Fix by setting the shadow accessed bit on emulated accesses. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04KVM: Add kvm_x86_ops get_tdp_level()Sheng Yang
The function get_tdp_level() provided the number of tdp level for EPT and NPT rather than the NPT specific macro. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04KVM: MMU: Move some definitions to a header fileSheng Yang
Move some definitions to mmu.h in order to allow building common table entries between EPT and non-EPT. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: add TDP support to the KVM MMUJoerg Roedel
This patch contains the changes to the KVM MMU necessary for support of the Nested Paging feature in AMD Barcelona and Phenom Processors. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Move arch dependent files to new directory arch/x86/kvm/Avi Kivity
This paves the way for multiple architecture support. Note that while ioapic.c could potentially be shared with ia64, it is also moved. Signed-off-by: Avi Kivity <avi@qumranet.com>