Commit Graph

326 Commits

Author SHA1 Message Date
Avi Kivity
e495606dd0 KVM: Clean up #includes
Remove unnecessary ones, and rearange the remaining in the standard order.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Avi Kivity
d6d2816849 KVM: Remove kvmfs in favor of the anonymous inodes source
kvm uses a pseudo filesystem, kvmfs, to generate inodes, a job that the
new anonymous inodes source does much better.

Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Joerg Roedel
6031a61c2e KVM: SVM: Reliably detect if SVM was disabled by BIOS
This patch adds an implementation to the svm is_disabled function to
detect reliably if the BIOS disabled the SVM feature in the CPU. This
fixes the issues with kernel panics when loading the kvm-amd module on
machines where SVM is available but disabled.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Avi Kivity
796fd1b23e KVM: VMX: Remove unnecessary code in vmx_tlb_flush()
A vmexit implicitly flushes the tlb; the code is bogus.

Noted by Shaohua Li.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:49 +03:00
Shaohua Li
88a97f0b2f KVM: MMU: Fix Wrong tlb flush order
Need to flush the tlb after updating a pte, not before.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00
Avi Kivity
75880a0112 KVM: VMX: Reinitialize the real-mode tss when entering real mode
Protected mode code may have corrupted the real-mode tss, so re-initialize
it when switching to real mode.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00
Luca Tettamanti
a3c870bdce KVM: Avoid useless memory write when possible
When writing to normal memory and the memory area is unchanged the write
can be safely skipped, avoiding the costly kvm_mmu_pte_write.

Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00
Luca Tettamanti
02c03a326a KVM: Fix x86 emulator writeback
When the old value and new one are the same the emulator skips the
write; this is undesirable when the destination is a MMIO area and the
write shall be performed regardless of the previous value. This
optimization breaks e.g. a Linux guest APIC compiled without
X86_GOOD_APIC.

Remove the check and perform the writeback stage in the emulation unless
it's explicitly disabled (currently push and some 2 bytes instructions
may disable the writeback).

Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00
Eddie Dong
74906345ff KVM: Add support for in-kernel pio handlers
Useful for the PIC and PIT.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00
Gregory Haskins
ff1dc7942b KVM: VMX: Fix interrupt checking on lightweight exit
With kernel-injected interrupts, we need to check for interrupts on
lightweight exits too.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:48 +03:00
Gregory Haskins
2eeb2e94eb KVM: Adds support for in-kernel mmio handlers
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:47 +03:00
Nitin A Kamble
d9413cd757 KVM: Implement emulation of instruction "ret" (opcode 0xc3)
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:47 +03:00
Nitin A Kamble
7f0aaee07b KVM: Implement emulation of "pop reg" instruction (opcode 0x58-0x5f)
For use in real mode.

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:47 +03:00
Avi Kivity
7700270ee3 KVM: VMX: Ensure vcpu time stamp counter is monotonous
If the time stamp counter goes backwards, a guest delay loop can become
infinite.  This can happen if a vcpu is migrated to another cpu, where
the counter has a lower value than the first cpu.

Since we're doing an IPI to the first cpu anyway, we can use that to pick
up the old tsc, and use that to calculate the adjustment we need to make
to the tsc offset.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:47 +03:00
Avi Kivity
94cea1bb9d KVM: Initialize the BSP bit in the APIC_BASE msr correctly
Needs to be set on vcpu 0 only.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:47 +03:00
Shani Moideen
a3870c4789 KVM: VMX: Replace memset(<addr>, 0, PAGESIZE) with clear_page(<addr>)
Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:47 +03:00
Shani Moideen
129ee910df KVM: SVM: Replace memset(<addr>, 0, PAGESIZE) with clear_page(<addr>)
Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:46 +03:00
Avi Kivity
d9e368d612 KVM: Flush remote tlbs when reducing shadow pte permissions
When a vcpu causes a shadow tlb entry to have reduced permissions, it
must also clear the tlb on remote vcpus.  We do that by:

- setting a bit on the vcpu that requests a tlb flush before the next entry
- if the vcpu is currently executing, we send an ipi to make sure it
  exits before we continue

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:46 +03:00
Avi Kivity
39c3b86e5c KVM: Keep an upper bound of initialized vcpus
That way, we don't need to loop for KVM_MAX_VCPUS for a single vcpu
vm.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:46 +03:00
Avi Kivity
72d6e5a08a KVM: Emulate hlt on real mode for Intel
This has two use cases: the bios can't boot from disk, and guest smp
bootstrap.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:46 +03:00
Avi Kivity
d3bef15f84 KVM: Move duplicate halt handling code into kvm_main.c
Will soon have a thid user.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:46 +03:00
Avi Kivity
ef9254df0b KVM: Enable guest smp
As we don't support guest tlb shootdown yet, this is only reliable
for real-mode guests.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:46 +03:00
Avi Kivity
120e9a453b KVM: Fix adding an smp virtual machine to the vm list
If we add the vm once per vcpu, we corrupt the list if the guest has
multiple vcpus.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:45 +03:00
Avi Kivity
7b53aa5650 KVM: Fix vcpu freeing for guest smp
A vcpu can pin up to four mmu shadow pages, which means the freeing
loop will never terminate.  Fix by first unpinning shadow pages on
all vcpus, then freeing shadow pages.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:45 +03:00
Nguyen Anh Quynh
313899477f KVM: Remove unnecessary initialization and checks in mark_page_dirty()
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:45 +03:00
Robert P. J. Day
50a3485c59 KVM: Replace C code with call to ARRAY_SIZE() macro.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:45 +03:00
Avi Kivity
17c3ba9d37 KVM: Lazy guest cr3 switching
Switch guest paging context may require us to allocate memory, which
might fail.  Instead of wiring up error paths everywhere, make context
switching lazy and actually do the switch before the next guest entry,
where we can return an error if allocation fails.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:45 +03:00
Avi Kivity
bd2b2baa5c KVM: MMU: Remove unused large page marker
This has not been used for some time, as the same information is available
in the page header.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:45 +03:00
Avi Kivity
b64b3763a5 KVM: MMU: Don't cache guest access bits in the shadow page table
This was once used to avoid accessing the guest pte when upgrading
the shadow pte from read-only to read-write.  But usually we need
to set the guest pte dirty or accessed bits anyway, so this wasn't
really exploited.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:44 +03:00
Avi Kivity
fd97dc516c KVM: MMU: Simpify accessed/dirty/present/nx bit handling
Always set the accessed and dirty bit (since having them cleared causes
a read-modify-write cycle), always set the present bit, and copy the
nx bit from the guest.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:44 +03:00
Avi Kivity
4436d46621 KVM: MMU: Remove cr0.wp tricks
No longer needed as we do everything in one place.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:44 +03:00
Avi Kivity
e663ee64ae KVM: MMU: Make setting shadow ptes atomic on i386
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:44 +03:00
Avi Kivity
0d551bb698 KVM: Make shadow pte updates atomic
With guest smp, a second vcpu might see partial updates when the first
vcpu services a page fault.  So delay all updates until we have figured
out what the pte should look like.

Note that on i386, this is still not completely atomic as a 64-bit write
will be split into two on a 32-bit machine.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:44 +03:00
Avi Kivity
a18de5a403 KVM: Move shadow pte modifications from set_pte/set_pde to set_pde_common()
We want all shadow pte modifications in one place.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:44 +03:00
Avi Kivity
97a0a01ea9 KVM: MMU: Fold fix_write_pf() into set_pte_common()
This prevents some work from being performed twice, and, more importantly,
reduces the number of places where we modify shadow ptes.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:43 +03:00
Avi Kivity
63b1ad24d2 KVM: MMU: Fold fix_read_pf() into set_pte_common()
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:43 +03:00
Avi Kivity
6598c8b242 KVM: MMU: Pass the guest pde to set_pte_common
We will need the accessed bit (in addition to the dirty bit) and
also write access (for setting the dirty bit) in a future patch.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:43 +03:00
Avi Kivity
e60d75ea29 KVM: MMU: Move set_pte_common() to pte width dependent code
In preparation of some modifications.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:43 +03:00
Avi Kivity
ef0197e8d9 KVM: MMU: Simplify fetch() a little bit
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:43 +03:00
Avi Kivity
d3d25b048b KVM: MMU: Use slab caches for shadow pages and their headers
Use slab caches instead of a simple custom list.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:43 +03:00
Eddie Dong
8d7282036f KVM: Use symbolic constants instead of magic numbers
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:42 +03:00
Markus Rechberger
06ff0d3728 KVM: Fix includes
KVM compilation fails for some .configs.  This fixes it.

Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:42 +03:00
Avi Kivity
687fdbfe64 KVM: x86 emulator: implement wbinvd
Vista seems to trigger it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:42 +03:00
Jan Engelhardt
de062065a5 Use menuconfig objects II - KVM/Virt
Make a "menuconfig" out of the Kconfig objects "menu, ..., endmenu",
so that the user can disable all the options in that menu at once
instead of having to disable each option separately.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:42 +03:00
Eddie Dong
2cc51560ae KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexit
MSR_EFER.LME/LMA bits are automatically save/restored by VMX
hardware, KVM only needs to save NX/SCE bits at time of heavy
weight VM Exit. But clearing NX bits in host envirnment may
cause system hang if the host page table is using EXB bits,
thus we leave NX bits as it is. If Host NX=1 and guest NX=0, we
can do guest page table EXB bits check before inserting a shadow
pte (though no guest is expecting to see this kind of gp fault).
If host NX=0, we present guest no Execute-Disable feature to guest,
thus no host NX=0, guest NX=1 combination.

This patch reduces raw vmexit time by ~27%.

Me: fix compile warnings on i386.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:42 +03:00
Eddie Dong
f2be4dd654 KVM: VMX: Cleanup redundant code in MSR set
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:42 +03:00
Eddie Dong
a75beee6e4 KVM: VMX: Avoid saving and restoring msrs on lightweight vmexit
In a lightweight exit (where we exit and reenter the guest without
scheduling or exiting to userspace in between), we don't need various
msrs on the host, and avoiding shuffling them around reduces raw exit
time by 8%.

i386 compile fix by Daniel Hecken <dh@bahntechnik.de>.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Nitin A Kamble
b3f37707b0 KVM: VMX: Handle #SS faults from real mode
Instructions with address size override prefix opcode 0x67
Cause the #SS fault with 0 error code in VM86 mode.  Forward
them to the emulator.

Signed-Off-By: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Avi Kivity
cd2276a795 KVM: VMX: Use local labels in inline assembly
This makes oprofile dumps and disassebly easier to read.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Avi Kivity
cd0536d7cb KVM: Fix vmx I/O bitmap initialization on highmem systems
kunmap() expects a struct page, not a virtual address.  Fixes an oops loading
kvm-intel.ko on i386 with CONFIG_HIGHMEM.

Thanks to Michael Ivanov <deruhu@peterstar.ru> for reporting.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Avi Kivity
653e3108b7 KVM: Avoid corrupting tr in real mode
The real mode tr needs to be set to a specific tss so that I/O
instructions can function.  Divert the new tr values to the real
mode save area from where they will be restored on transition to
protected mode.

This fixes some crashes on reboot when the bios accesses an I/O
instruction.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Avi Kivity
eff708bc2b KVM: VMX: Only reload guest msrs if they are already loaded
If we set an msr via an ioctl() instead of by handling a guest exit, we
have the host state loaded, so reloading the msrs would clobber host
state instead of guest state.

This fixes a host oops (and loss of a cpu) on a guest reboot.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Avi Kivity
47ad8e689b KVM: MMU: Store shadow page tables as kernel virtual addresses, not physical
Simpifies things a bit.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:40 +03:00
Avi Kivity
4b02d6daa1 KVM: MMU: Simplify kvm_mmu_free_page() a tiny bit
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:40 +03:00
Matthew Gregan
2dc7094b56 KVM: Implement IA32_EBL_CR_POWERON msr
Attempting to boot the default 'bsd' kernel of OpenBSD 4.1 i386 in a guest
fails early in the kernel init inside p3_get_bus_clock while trying to read
the IA32_EBL_CR_POWERON MSR.  KVM logs an 'unhandled MSR' message and the
guest kernel faults.

This patch is sufficient to allow OpenBSD to boot, after which it seems to
run fine.  I'm not sure if this is the correct solution for dealing with
this particular MSR, but it works for me.

Signed-off-by: Matthew Gregan <kinetik@flim.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:40 +03:00
Avi Kivity
a3a0636725 KVM: Set cr0.mp for guests
This allows fwait instructions to be trapped when the guest fpu is not
loaded.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:40 +03:00
Avi Kivity
5fd86fcfc0 KVM: Consolidate guest fpu activation and deactivation
Easier to keep track of where the fpu is this way.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity
abd3f2d622 KVM: Rationalize exception bitmap usage
Everyone owns a piece of the exception bitmap, but they happily write to
the entire thing like there's no tomorrow.  Centralize handling in
update_exception_bitmap() and have everyone call that.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity
707c087430 KVM: Move some more msr mangling into vmx_save_host_state()
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity
33ed632921 KVM: Fix potential guest state leak into host
The lightweight vmexit path avoids saving and reloading certain host
state.  However in certain cases lightweight vmexit handling can schedule()
which requires reloading the host state.

So we store the host state in the vcpu structure, and reloaded it if we
relinquish the vcpu.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity
7494c0ccbb KVM: Increase mmu shadow cache to 1024 pages
This improves kbuild times by about 10%, bringing it within a respectable
25% of native.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity
0028425f64 KVM: Update shadow pte on write to guest pte
A typical demand page/copy on write pattern is:

- page fault on vaddr
- kvm propagates fault to guest
- guest handles fault, updates pte
- kvm traps write, clears shadow pte, resumes guest
- guest returns to userspace, re-faults on same vaddr
- kvm installs shadow pte, resumes guest
- guest continues

So, three vmexits for a single guest page fault.  But if instead of clearing
the page table entry, we update to correspond to the value that the guest
has just written, we eliminate the third vmexit.

This patch does exactly that, reducing kbuild time by about 10%.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity
fce0657ff9 KVM: MMU: Respect nonpae pagetable quadrant when zapping ptes
When a guest writes to a page that has an mmu shadow, we have to clear
the shadow pte corresponding to the memory location touched by the guest.

Now, in nonpae mode, a single guest page may have two or four shadow
pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps
2MB or 1GB), so we when we look up the page we find up to three additional
aliases for the page.  Since we _clear_ the shadow pte, it doesn't matter
except for a slight performance penalty, but if we want to _update_ the
shadow pte instead of clearing it, it is vital that we don't modify the
aliases.

Fortunately, exactly which page is needed (the "quadrant") is easily
computed, and is accessible in the shadow page header.  All we need is
to ignore shadow pages from the wrong quadrants.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity
09072daf37 KVM: Unify kvm_mmu_pre_write() and kvm_mmu_post_write()
Instead of calling two functions and repeating expensive checks, call one
function and provide it with before/after information.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Avi Kivity
621358455a KVM: Be more careful restoring fs on lightweight vmexit
i386 wants fs for accessing the pda even on a lightweight exit, so ensure
we can always restore it.  This fixes a regression on i386 introduced by
the lightweight vmexit patch.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Avi Kivity
a25f7e1f8c KVM: Reduce misfirings of the fork detector
The kvm mmu tries to detects forks by looking for repeated writes to a
page table.  If it sees a fork, it unshadows the page table so the page
table copying can proceed at native speed instead of being emulated.

However, the detector also triggered on simple demand paging access patterns:
a linear walk of memory would of course cause repeated writes to the same
pagetable page, causing it to unshadow prematurely.

Fix by resetting the fork detector if we detect a demand fault.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Avi Kivity
05e0c8c344 KVM: Unindent some code
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Avi Kivity
e6adf28365 KVM: Avoid saving and restoring some host CPU state on lightweight vmexit
Many msrs and the like will only be used by the host if we schedule() or
return to userspace.  Therefore, we avoid saving them if we handle the
exit within the kernel, and if a reschedule is not requested.

Based on a patch from Eddie Dong <eddie.dong@intel.com> with a couple of
fixes by me.

Signed-off-by: Yaozu(Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Avi Kivity
e925c5ba93 KVM: Assume that writes smaller than 4 bytes are to non-pagetable pages
This allows us to remove write protection earlier than otherwise.  Should
some mad OS choose to use byte writes to update pagetables, it will suffer
a performance hit, but still work correctly.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Anthony Liguori
c86813393f KVM: SVM: Allow direct guest access to PC debug port
The PC debug port is used for IO delay and does not require emulation.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:37 +03:00
He, Qing
fdef3ad1b3 KVM: VMX: Enable io bitmaps to avoid IO port 0x80 VMEXITs
This patch enables IO bitmaps control on vmx and unmask the 0x80 port to
avoid VMEXITs caused by accessing port 0x80. 0x80 is used as delays (see
include/asm/io.h), and handling VMEXITs on its access is unnecessary but
slows things down. This patch improves kernel build test at around
3%~5%.
	Because every VM uses the same io bitmap, it is shared between
all VMs rather than a per-VM data structure.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:37 +03:00
Avi Kivity
7702fd1f6f KVM: Prevent guest fpu state from leaking into the host
The lazy fpu changes did not take into account that some vmexit handlers
can sleep.  Move loading the guest state into the inner loop so that it
can be reloaded if necessary, and move loading the host state into
vmx_vcpu_put() so it can be performed whenever we relinquish the vcpu.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-06-15 12:30:59 +03:00
Sam Ravnborg
39959588f5 kvm: fix section mismatch warning in kvm-intel.o
Fix following section mismatch warning in kvm-intel.o:
WARNING: o-i386/drivers/kvm/kvm-intel.o(.init.text+0xbd): Section mismatch: reference to .exit.text: (between 'hardware_setup' and 'vmx_disabled_by_bios')

The function free_kvm_area is used in the function alloc_kvm_area which
is marked __init.
The __exit area is discarded by some archs during link-time if a
module is built-in resulting in an oops.

Note: This warning is only seen by my local copy of modpost
      but the change will soon hit upstream.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-01 08:18:30 -07:00
Alexey Dobriyan
e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
Martin Schwidefsky
eeca7a36a8 [S390] Kconfig: refine depends statements.
Refine some depends statements to limit their visibility to the
environments that are actually supported.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-05-10 15:46:07 +02:00
Rafael J. Wysocki
8bb7844286 Add suspend-related notifications for CPU hotplug
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress.  This
patch introduces such notifications and causes them to be used during
suspend and resume transitions.  It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Avi Kivity
2ff81f70b5 KVM: Remove unused 'instruction_length'
As we no longer emulate in userspace, this is meaningless.  We don't
compute it on SVM anyway.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Avi Kivity
02c8320972 KVM: Don't require explicit indication of completion of mmio or pio
It is illegal not to return from a pio or mmio request without completing
it, as mmio or pio is an atomic operation.  Therefore, we can simplify
the userspace interface by avoiding the completion indication.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Avi Kivity
e7df56e4a0 KVM: Remove extraneous guest entry on mmio read
When emulating an mmio read, we actually emulate twice: once to determine
the physical address of the mmio, and, after we've exited to userspace to
get the mmio value, we emulate again to place the value in the result
register and update any flags.

But we don't really need to enter the guest again for that, only to take
an immediate vmexit.  So, if we detect that we're doing an mmio read,
emulate a single instruction before entering the guest again.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Anthony Liguori
94dfbdb389 KVM: SVM: Only save/restore MSRs when needed
We only have to save/restore MSR_GS_BASE on every VMEXIT.  The rest can be
saved/restored when we leave the VCPU.  Since we don't emulate the DEBUGCTL
MSRs and the guest cannot write to them, we don't have to worry about
saving/restoring them at all.

This shaves a whopping 40% off raw vmexit costs on AMD.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Adrian Bunk
2807696c37 KVM: fix an if() condition
It might have worked in this case since PT_PRESENT_MASK is 1, but let's
express this correctly.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Anthony Liguori
2ab455ccce KVM: VMX: Add lazy FPU support for VT
Only save/restore the FPU host state when the guest is actually using the
FPU.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Anthony Liguori
25c4c2762e KVM: VMX: Properly shadow the CR0 register in the vcpu struct
Set all of the host mask bits for CR0 so that we can maintain a proper
shadow of CR0.  This exposes CR0.TS, paving the way for lazy fpu handling.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Avi Kivity
e0e5127d06 KVM: Don't complain about cpu erratum AA15
It slows down Windows x64 horribly.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Anthony Liguori
7807fa6ca5 KVM: Lazy FPU support for SVM
Avoid saving and restoring the guest fpu state on every exit.  This
shaves ~100 cycles off the guest/host switch.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Avi Kivity
4c690a1e86 KVM: Allow passing 64-bit values to the emulated read/write API
This simplifies the API somewhat (by eliminating the special-case
cmpxchg8b on i386).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:31 +03:00
Avi Kivity
1165f5fec1 KVM: Per-vcpu statistics
Make the exit statistics per-vcpu instead of global.  This gives a 3.5%
boost when running one virtual machine per core on my two socket dual core
(4 cores total) machine.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Yaozu Dong
3fca036530 KVM: VMX: Avoid unnecessary vcpu_load()/vcpu_put() cycles
By checking if a reschedule is needed, we avoid dropping the vcpu.

[With changes by me, based on Anthony Liguori's observations]

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Yaozu Dong
d6c69ee9a2 KVM: MMU: Avoid heavy ASSERT at non debug mode.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Avi Kivity
4d56c8a787 KVM: VMX: Only save/restore MSR_K6_STAR if necessary
Intel hosts only support syscall/sysret in long more (and only if efer.sce
is enabled), so only reload the related MSR_K6_STAR if the guest will
actually be able to use it.

This reduces vmexit cost by about 500 cycles (6400 -> 5870) on my setup.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Avi Kivity
35cc7f9711 KVM: Fold drivers/kvm/kvm_vmx.h into drivers/kvm/vmx.c
No meat in that file.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Avi Kivity
e38aea3e93 KVM: VMX: Don't switch 64-bit msrs for 32-bit guests
Some msrs are only used by x86_64 instructions, and are therefore
not needed when the guest is legacy mode.  By not bothering to switch
them, we reduce vmexit latency by 2400 cycles (from about 8800) when
running a 32-bt guest on a 64-bit host.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:30 +03:00
Avi Kivity
2345df8c55 KVM: VMX: Reduce unnecessary saving of host msrs
THe automatically switched msrs are never changed on the host (with
the exception of MSR_KERNEL_GS_BASE) and thus there is no need to save
them on every vm entry.

This reduces vmexit latency by ~400 cycles on i386 and by ~900 cycles (10%)
on x86_64.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
c9047f5333 KVM: Handle guest page faults when emulating mmio
Usually, guest page faults are detected by the kvm page fault handler,
which detects if they are shadow faults, mmio faults, pagetable faults,
or normal guest page faults.

However, in ceratin circumstances, we can detect a page fault much later.
One of these events is the following combination:

- A two memory operand instruction (e.g. movsb) is executed.
- The first operand is in mmio space (which is the fault reported to kvm)
- The second operand is in an ummaped address (e.g. a guest page fault)

The Windows 2000 installer does such an access, an promptly hangs.  Fix
by adding the missing page fault injection on that path.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
364b625b56 KVM: SVM: Report hardware exit reason to userspace instead of dmesg
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
8c4385024d KVM: Retry sleeping allocation if atomic allocation fails
This avoids -ENOMEM under memory pressure.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
b5a33a7572 KVM: Use slab caches to allocate mmu data structures
Better leak detection, statistics, memory use, speed -- goodness all
around.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
417726a3fb KVM: Handle partial pae pdptr
Some guests (Solaris) do not set up all four pdptrs, but leave some invalid.
kvm incorrectly treated these as valid page directories, pinning the
wrong pages and causing general confusion.

Fix by checking the valid bit of a pae pdpte.  This closes sourceforge bug
1698922.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Avi Kivity
d917a6b92d KVM: Initialize cr0 to indicate an fpu is present
Solaris panics if it sees a cpu with no fpu, and it seems to rely on this
bit.  Closes sourceforge bug 1698920.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00
Eric Sesterhenn / Snakebyte
3964994bb5 KVM: Fix overflow bug in overflow detection code
The expression

   sp - 6 < sp

where sp is a u16 is undefined in C since 'sp - 6' is promoted to int,
and signed overflow is undefined in C.  gcc 4.2 actually warns about it.
Replace with a simpler test.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:29 +03:00