--- zzzz-none-000/linux-3.10.107/Documentation/virtual/kvm/api.txt 2017-06-27 09:49:32.000000000 +0000 +++ scorpion-7490-727/linux-3.10.107/Documentation/virtual/kvm/api.txt 2021-02-04 17:41:59.000000000 +0000 @@ -53,7 +53,7 @@ facility that allows backward-compatible extensions to the API to be queried and used. -The extension mechanism is not based on on the Linux version number. +The extension mechanism is not based on the Linux version number. Instead, kvm defines extension identifiers and a facility to query whether a particular extension identifier is available. If it is, a set of ioctls is available for application use. @@ -68,9 +68,12 @@ Capability: which KVM extension provides this ioctl. Can be 'basic', which means that is will be provided by any kernel that supports - API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which + API version 12 (see section 4.1), a KVM_CAP_xyz constant, which means availability needs to be checked with KVM_CHECK_EXTENSION - (see section 4.4). + (see section 4.4), or 'none' which means that while not all kernels + support this ioctl, there's no capability bit to check its + availability: for kernels that don't support the ioctl, + the ioctl returns -ENOTTY. Architectures: which instruction set architectures provide this ioctl. x86 includes both i386 and x86_64. @@ -148,9 +151,9 @@ 4.4 KVM_CHECK_EXTENSION -Capability: basic +Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl Architectures: all -Type: system ioctl +Type: system ioctl, vm ioctl Parameters: extension identifier (KVM_CAP_*) Returns: 0 if unsupported; 1 (or some other positive integer) if supported @@ -160,6 +163,9 @@ Generally 0 means no and 1 means yes, but some extensions may report additional information in the integer return value. +Based on their initialization different VMs may have different capabilities. +It is thus encouraged to use the vm ioctl to query for capabilities (available +with KVM_CAP_CHECK_EXTENSION_VM on the vm fd) 4.5 KVM_GET_VCPU_MMAP_SIZE @@ -248,6 +254,11 @@ memory slot. Ensure the entire structure is cleared to avoid padding issues. +If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 specifies +the address space for which you want to return the dirty bitmap. +They must be less than the value that KVM_CHECK_EXTENSION returns for +the KVM_CAP_MULTI_ADDRESS_SPACE capability. + 4.9 KVM_SET_MEMORY_ALIAS @@ -280,7 +291,7 @@ 4.11 KVM_GET_REGS Capability: basic -Architectures: all except ARM +Architectures: all except ARM, arm64 Type: vcpu ioctl Parameters: struct kvm_regs (out) Returns: 0 on success, -1 on error @@ -297,11 +308,20 @@ __u64 rip, rflags; }; +/* mips */ +struct kvm_regs { + /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ + __u64 gpr[32]; + __u64 hi; + __u64 lo; + __u64 pc; +}; + 4.12 KVM_SET_REGS Capability: basic -Architectures: all except ARM +Architectures: all except ARM, arm64 Type: vcpu ioctl Parameters: struct kvm_regs (in) Returns: 0 on success, -1 on error @@ -378,13 +398,12 @@ 4.16 KVM_INTERRUPT Capability: basic -Architectures: x86, ppc +Architectures: x86, ppc, mips Type: vcpu ioctl Parameters: struct kvm_interrupt (in) -Returns: 0 on success, -1 on error +Returns: 0 on success, negative on failure. -Queues a hardware interrupt vector to be injected. This is only -useful if in-kernel local APIC or equivalent is not used. +Queues a hardware interrupt vector to be injected. /* for KVM_INTERRUPT */ struct kvm_interrupt { @@ -394,7 +413,14 @@ X86: -Note 'irq' is an interrupt vector, not an interrupt pin or line. +Returns: 0 on success, + -EEXIST if an interrupt is already enqueued + -EINVAL the the irq number is invalid + -ENXIO if the PIC is in the kernel + -EFAULT if the pointer is invalid + +Note 'irq' is an interrupt vector, not an interrupt pin or line. This +ioctl is useful if the in-kernel PIC is not used. PPC: @@ -423,6 +449,11 @@ Note that any value for 'irq' other than the ones stated above is invalid and incurs unexpected behavior. +MIPS: + +Queues an external interrupt to be injected into the virtual CPU. A negative +interrupt number dequeues the interrupt. + 4.17 KVM_DEBUG_GUEST @@ -512,7 +543,7 @@ 4.21 KVM_SET_SIGNAL_MASK Capability: basic -Architectures: x86 +Architectures: all Type: vcpu ioctl Parameters: struct kvm_signal_mask (in) Returns: 0 on success, -1 on error @@ -586,23 +617,29 @@ 4.24 KVM_CREATE_IRQCHIP -Capability: KVM_CAP_IRQCHIP -Architectures: x86, ia64, ARM +Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390) +Architectures: x86, ARM, arm64, s390 Type: vm ioctl Parameters: none Returns: 0 on success, -1 on error -Creates an interrupt controller model in the kernel. On x86, creates a virtual -ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a -local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23 -only go to the IOAPIC. On ia64, a IOSAPIC is created. On ARM, a GIC is -created. +Creates an interrupt controller model in the kernel. +On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up +future vcpus to have a local APIC. IRQ routing for GSIs 0-15 is set to both +PIC and IOAPIC; GSI 16-23 only go to the IOAPIC. +On ARM/arm64, a GICv2 is created. Any other GIC versions require the usage of +KVM_CREATE_DEVICE, which also supports creating a GICv2. Using +KVM_CREATE_DEVICE is preferred over KVM_CREATE_IRQCHIP for GICv2. +On s390, a dummy irq routing table is created. + +Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled +before KVM_CREATE_IRQCHIP can be used. 4.25 KVM_IRQ_LINE Capability: KVM_CAP_IRQCHIP -Architectures: x86, ia64, arm +Architectures: x86, arm, arm64 Type: vm ioctl Parameters: struct kvm_irq_level Returns: 0 on success, -1 on error @@ -612,9 +649,24 @@ been previously created with KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level to be set to 1 and then back to 0. -ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip -(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for -specific cpus. The irq field is interpreted like this: +On real hardware, interrupt pins can be active-low or active-high. This +does not matter for the level field of struct kvm_irq_level: 1 always +means active (asserted), 0 means inactive (deasserted). + +x86 allows the operating system to program the interrupt polarity +(active-low/active-high) for level-triggered interrupts, and KVM used +to consider the polarity. However, due to bitrot in the handling of +active-low interrupts, the above convention is now valid on x86 too. +This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED. Userspace +should not present interrupts to the guest as active-low unless this +capability is present (or unless it is not using the in-kernel irqchip, +of course). + + +ARM/arm64 can signal an interrupt either at the CPU level, or at the +in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to +use PPIs designated for specific cpus. The irq field is interpreted +like this:  bits: | 31 ... 24 | 23 ... 16 | 15 ... 0 | field: | irq_type | vcpu_index | irq_id | @@ -627,7 +679,7 @@ (The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs) -In both cases, level is used to raise/lower the line. +In both cases, level is used to assert/deassert the line. struct kvm_irq_level { union { @@ -641,7 +693,7 @@ 4.26 KVM_GET_IRQCHIP Capability: KVM_CAP_IRQCHIP -Architectures: x86, ia64 +Architectures: x86 Type: vm ioctl Parameters: struct kvm_irqchip (in/out) Returns: 0 on success, -1 on error @@ -663,7 +715,7 @@ 4.27 KVM_SET_IRQCHIP Capability: KVM_CAP_IRQCHIP -Architectures: x86, ia64 +Architectures: x86 Type: vm ioctl Parameters: struct kvm_irqchip (in) Returns: 0 on success, -1 on error @@ -779,11 +831,21 @@ } nmi; __u32 sipi_vector; __u32 flags; + struct { + __u8 smm; + __u8 pending; + __u8 smm_inside_nmi; + __u8 latched_init; + } smi; }; -KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that -interrupt.shadow contains a valid state. Otherwise, this field is undefined. +Only two fields are defined in the flags field: + +- KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that + interrupt.shadow contains a valid state. +- KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that + smi contains a valid state. 4.32 KVM_SET_VCPU_EVENTS @@ -800,17 +862,20 @@ See KVM_GET_VCPU_EVENTS for the data structure. Fields that may be modified asynchronously by running VCPUs can be excluded -from the update. These fields are nmi.pending and sipi_vector. Keep the -corresponding bits in the flags field cleared to suppress overwriting the -current in-kernel state. The bits are: +from the update. These fields are nmi.pending, sipi_vector, smi.smm, +smi.pending. Keep the corresponding bits in the flags field cleared to +suppress overwriting the current in-kernel state. The bits are: KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector +KVM_VCPUEVENT_VALID_SMM - transfer the smi sub-struct. If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in the flags field to signal that interrupt.shadow contains a valid state and shall be written into the VCPU. +KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available. + 4.33 KVM_GET_DEBUGREGS @@ -870,6 +935,13 @@ physical memory space, or its flags may be modified. It may not be resized. Slots may not overlap in guest physical address space. +If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot" +specifies the address space which is being modified. They must be +less than the value that KVM_CHECK_EXTENSION returns for the +KVM_CAP_MULTI_ADDRESS_SPACE capability. Slots in separate address spaces +are unrelated; the restriction on overlapping slots only applies within +each address space. + Memory for the region is taken starting at the address denoted by the field userspace_addr, which must point at user addressable memory for the entire memory slot size. Any object may back this memory, including @@ -917,9 +989,10 @@ 4.37 KVM_ENABLE_CAP -Capability: KVM_CAP_ENABLE_CAP -Architectures: ppc, s390 -Type: vcpu ioctl +Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM +Architectures: x86 (only KVM_CAP_ENABLE_CAP_VM), + mips (only KVM_CAP_ENABLE_CAP), ppc, s390 +Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM) Parameters: struct kvm_enable_cap (in) Returns: 0 on success; -1 on error @@ -950,11 +1023,13 @@ __u8 pad[64]; }; +The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl +for vm-wide capabilities. 4.38 KVM_GET_MP_STATE Capability: KVM_CAP_MP_STATE -Architectures: x86, ia64 +Architectures: x86, s390, arm, arm64 Type: vcpu ioctl Parameters: struct kvm_mp_state (out) Returns: 0 on success; -1 on error @@ -968,24 +1043,35 @@ Possible values are: - - KVM_MP_STATE_RUNNABLE: the vcpu is currently running + - KVM_MP_STATE_RUNNABLE: the vcpu is currently running [x86,arm/arm64] - KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP) - which has not yet received an INIT signal + which has not yet received an INIT signal [x86] - KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is - now ready for a SIPI + now ready for a SIPI [x86] - KVM_MP_STATE_HALTED: the vcpu has executed a HLT instruction and - is waiting for an interrupt + is waiting for an interrupt [x86] - KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector - accessible via KVM_GET_VCPU_EVENTS) + accessible via KVM_GET_VCPU_EVENTS) [x86] + - KVM_MP_STATE_STOPPED: the vcpu is stopped [s390,arm/arm64] + - KVM_MP_STATE_CHECK_STOP: the vcpu is in a special error state [s390] + - KVM_MP_STATE_OPERATING: the vcpu is operating (running or halted) + [s390] + - KVM_MP_STATE_LOAD: the vcpu is in a special load/startup state + [s390] + +On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an +in-kernel irqchip, the multiprocessing state must be maintained by userspace on +these architectures. -This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel -irqchip, the multiprocessing state must be maintained by userspace. +For arm/arm64: +The only states that are valid are KVM_MP_STATE_STOPPED and +KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not. 4.39 KVM_SET_MP_STATE Capability: KVM_CAP_MP_STATE -Architectures: x86, ia64 +Architectures: x86, s390, arm, arm64 Type: vcpu ioctl Parameters: struct kvm_mp_state (in) Returns: 0 on success; -1 on error @@ -993,9 +1079,14 @@ Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for arguments. -This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel -irqchip, the multiprocessing state must be maintained by userspace. +On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an +in-kernel irqchip, the multiprocessing state must be maintained by userspace on +these architectures. + +For arm/arm64: +The only states that are valid are KVM_MP_STATE_STOPPED and +KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not. 4.40 KVM_SET_IDENTITY_MAP_ADDR @@ -1019,7 +1110,7 @@ 4.41 KVM_SET_BOOT_CPU_ID Capability: KVM_CAP_SET_BOOT_CPU_ID -Architectures: x86, ia64 +Architectures: x86 Type: vm ioctl Parameters: unsigned long vcpu_id Returns: 0 on success, -1 on error @@ -1121,9 +1212,9 @@ struct kvm_cpuid_entry2 entries[0]; }; -#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX 1 -#define KVM_CPUID_FLAG_STATEFUL_FUNC 2 -#define KVM_CPUID_FLAG_STATE_READ_NEXT 4 +#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0) +#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) +#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) struct kvm_cpuid_entry2 { __u32 function; @@ -1209,10 +1300,10 @@ /* the host supports the ePAPR idle hcall #define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1<<0) -4.48 KVM_ASSIGN_PCI_DEVICE +4.48 KVM_ASSIGN_PCI_DEVICE (deprecated) -Capability: KVM_CAP_DEVICE_ASSIGNMENT -Architectures: x86 ia64 +Capability: none +Architectures: x86 Type: vm ioctl Parameters: struct kvm_assigned_pci_dev (in) Returns: 0 on success, -1 on error @@ -1252,25 +1343,36 @@ device assignment. The user requesting this ioctl must have read/write access to the PCI sysfs resource files associated with the device. +Errors: + ENOTTY: kernel does not support this ioctl + + Other error conditions may be defined by individual device types or + have their standard meanings. -4.49 KVM_DEASSIGN_PCI_DEVICE -Capability: KVM_CAP_DEVICE_DEASSIGNMENT -Architectures: x86 ia64 +4.49 KVM_DEASSIGN_PCI_DEVICE (deprecated) + +Capability: none +Architectures: x86 Type: vm ioctl Parameters: struct kvm_assigned_pci_dev (in) Returns: 0 on success, -1 on error Ends PCI device assignment, releasing all associated resources. -See KVM_CAP_DEVICE_ASSIGNMENT for the data structure. Only assigned_dev_id is +See KVM_ASSIGN_PCI_DEVICE for the data structure. Only assigned_dev_id is used in kvm_assigned_pci_dev to identify the device. +Errors: + ENOTTY: kernel does not support this ioctl + + Other error conditions may be defined by individual device types or + have their standard meanings. -4.50 KVM_ASSIGN_DEV_IRQ +4.50 KVM_ASSIGN_DEV_IRQ (deprecated) Capability: KVM_CAP_ASSIGN_DEV_IRQ -Architectures: x86 ia64 +Architectures: x86 Type: vm ioctl Parameters: struct kvm_assigned_irq (in) Returns: 0 on success, -1 on error @@ -1300,11 +1402,17 @@ It is not valid to specify multiple types per host or guest IRQ. However, the IRQ type of host and guest can differ or can even be null. +Errors: + ENOTTY: kernel does not support this ioctl + + Other error conditions may be defined by individual device types or + have their standard meanings. + -4.51 KVM_DEASSIGN_DEV_IRQ +4.51 KVM_DEASSIGN_DEV_IRQ (deprecated) Capability: KVM_CAP_ASSIGN_DEV_IRQ -Architectures: x86 ia64 +Architectures: x86 Type: vm ioctl Parameters: struct kvm_assigned_irq (in) Returns: 0 on success, -1 on error @@ -1319,7 +1427,7 @@ 4.52 KVM_SET_GSI_ROUTING Capability: KVM_CAP_IRQ_ROUTING -Architectures: x86 ia64 +Architectures: x86 s390 Type: vm ioctl Parameters: struct kvm_irq_routing (in) Returns: 0 on success, -1 on error @@ -1342,6 +1450,7 @@ union { struct kvm_irq_routing_irqchip irqchip; struct kvm_irq_routing_msi msi; + struct kvm_irq_routing_s390_adapter adapter; __u32 pad[8]; } u; }; @@ -1349,6 +1458,7 @@ /* gsi routing entry types */ #define KVM_IRQ_ROUTING_IRQCHIP 1 #define KVM_IRQ_ROUTING_MSI 2 +#define KVM_IRQ_ROUTING_S390_ADAPTER 3 No flags are specified so far, the corresponding field must be set to zero. @@ -1364,11 +1474,19 @@ __u32 pad; }; +struct kvm_irq_routing_s390_adapter { + __u64 ind_addr; + __u64 summary_addr; + __u64 ind_offset; + __u32 summary_offset; + __u32 adapter_id; +}; + -4.53 KVM_ASSIGN_SET_MSIX_NR +4.53 KVM_ASSIGN_SET_MSIX_NR (deprecated) -Capability: KVM_CAP_DEVICE_MSIX -Architectures: x86 ia64 +Capability: none +Architectures: x86 Type: vm ioctl Parameters: struct kvm_assigned_msix_nr (in) Returns: 0 on success, -1 on error @@ -1387,10 +1505,10 @@ #define KVM_MAX_MSIX_PER_DEV 256 -4.54 KVM_ASSIGN_SET_MSIX_ENTRY +4.54 KVM_ASSIGN_SET_MSIX_ENTRY (deprecated) -Capability: KVM_CAP_DEVICE_MSIX -Architectures: x86 ia64 +Capability: none +Architectures: x86 Type: vm ioctl Parameters: struct kvm_assigned_msix_entry (in) Returns: 0 on success, -1 on error @@ -1405,6 +1523,12 @@ __u16 padding[3]; }; +Errors: + ENOTTY: kernel does not support this ioctl + + Other error conditions may be defined by individual device types or + have their standard meanings. + 4.55 KVM_SET_TSC_KHZ @@ -1461,7 +1585,7 @@ char regs[KVM_APIC_REG_SIZE]; }; -Copies the input argument into the the Local APIC registers. The data format +Copies the input argument into the Local APIC registers. The data format and layout are the same as documented in the architecture manual. @@ -1480,7 +1604,7 @@ struct kvm_ioeventfd { __u64 datamatch; __u64 addr; /* legal pio/mmio address */ - __u32 len; /* 1, 2, 4, or 8 bytes */ + __u32 len; /* 0, 1, 2, 4, or 8 bytes */ __s32 fd; __u32 flags; __u8 pad[36]; @@ -1503,6 +1627,10 @@ For virtio-ccw devices, addr contains the subchannel id and datamatch the virtqueue index. +With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero length ioeventfd is allowed, and +the kernel will ignore the length of guest write and may get a faster vmexit. +The speedup may only apply to specific architectures, but the ioeventfd will +work anyway. 4.60 KVM_DIRTY_TLB @@ -1537,7 +1665,7 @@ be set to the number of set bits in the bitmap. -4.61 KVM_ASSIGN_SET_INTX_MASK +4.61 KVM_ASSIGN_SET_INTX_MASK (deprecated) Capability: KVM_CAP_PCI_2_3 Architectures: x86 @@ -1656,7 +1784,7 @@ To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the following algorithm: - - pause the vpcu + - pause the vcpu - read the local APIC's state (KVM_GET_LAPIC) - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1) - if so, issue KVM_NMI @@ -1683,7 +1811,7 @@ This ioctl maps the memory at "user_addr" with the length "length" to the vcpu's address space starting at "vcpu_addr". All parameters need to -be alligned by 1 megabyte. +be aligned by 1 megabyte. 4.66 KVM_S390_UCAS_UNMAP @@ -1703,7 +1831,7 @@ This ioctl unmaps the memory in the vcpu's address space starting at "vcpu_addr" with the length "length". The field "user_addr" is ignored. -All parameters need to be alligned by 1 megabyte. +All parameters need to be aligned by 1 megabyte. 4.67 KVM_S390_VCPU_FAULT @@ -1744,71 +1872,165 @@ and their own constants and width. To keep track of the implemented registers, find a list below: - Arch | Register | Width (bits) - | | - PPC | KVM_REG_PPC_HIOR | 64 - PPC | KVM_REG_PPC_IAC1 | 64 - PPC | KVM_REG_PPC_IAC2 | 64 - PPC | KVM_REG_PPC_IAC3 | 64 - PPC | KVM_REG_PPC_IAC4 | 64 - PPC | KVM_REG_PPC_DAC1 | 64 - PPC | KVM_REG_PPC_DAC2 | 64 - PPC | KVM_REG_PPC_DABR | 64 - PPC | KVM_REG_PPC_DSCR | 64 - PPC | KVM_REG_PPC_PURR | 64 - PPC | KVM_REG_PPC_SPURR | 64 - PPC | KVM_REG_PPC_DAR | 64 - PPC | KVM_REG_PPC_DSISR | 32 - PPC | KVM_REG_PPC_AMR | 64 - PPC | KVM_REG_PPC_UAMOR | 64 - PPC | KVM_REG_PPC_MMCR0 | 64 - PPC | KVM_REG_PPC_MMCR1 | 64 - PPC | KVM_REG_PPC_MMCRA | 64 - PPC | KVM_REG_PPC_PMC1 | 32 - PPC | KVM_REG_PPC_PMC2 | 32 - PPC | KVM_REG_PPC_PMC3 | 32 - PPC | KVM_REG_PPC_PMC4 | 32 - PPC | KVM_REG_PPC_PMC5 | 32 - PPC | KVM_REG_PPC_PMC6 | 32 - PPC | KVM_REG_PPC_PMC7 | 32 - PPC | KVM_REG_PPC_PMC8 | 32 - PPC | KVM_REG_PPC_FPR0 | 64 + Arch | Register | Width (bits) + | | + PPC | KVM_REG_PPC_HIOR | 64 + PPC | KVM_REG_PPC_IAC1 | 64 + PPC | KVM_REG_PPC_IAC2 | 64 + PPC | KVM_REG_PPC_IAC3 | 64 + PPC | KVM_REG_PPC_IAC4 | 64 + PPC | KVM_REG_PPC_DAC1 | 64 + PPC | KVM_REG_PPC_DAC2 | 64 + PPC | KVM_REG_PPC_DABR | 64 + PPC | KVM_REG_PPC_DSCR | 64 + PPC | KVM_REG_PPC_PURR | 64 + PPC | KVM_REG_PPC_SPURR | 64 + PPC | KVM_REG_PPC_DAR | 64 + PPC | KVM_REG_PPC_DSISR | 32 + PPC | KVM_REG_PPC_AMR | 64 + PPC | KVM_REG_PPC_UAMOR | 64 + PPC | KVM_REG_PPC_MMCR0 | 64 + PPC | KVM_REG_PPC_MMCR1 | 64 + PPC | KVM_REG_PPC_MMCRA | 64 + PPC | KVM_REG_PPC_MMCR2 | 64 + PPC | KVM_REG_PPC_MMCRS | 64 + PPC | KVM_REG_PPC_SIAR | 64 + PPC | KVM_REG_PPC_SDAR | 64 + PPC | KVM_REG_PPC_SIER | 64 + PPC | KVM_REG_PPC_PMC1 | 32 + PPC | KVM_REG_PPC_PMC2 | 32 + PPC | KVM_REG_PPC_PMC3 | 32 + PPC | KVM_REG_PPC_PMC4 | 32 + PPC | KVM_REG_PPC_PMC5 | 32 + PPC | KVM_REG_PPC_PMC6 | 32 + PPC | KVM_REG_PPC_PMC7 | 32 + PPC | KVM_REG_PPC_PMC8 | 32 + PPC | KVM_REG_PPC_FPR0 | 64 ... - PPC | KVM_REG_PPC_FPR31 | 64 - PPC | KVM_REG_PPC_VR0 | 128 + PPC | KVM_REG_PPC_FPR31 | 64 + PPC | KVM_REG_PPC_VR0 | 128 ... - PPC | KVM_REG_PPC_VR31 | 128 - PPC | KVM_REG_PPC_VSR0 | 128 + PPC | KVM_REG_PPC_VR31 | 128 + PPC | KVM_REG_PPC_VSR0 | 128 ... - PPC | KVM_REG_PPC_VSR31 | 128 - PPC | KVM_REG_PPC_FPSCR | 64 - PPC | KVM_REG_PPC_VSCR | 32 - PPC | KVM_REG_PPC_VPA_ADDR | 64 - PPC | KVM_REG_PPC_VPA_SLB | 128 - PPC | KVM_REG_PPC_VPA_DTL | 128 - PPC | KVM_REG_PPC_EPCR | 32 - PPC | KVM_REG_PPC_EPR | 32 - PPC | KVM_REG_PPC_TCR | 32 - PPC | KVM_REG_PPC_TSR | 32 - PPC | KVM_REG_PPC_OR_TSR | 32 - PPC | KVM_REG_PPC_CLEAR_TSR | 32 - PPC | KVM_REG_PPC_MAS0 | 32 - PPC | KVM_REG_PPC_MAS1 | 32 - PPC | KVM_REG_PPC_MAS2 | 64 - PPC | KVM_REG_PPC_MAS7_3 | 64 - PPC | KVM_REG_PPC_MAS4 | 32 - PPC | KVM_REG_PPC_MAS6 | 32 - PPC | KVM_REG_PPC_MMUCFG | 32 - PPC | KVM_REG_PPC_TLB0CFG | 32 - PPC | KVM_REG_PPC_TLB1CFG | 32 - PPC | KVM_REG_PPC_TLB2CFG | 32 - PPC | KVM_REG_PPC_TLB3CFG | 32 - PPC | KVM_REG_PPC_TLB0PS | 32 - PPC | KVM_REG_PPC_TLB1PS | 32 - PPC | KVM_REG_PPC_TLB2PS | 32 - PPC | KVM_REG_PPC_TLB3PS | 32 - PPC | KVM_REG_PPC_EPTCFG | 32 - PPC | KVM_REG_PPC_ICP_STATE | 64 + PPC | KVM_REG_PPC_VSR31 | 128 + PPC | KVM_REG_PPC_FPSCR | 64 + PPC | KVM_REG_PPC_VSCR | 32 + PPC | KVM_REG_PPC_VPA_ADDR | 64 + PPC | KVM_REG_PPC_VPA_SLB | 128 + PPC | KVM_REG_PPC_VPA_DTL | 128 + PPC | KVM_REG_PPC_EPCR | 32 + PPC | KVM_REG_PPC_EPR | 32 + PPC | KVM_REG_PPC_TCR | 32 + PPC | KVM_REG_PPC_TSR | 32 + PPC | KVM_REG_PPC_OR_TSR | 32 + PPC | KVM_REG_PPC_CLEAR_TSR | 32 + PPC | KVM_REG_PPC_MAS0 | 32 + PPC | KVM_REG_PPC_MAS1 | 32 + PPC | KVM_REG_PPC_MAS2 | 64 + PPC | KVM_REG_PPC_MAS7_3 | 64 + PPC | KVM_REG_PPC_MAS4 | 32 + PPC | KVM_REG_PPC_MAS6 | 32 + PPC | KVM_REG_PPC_MMUCFG | 32 + PPC | KVM_REG_PPC_TLB0CFG | 32 + PPC | KVM_REG_PPC_TLB1CFG | 32 + PPC | KVM_REG_PPC_TLB2CFG | 32 + PPC | KVM_REG_PPC_TLB3CFG | 32 + PPC | KVM_REG_PPC_TLB0PS | 32 + PPC | KVM_REG_PPC_TLB1PS | 32 + PPC | KVM_REG_PPC_TLB2PS | 32 + PPC | KVM_REG_PPC_TLB3PS | 32 + PPC | KVM_REG_PPC_EPTCFG | 32 + PPC | KVM_REG_PPC_ICP_STATE | 64 + PPC | KVM_REG_PPC_TB_OFFSET | 64 + PPC | KVM_REG_PPC_SPMC1 | 32 + PPC | KVM_REG_PPC_SPMC2 | 32 + PPC | KVM_REG_PPC_IAMR | 64 + PPC | KVM_REG_PPC_TFHAR | 64 + PPC | KVM_REG_PPC_TFIAR | 64 + PPC | KVM_REG_PPC_TEXASR | 64 + PPC | KVM_REG_PPC_FSCR | 64 + PPC | KVM_REG_PPC_PSPB | 32 + PPC | KVM_REG_PPC_EBBHR | 64 + PPC | KVM_REG_PPC_EBBRR | 64 + PPC | KVM_REG_PPC_BESCR | 64 + PPC | KVM_REG_PPC_TAR | 64 + PPC | KVM_REG_PPC_DPDES | 64 + PPC | KVM_REG_PPC_DAWR | 64 + PPC | KVM_REG_PPC_DAWRX | 64 + PPC | KVM_REG_PPC_CIABR | 64 + PPC | KVM_REG_PPC_IC | 64 + PPC | KVM_REG_PPC_VTB | 64 + PPC | KVM_REG_PPC_CSIGR | 64 + PPC | KVM_REG_PPC_TACR | 64 + PPC | KVM_REG_PPC_TCSCR | 64 + PPC | KVM_REG_PPC_PID | 64 + PPC | KVM_REG_PPC_ACOP | 64 + PPC | KVM_REG_PPC_VRSAVE | 32 + PPC | KVM_REG_PPC_LPCR | 32 + PPC | KVM_REG_PPC_LPCR_64 | 64 + PPC | KVM_REG_PPC_PPR | 64 + PPC | KVM_REG_PPC_ARCH_COMPAT | 32 + PPC | KVM_REG_PPC_DABRX | 32 + PPC | KVM_REG_PPC_WORT | 64 + PPC | KVM_REG_PPC_SPRG9 | 64 + PPC | KVM_REG_PPC_DBSR | 32 + PPC | KVM_REG_PPC_TM_GPR0 | 64 + ... + PPC | KVM_REG_PPC_TM_GPR31 | 64 + PPC | KVM_REG_PPC_TM_VSR0 | 128 + ... + PPC | KVM_REG_PPC_TM_VSR63 | 128 + PPC | KVM_REG_PPC_TM_CR | 64 + PPC | KVM_REG_PPC_TM_LR | 64 + PPC | KVM_REG_PPC_TM_CTR | 64 + PPC | KVM_REG_PPC_TM_FPSCR | 64 + PPC | KVM_REG_PPC_TM_AMR | 64 + PPC | KVM_REG_PPC_TM_PPR | 64 + PPC | KVM_REG_PPC_TM_VRSAVE | 64 + PPC | KVM_REG_PPC_TM_VSCR | 32 + PPC | KVM_REG_PPC_TM_DSCR | 64 + PPC | KVM_REG_PPC_TM_TAR | 64 + PPC | KVM_REG_PPC_TM_XER | 64 + | | + MIPS | KVM_REG_MIPS_R0 | 64 + ... + MIPS | KVM_REG_MIPS_R31 | 64 + MIPS | KVM_REG_MIPS_HI | 64 + MIPS | KVM_REG_MIPS_LO | 64 + MIPS | KVM_REG_MIPS_PC | 64 + MIPS | KVM_REG_MIPS_CP0_INDEX | 32 + MIPS | KVM_REG_MIPS_CP0_CONTEXT | 64 + MIPS | KVM_REG_MIPS_CP0_USERLOCAL | 64 + MIPS | KVM_REG_MIPS_CP0_PAGEMASK | 32 + MIPS | KVM_REG_MIPS_CP0_WIRED | 32 + MIPS | KVM_REG_MIPS_CP0_HWRENA | 32 + MIPS | KVM_REG_MIPS_CP0_BADVADDR | 64 + MIPS | KVM_REG_MIPS_CP0_COUNT | 32 + MIPS | KVM_REG_MIPS_CP0_ENTRYHI | 64 + MIPS | KVM_REG_MIPS_CP0_COMPARE | 32 + MIPS | KVM_REG_MIPS_CP0_STATUS | 32 + MIPS | KVM_REG_MIPS_CP0_CAUSE | 32 + MIPS | KVM_REG_MIPS_CP0_EPC | 64 + MIPS | KVM_REG_MIPS_CP0_PRID | 32 + MIPS | KVM_REG_MIPS_CP0_CONFIG | 32 + MIPS | KVM_REG_MIPS_CP0_CONFIG1 | 32 + MIPS | KVM_REG_MIPS_CP0_CONFIG2 | 32 + MIPS | KVM_REG_MIPS_CP0_CONFIG3 | 32 + MIPS | KVM_REG_MIPS_CP0_CONFIG4 | 32 + MIPS | KVM_REG_MIPS_CP0_CONFIG5 | 32 + MIPS | KVM_REG_MIPS_CP0_CONFIG7 | 32 + MIPS | KVM_REG_MIPS_CP0_ERROREPC | 64 + MIPS | KVM_REG_MIPS_COUNT_CTL | 64 + MIPS | KVM_REG_MIPS_COUNT_RESUME | 64 + MIPS | KVM_REG_MIPS_COUNT_HZ | 64 + MIPS | KVM_REG_MIPS_FPR_32(0..31) | 32 + MIPS | KVM_REG_MIPS_FPR_64(0..31) | 64 + MIPS | KVM_REG_MIPS_VEC_128(0..31) | 128 + MIPS | KVM_REG_MIPS_FCR_IR | 32 + MIPS | KVM_REG_MIPS_FCR_CSR | 32 + MIPS | KVM_REG_MIPS_MSA_IR | 32 + MIPS | KVM_REG_MIPS_MSA_CSR | 32 ARM registers are mapped using the lower 32 bits. The upper 16 of that is the register group type, or coprocessor number: @@ -1831,6 +2053,57 @@ ARM 64-bit FP registers have the following id bit patterns: 0x4030 0000 0012 0 + +arm64 registers are mapped using the lower 32 bits. The upper 16 of +that is the register group type, or coprocessor number: + +arm64 core/FP-SIMD registers have the following id bit patterns. Note +that the size of the access is variable, as the kvm_regs structure +contains elements ranging from 32 to 128 bits. The index is a 32bit +value in the kvm_regs structure seen as a 32bit array. + 0x60x0 0000 0010 + +arm64 CCSIDR registers are demultiplexed by CSSELR value: + 0x6020 0000 0011 00 + +arm64 system registers have the following id bit patterns: + 0x6030 0000 0013 + + +MIPS registers are mapped using the lower 32 bits. The upper 16 of that is +the register group type: + +MIPS core registers (see above) have the following id bit patterns: + 0x7030 0000 0000 + +MIPS CP0 registers (see KVM_REG_MIPS_CP0_* above) have the following id bit +patterns depending on whether they're 32-bit or 64-bit registers: + 0x7020 0000 0001 00 (32-bit) + 0x7030 0000 0001 00 (64-bit) + +MIPS KVM control registers (see above) have the following id bit patterns: + 0x7030 0000 0002 + +MIPS FPU registers (see KVM_REG_MIPS_FPR_{32,64}() above) have the following +id bit patterns depending on the size of the register being accessed. They are +always accessed according to the current guest FPU mode (Status.FR and +Config5.FRE), i.e. as the guest would see them, and they become unpredictable +if the guest FPU mode is changed. MIPS SIMD Architecture (MSA) vector +registers (see KVM_REG_MIPS_VEC_128() above) have similar patterns as they +overlap the FPU registers: + 0x7020 0000 0003 00 <0:3> (32-bit FPU registers) + 0x7030 0000 0003 00 <0:3> (64-bit FPU registers) + 0x7040 0000 0003 00 <0:3> (128-bit MSA vector registers) + +MIPS FPU control registers (see KVM_REG_MIPS_FCR_{IR,CSR} above) have the +following id bit patterns: + 0x7020 0000 0003 01 <0:3> + +MIPS MSA control registers (see KVM_REG_MIPS_MSA_{IR,CSR} above) have the +following id bit patterns: + 0x7020 0000 0003 02 <0:3> + + 4.69 KVM_GET_ONE_REG Capability: KVM_CAP_ONE_REG @@ -1972,10 +2245,10 @@ This populates and returns a structure describing the features of the "Server" class MMU emulation supported by KVM. -This can in turn be used by userspace to generate the appropariate +This can in turn be used by userspace to generate the appropriate device-tree properties for the guest operating system. -The structure contains some global informations, followed by an +The structure contains some global information, followed by an array of supported segment page sizes: struct kvm_ppc_smmu_info { @@ -2019,7 +2292,7 @@ The "enc" array is a list which for each of those segment base page size provides the list of supported actual page sizes (which can be only larger or equal to the base page size), along with the -corresponding encoding in the hash PTE. Similarily, the array is +corresponding encoding in the hash PTE. Similarly, the array is 8 entries sorted by increasing sizes and an entry with a "0" shift is an empty entry and a terminator: @@ -2035,7 +2308,7 @@ 4.75 KVM_IRQFD Capability: KVM_CAP_IRQFD -Architectures: x86 +Architectures: x86 s390 arm arm64 Type: vm ioctl Parameters: struct kvm_irqfd (in) Returns: 0 on success, -1 on error @@ -2043,7 +2316,7 @@ Allows setting an eventfd to directly trigger a guest interrupt. kvm_irqfd.fd specifies the file descriptor to use as the eventfd and kvm_irqfd.gsi specifies the irqchip pin toggled by this event. When -an event is tiggered on the eventfd, an interrupt is injected into +an event is triggered on the eventfd, an interrupt is injected into the guest using the specified gsi pin. The irqfd is removed using the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd and kvm_irqfd.gsi. @@ -2054,13 +2327,17 @@ additional eventfd in the kvm_irqfd.resamplefd field. When operating in resample mode, posting of an interrupt through kvm_irq.fd asserts the specified gsi in the irqchip. When the irqchip is resampled, such -as from an EOI, the gsi is de-asserted and the user is notifed via +as from an EOI, the gsi is de-asserted and the user is notified via kvm_irqfd.resamplefd. It is the user's responsibility to re-queue the interrupt if the device making use of it still requires service. Note that closing the resamplefd is not sufficient to disable the irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. +On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared +Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is +given by gsi + 32. + 4.76 KVM_PPC_ALLOCATE_HTAB Capability: KVM_CAP_PPC_ALLOC_HTAB @@ -2116,10 +2393,12 @@ type can be one of the following: -KVM_S390_SIGP_STOP (vcpu) - sigp restart +KVM_S390_SIGP_STOP (vcpu) - sigp stop; optional flags in parm KVM_S390_PROGRAM_INT (vcpu) - program check; code in parm KVM_S390_SIGP_SET_PREFIX (vcpu) - sigp set prefix; prefix address in parm KVM_S390_RESTART (vcpu) - restart +KVM_S390_INT_CLOCK_COMP (vcpu) - clock comparator interrupt +KVM_S390_INT_CPU_TIMER (vcpu) - CPU timer interrupt KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt parameters in parm and parm64 KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm @@ -2223,8 +2502,8 @@ 4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR -Capability: KVM_CAP_DEVICE_CTRL -Type: device ioctl +Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device +Type: device ioctl, vm ioctl Parameters: struct kvm_device_attr Returns: 0 on success, -1 on error Errors: @@ -2249,8 +2528,8 @@ 4.81 KVM_HAS_DEVICE_ATTR -Capability: KVM_CAP_DEVICE_CTRL -Type: device ioctl +Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device +Type: device ioctl, vm ioctl Parameters: struct kvm_device_attr Returns: 0 on success, -1 on error Errors: @@ -2261,12 +2540,12 @@ indicate that the attribute can be read or written in the device's current state. "addr" is ignored. -4.77 KVM_ARM_VCPU_INIT +4.82 KVM_ARM_VCPU_INIT Capability: basic -Architectures: arm +Architectures: arm, arm64 Type: vcpu ioctl -Parameters: struct struct kvm_vcpu_init (in) +Parameters: struct kvm_vcpu_init (in) Returns: 0 on success; -1 on error Errors:  EINVAL:    the target is unknown, or the combination of features is invalid. @@ -2280,15 +2559,49 @@ Note that because some registers reflect machine topology, all vcpus should be created before this ioctl is invoked. +Userspace can call this function multiple times for a given vcpu, including +after the vcpu has been run. This will reset the vcpu to its initial +state. All calls to this function after the initial call must use the same +target and same set of feature flags, otherwise EINVAL will be returned. + Possible features: - KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state. - Depends on KVM_CAP_ARM_PSCI. + Depends on KVM_CAP_ARM_PSCI. If not set, the CPU will be powered on + and execute guest code when KVM_RUN is called. + - KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode. + Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only). + - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU. + Depends on KVM_CAP_ARM_PSCI_0_2. -4.78 KVM_GET_REG_LIST +4.83 KVM_ARM_PREFERRED_TARGET Capability: basic -Architectures: arm +Architectures: arm, arm64 +Type: vm ioctl +Parameters: struct struct kvm_vcpu_init (out) +Returns: 0 on success; -1 on error +Errors: + ENODEV: no preferred target available for the host + +This queries KVM for preferred CPU target type which can be emulated +by KVM on underlying host. + +The ioctl returns struct kvm_vcpu_init instance containing information +about preferred CPU target type and recommended features for it. The +kvm_vcpu_init->features bitmap returned will have feature bits set if +the preferred target recommends setting these features, but this is +not mandatory. + +The information returned by this ioctl can be used to prepare an instance +of struct kvm_vcpu_init for KVM_ARM_VCPU_INIT ioctl which will result in +in VCPU matching underlying host. + + +4.84 KVM_GET_REG_LIST + +Capability: basic +Architectures: arm, arm64, mips Type: vcpu ioctl Parameters: struct kvm_reg_list (in/out) Returns: 0 on success; -1 on error @@ -2305,10 +2618,10 @@ KVM_GET_ONE_REG/KVM_SET_ONE_REG calls. -4.80 KVM_ARM_SET_DEVICE_ADDR +4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated) Capability: KVM_CAP_ARM_SET_DEVICE_ADDR -Architectures: arm +Architectures: arm, arm64 Type: vm ioctl Parameters: struct kvm_arm_device_address (in) Returns: 0 on success, -1 on error @@ -2329,20 +2642,25 @@ to know about. The id field is an architecture specific identifier for a specific device. -ARM divides the id field into two parts, a device id and an address type id -specific to the individual device. +ARM/arm64 divides the id field into two parts, a device id and an +address type id specific to the individual device.  bits: | 63 ... 32 | 31 ... 16 | 15 ... 0 | field: | 0x00000000 | device id | addr type id | -ARM currently only require this when using the in-kernel GIC support for the -hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2 as the device id. When -setting the base address for the guest's mapping of the VGIC virtual CPU -and distributor interface, the ioctl must be called after calling -KVM_CREATE_IRQCHIP, but before calling KVM_RUN on any of the VCPUs. Calling -this ioctl twice for any of the base addresses will return -EEXIST. +ARM/arm64 currently only require this when using the in-kernel GIC +support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2 +as the device id. When setting the base address for the guest's +mapping of the VGIC virtual CPU and distributor interface, the ioctl +must be called after calling KVM_CREATE_IRQCHIP, but before calling +KVM_RUN on any of the VCPUs. Calling this ioctl twice for any of the +base addresses will return -EEXIST. + +Note, this IOCTL is deprecated and the more flexible SET/GET_DEVICE_ATTR API +should be used instead. -4.82 KVM_PPC_RTAS_DEFINE_TOKEN + +4.86 KVM_PPC_RTAS_DEFINE_TOKEN Capability: KVM_CAP_PPC_RTAS Architectures: ppc @@ -2361,6 +2679,356 @@ calls by the guest for that service will be passed to userspace to be handled. +4.87 KVM_SET_GUEST_DEBUG + +Capability: KVM_CAP_SET_GUEST_DEBUG +Architectures: x86, s390, ppc, arm64 +Type: vcpu ioctl +Parameters: struct kvm_guest_debug (in) +Returns: 0 on success; -1 on error + +struct kvm_guest_debug { + __u32 control; + __u32 pad; + struct kvm_guest_debug_arch arch; +}; + +Set up the processor specific debug registers and configure vcpu for +handling guest debug events. There are two parts to the structure, the +first a control bitfield indicates the type of debug events to handle +when running. Common control bits are: + + - KVM_GUESTDBG_ENABLE: guest debugging is enabled + - KVM_GUESTDBG_SINGLESTEP: the next run should single-step + +The top 16 bits of the control field are architecture specific control +flags which can include the following: + + - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64] + - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64] + - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86] + - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86] + - KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390] + +For example KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints +are enabled in memory so we need to ensure breakpoint exceptions are +correctly trapped and the KVM run loop exits at the breakpoint and not +running off into the normal guest vector. For KVM_GUESTDBG_USE_HW_BP +we need to ensure the guest vCPUs architecture specific registers are +updated to the correct (supplied) values. + +The second part of the structure is architecture specific and +typically contains a set of debug registers. + +For arm64 the number of debug registers is implementation defined and +can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and +KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number +indicating the number of supported registers. + +When debug events exit the main run loop with the reason +KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run +structure containing architecture specific debug information. + +4.88 KVM_GET_EMULATED_CPUID + +Capability: KVM_CAP_EXT_EMUL_CPUID +Architectures: x86 +Type: system ioctl +Parameters: struct kvm_cpuid2 (in/out) +Returns: 0 on success, -1 on error + +struct kvm_cpuid2 { + __u32 nent; + __u32 flags; + struct kvm_cpuid_entry2 entries[0]; +}; + +The member 'flags' is used for passing flags from userspace. + +#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0) +#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) +#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) + +struct kvm_cpuid_entry2 { + __u32 function; + __u32 index; + __u32 flags; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; + __u32 padding[3]; +}; + +This ioctl returns x86 cpuid features which are emulated by +kvm.Userspace can use the information returned by this ioctl to query +which features are emulated by kvm instead of being present natively. + +Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2 +structure with the 'nent' field indicating the number of entries in +the variable-size array 'entries'. If the number of entries is too low +to describe the cpu capabilities, an error (E2BIG) is returned. If the +number is too high, the 'nent' field is adjusted and an error (ENOMEM) +is returned. If the number is just right, the 'nent' field is adjusted +to the number of valid entries in the 'entries' array, which is then +filled. + +The entries returned are the set CPUID bits of the respective features +which kvm emulates, as returned by the CPUID instruction, with unknown +or unsupported feature bits cleared. + +Features like x2apic, for example, may not be present in the host cpu +but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be +emulated efficiently and thus not included here. + +The fields in each entry are defined as follows: + + function: the eax value used to obtain the entry + index: the ecx value used to obtain the entry (for entries that are + affected by ecx) + flags: an OR of zero or more of the following: + KVM_CPUID_FLAG_SIGNIFCANT_INDEX: + if the index field is valid + KVM_CPUID_FLAG_STATEFUL_FUNC: + if cpuid for this function returns different values for successive + invocations; there will be several entries with the same function, + all with this flag set + KVM_CPUID_FLAG_STATE_READ_NEXT: + for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is + the first entry to be read by a cpu + eax, ebx, ecx, edx: the values returned by the cpuid instruction for + this function/index combination + +4.89 KVM_S390_MEM_OP + +Capability: KVM_CAP_S390_MEM_OP +Architectures: s390 +Type: vcpu ioctl +Parameters: struct kvm_s390_mem_op (in) +Returns: = 0 on success, + < 0 on generic error (e.g. -EFAULT or -ENOMEM), + > 0 if an exception occurred while walking the page tables + +Read or write data from/to the logical (virtual) memory of a VCPU. + +Parameters are specified via the following structure: + +struct kvm_s390_mem_op { + __u64 gaddr; /* the guest address */ + __u64 flags; /* flags */ + __u32 size; /* amount of bytes */ + __u32 op; /* type of operation */ + __u64 buf; /* buffer in userspace */ + __u8 ar; /* the access register number */ + __u8 reserved[31]; /* should be set to 0 */ +}; + +The type of operation is specified in the "op" field. It is either +KVM_S390_MEMOP_LOGICAL_READ for reading from logical memory space or +KVM_S390_MEMOP_LOGICAL_WRITE for writing to logical memory space. The +KVM_S390_MEMOP_F_CHECK_ONLY flag can be set in the "flags" field to check +whether the corresponding memory access would create an access exception +(without touching the data in the memory at the destination). In case an +access exception occurred while walking the MMU tables of the guest, the +ioctl returns a positive error number to indicate the type of exception. +This exception is also raised directly at the corresponding VCPU if the +flag KVM_S390_MEMOP_F_INJECT_EXCEPTION is set in the "flags" field. + +The start address of the memory region has to be specified in the "gaddr" +field, and the length of the region in the "size" field. "buf" is the buffer +supplied by the userspace application where the read data should be written +to for KVM_S390_MEMOP_LOGICAL_READ, or where the data that should be written +is stored for a KVM_S390_MEMOP_LOGICAL_WRITE. "buf" is unused and can be NULL +when KVM_S390_MEMOP_F_CHECK_ONLY is specified. "ar" designates the access +register number to be used. + +The "reserved" field is meant for future extensions. It is not used by +KVM with the currently defined set of flags. + +4.90 KVM_S390_GET_SKEYS + +Capability: KVM_CAP_S390_SKEYS +Architectures: s390 +Type: vm ioctl +Parameters: struct kvm_s390_skeys +Returns: 0 on success, KVM_S390_GET_KEYS_NONE if guest is not using storage + keys, negative value on error + +This ioctl is used to get guest storage key values on the s390 +architecture. The ioctl takes parameters via the kvm_s390_skeys struct. + +struct kvm_s390_skeys { + __u64 start_gfn; + __u64 count; + __u64 skeydata_addr; + __u32 flags; + __u32 reserved[9]; +}; + +The start_gfn field is the number of the first guest frame whose storage keys +you want to get. + +The count field is the number of consecutive frames (starting from start_gfn) +whose storage keys to get. The count field must be at least 1 and the maximum +allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range +will cause the ioctl to return -EINVAL. + +The skeydata_addr field is the address to a buffer large enough to hold count +bytes. This buffer will be filled with storage key data by the ioctl. + +4.91 KVM_S390_SET_SKEYS + +Capability: KVM_CAP_S390_SKEYS +Architectures: s390 +Type: vm ioctl +Parameters: struct kvm_s390_skeys +Returns: 0 on success, negative value on error + +This ioctl is used to set guest storage key values on the s390 +architecture. The ioctl takes parameters via the kvm_s390_skeys struct. +See section on KVM_S390_GET_SKEYS for struct definition. + +The start_gfn field is the number of the first guest frame whose storage keys +you want to set. + +The count field is the number of consecutive frames (starting from start_gfn) +whose storage keys to get. The count field must be at least 1 and the maximum +allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range +will cause the ioctl to return -EINVAL. + +The skeydata_addr field is the address to a buffer containing count bytes of +storage keys. Each byte in the buffer will be set as the storage key for a +single frame starting at start_gfn for count frames. + +Note: If any architecturally invalid key value is found in the given data then +the ioctl will return -EINVAL. + +4.92 KVM_S390_IRQ + +Capability: KVM_CAP_S390_INJECT_IRQ +Architectures: s390 +Type: vcpu ioctl +Parameters: struct kvm_s390_irq (in) +Returns: 0 on success, -1 on error +Errors: + EINVAL: interrupt type is invalid + type is KVM_S390_SIGP_STOP and flag parameter is invalid value + type is KVM_S390_INT_EXTERNAL_CALL and code is bigger + than the maximum of VCPUs + EBUSY: type is KVM_S390_SIGP_SET_PREFIX and vcpu is not stopped + type is KVM_S390_SIGP_STOP and a stop irq is already pending + type is KVM_S390_INT_EXTERNAL_CALL and an external call interrupt + is already pending + +Allows to inject an interrupt to the guest. + +Using struct kvm_s390_irq as a parameter allows +to inject additional payload which is not +possible via KVM_S390_INTERRUPT. + +Interrupt parameters are passed via kvm_s390_irq: + +struct kvm_s390_irq { + __u64 type; + union { + struct kvm_s390_io_info io; + struct kvm_s390_ext_info ext; + struct kvm_s390_pgm_info pgm; + struct kvm_s390_emerg_info emerg; + struct kvm_s390_extcall_info extcall; + struct kvm_s390_prefix_info prefix; + struct kvm_s390_stop_info stop; + struct kvm_s390_mchk_info mchk; + char reserved[64]; + } u; +}; + +type can be one of the following: + +KVM_S390_SIGP_STOP - sigp stop; parameter in .stop +KVM_S390_PROGRAM_INT - program check; parameters in .pgm +KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix +KVM_S390_RESTART - restart; no parameters +KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters +KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters +KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg +KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall +KVM_S390_MCHK - machine check interrupt; parameters in .mchk + + +Note that the vcpu ioctl is asynchronous to vcpu execution. + +4.94 KVM_S390_GET_IRQ_STATE + +Capability: KVM_CAP_S390_IRQ_STATE +Architectures: s390 +Type: vcpu ioctl +Parameters: struct kvm_s390_irq_state (out) +Returns: >= number of bytes copied into buffer, + -EINVAL if buffer size is 0, + -ENOBUFS if buffer size is too small to fit all pending interrupts, + -EFAULT if the buffer address was invalid + +This ioctl allows userspace to retrieve the complete state of all currently +pending interrupts in a single buffer. Use cases include migration +and introspection. The parameter structure contains the address of a +userspace buffer and its length: + +struct kvm_s390_irq_state { + __u64 buf; + __u32 flags; + __u32 len; + __u32 reserved[4]; +}; + +Userspace passes in the above struct and for each pending interrupt a +struct kvm_s390_irq is copied to the provided buffer. + +If -ENOBUFS is returned the buffer provided was too small and userspace +may retry with a bigger buffer. + +4.95 KVM_S390_SET_IRQ_STATE + +Capability: KVM_CAP_S390_IRQ_STATE +Architectures: s390 +Type: vcpu ioctl +Parameters: struct kvm_s390_irq_state (in) +Returns: 0 on success, + -EFAULT if the buffer address was invalid, + -EINVAL for an invalid buffer length (see below), + -EBUSY if there were already interrupts pending, + errors occurring when actually injecting the + interrupt. See KVM_S390_IRQ. + +This ioctl allows userspace to set the complete state of all cpu-local +interrupts currently pending for the vcpu. It is intended for restoring +interrupt state after a migration. The input parameter is a userspace buffer +containing a struct kvm_s390_irq_state: + +struct kvm_s390_irq_state { + __u64 buf; + __u32 len; + __u32 pad; +}; + +The userspace memory referenced by buf contains a struct kvm_s390_irq +for each interrupt to be injected into the guest. +If one of the interrupts could not be injected for some reason the +ioctl aborts. + +len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0 +and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq), +which is the maximum number of possibly pending cpu-local interrupts. + +4.90 KVM_SMI + +Capability: KVM_CAP_X86_SMM +Architectures: x86 +Type: vcpu ioctl +Parameters: none +Returns: 0 on success, -1 on error + +Queues an SMI on the thread's vcpu. 5. The kvm_run structure ------------------------ @@ -2397,7 +3065,12 @@ The value of the current interrupt flag. Only valid if in-kernel local APIC is not used. - __u8 padding2[2]; + __u16 flags; + +More architecture-specific flags detailing state of the VCPU that may +affect the device's behavior. The only currently defined flag is +KVM_RUN_X86_SMM, which is valid on x86 machines and is set if the +VCPU is in system management mode. /* in (pre_kvm_run), out (post_kvm_run) */ __u64 cr8; @@ -2454,11 +3127,13 @@ where kvm expects application code to place the data for the next KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array. + /* KVM_EXIT_DEBUG */ struct { struct kvm_debug_exit_arch arch; } debug; -Unused. +If the exit_reason is KVM_EXIT_DEBUG, then a vcpu is processing a debug event +for which architecture specific information is returned. /* KVM_EXIT_MMIO */ struct { @@ -2473,8 +3148,12 @@ by kvm. The 'data' member contains the written data if 'is_write' is true, and should be filled by application code otherwise. -NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR, - KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding +The 'data' member contains, in its first 'len' bytes, the value as it would +appear if the VCPU performed a load or store of the appropriate width directly +to the byte array. + +NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and + KVM_EXIT_EPR the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. Userspace @@ -2545,7 +3224,7 @@ __u8 is_write; } dcr; -powerpc specific. +Deprecated - was used for 440 KVM. /* KVM_EXIT_OSI */ struct { @@ -2612,6 +3291,47 @@ external interrupt has just been delivered into the guest. User space should put the acknowledged interrupt vector into the 'epr' field. + /* KVM_EXIT_SYSTEM_EVENT */ + struct { +#define KVM_SYSTEM_EVENT_SHUTDOWN 1 +#define KVM_SYSTEM_EVENT_RESET 2 +#define KVM_SYSTEM_EVENT_CRASH 3 + __u32 type; + __u64 flags; + } system_event; + +If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered +a system-level event using some architecture specific mechanism (hypercall +or some special instruction). In case of ARM/ARM64, this is triggered using +HVC instruction based PSCI call from the vcpu. The 'type' field describes +the system-level event type. The 'flags' field describes architecture +specific flags for the system-level event. + +Valid values for 'type' are: + KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the + VM. Userspace is not obliged to honour this, and if it does honour + this does not need to destroy the VM synchronously (ie it may call + KVM_RUN again before shutdown finally occurs). + KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM. + As with SHUTDOWN, userspace can choose to ignore the request, or + to schedule the reset to occur in the future and may call KVM_RUN again. + KVM_SYSTEM_EVENT_CRASH -- the guest crash occurred and the guest + has requested a crash condition maintenance. Userspace can choose + to ignore the request, or to gather VM memory core dump and/or + reset/shutdown of the VM. + + /* KVM_EXIT_IOAPIC_EOI */ + struct { + __u8 vector; + } eoi; + +Indicates that the VCPU's in-kernel local APIC received an EOI for a +level-triggered IOAPIC interrupt. This exit only triggers when the +IOAPIC is implemented in userspace (i.e. KVM_CAP_SPLIT_IRQCHIP is enabled); +the userspace IOAPIC should process the EOI and retrigger the interrupt if +it is still asserted. Vector is the LAPIC interrupt vector for which the +EOI was received. + /* Fix the size of the union. */ char padding[256]; }; @@ -2638,21 +3358,29 @@ and usually define the validity of a groups of registers. (e.g. one bit for general purpose registers) +Please note that the kernel is allowed to use the kvm_run structure as the +primary storage for certain register types. Therefore, the kernel may use the +values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set. + }; -6. Capabilities that can be enabled ------------------------------------ -There are certain capabilities that change the behavior of the virtual CPU when -enabled. To enable them, please see section 4.37. Below you can find a list of -capabilities and what their effect on the vCPU is when enabling them. +6. Capabilities that can be enabled on vCPUs +-------------------------------------------- + +There are certain capabilities that change the behavior of the virtual CPU or +the virtual machine when enabled. To enable them, please see section 4.37. +Below you can find a list of capabilities and what their effect on the vCPU or +the virtual machine is when enabling them. The following information is provided along with the description: Architectures: which instruction set architectures provide this ioctl. x86 includes both i386 and x86_64. + Target: whether this is a per-vcpu or per-vm capability. + Parameters: what parameters are accepted by the capability. Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) @@ -2662,6 +3390,7 @@ 6.1 KVM_CAP_PPC_OSI Architectures: ppc +Target: vcpu Parameters: none Returns: 0 on success; -1 on error @@ -2676,6 +3405,7 @@ 6.2 KVM_CAP_PPC_PAPR Architectures: ppc +Target: vcpu Parameters: none Returns: 0 on success; -1 on error @@ -2695,6 +3425,7 @@ 6.3 KVM_CAP_SW_TLB Architectures: ppc +Target: vcpu Parameters: args[0] is the address of a struct kvm_config_tlb Returns: 0 on success; -1 on error @@ -2737,6 +3468,7 @@ 6.4 KVM_CAP_S390_CSS_SUPPORT Architectures: s390 +Target: vcpu Parameters: none Returns: 0 on success; -1 on error @@ -2748,9 +3480,13 @@ When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST SUBCHANNEL intercepts. +Note that even though this capability is enabled per-vcpu, the complete +virtual machine is affected. + 6.5 KVM_CAP_PPC_EPR Architectures: ppc +Target: vcpu Parameters: args[0] defines whether the proxy facility is active Returns: 0 on success; -1 on error @@ -2776,7 +3512,177 @@ 6.7 KVM_CAP_IRQ_XICS Architectures: ppc +Target: vcpu Parameters: args[0] is the XICS device fd args[1] is the XICS CPU number (server ID) for this vcpu This capability connects the vcpu to an in-kernel XICS device. + +6.8 KVM_CAP_S390_IRQCHIP + +Architectures: s390 +Target: vm +Parameters: none + +This capability enables the in-kernel irqchip for s390. Please refer to +"4.24 KVM_CREATE_IRQCHIP" for details. + +6.9 KVM_CAP_MIPS_FPU + +Architectures: mips +Target: vcpu +Parameters: args[0] is reserved for future use (should be 0). + +This capability allows the use of the host Floating Point Unit by the guest. It +allows the Config1.FP bit to be set to enable the FPU in the guest. Once this is +done the KVM_REG_MIPS_FPR_* and KVM_REG_MIPS_FCR_* registers can be accessed +(depending on the current guest FPU register mode), and the Status.FR, +Config5.FRE bits are accessible via the KVM API and also from the guest, +depending on them being supported by the FPU. + +6.10 KVM_CAP_MIPS_MSA + +Architectures: mips +Target: vcpu +Parameters: args[0] is reserved for future use (should be 0). + +This capability allows the use of the MIPS SIMD Architecture (MSA) by the guest. +It allows the Config3.MSAP bit to be set to enable the use of MSA by the guest. +Once this is done the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be +accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from +the guest. + +7. Capabilities that can be enabled on VMs +------------------------------------------ + +There are certain capabilities that change the behavior of the virtual +machine when enabled. To enable them, please see section 4.37. Below +you can find a list of capabilities and what their effect on the VM +is when enabling them. + +The following information is provided along with the description: + + Architectures: which instruction set architectures provide this ioctl. + x86 includes both i386 and x86_64. + + Parameters: what parameters are accepted by the capability. + + Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) + are not detailed, but errors with specific meanings are. + + +7.1 KVM_CAP_PPC_ENABLE_HCALL + +Architectures: ppc +Parameters: args[0] is the sPAPR hcall number + args[1] is 0 to disable, 1 to enable in-kernel handling + +This capability controls whether individual sPAPR hypercalls (hcalls) +get handled by the kernel or not. Enabling or disabling in-kernel +handling of an hcall is effective across the VM. On creation, an +initial set of hcalls are enabled for in-kernel handling, which +consists of those hcalls for which in-kernel handlers were implemented +before this capability was implemented. If disabled, the kernel will +not to attempt to handle the hcall, but will always exit to userspace +to handle it. Note that it may not make sense to enable some and +disable others of a group of related hcalls, but KVM does not prevent +userspace from doing that. + +If the hcall number specified is not one that has an in-kernel +implementation, the KVM_ENABLE_CAP ioctl will fail with an EINVAL +error. + +7.2 KVM_CAP_S390_USER_SIGP + +Architectures: s390 +Parameters: none + +This capability controls which SIGP orders will be handled completely in user +space. With this capability enabled, all fast orders will be handled completely +in the kernel: +- SENSE +- SENSE RUNNING +- EXTERNAL CALL +- EMERGENCY SIGNAL +- CONDITIONAL EMERGENCY SIGNAL + +All other orders will be handled completely in user space. + +Only privileged operation exceptions will be checked for in the kernel (or even +in the hardware prior to interception). If this capability is not enabled, the +old way of handling SIGP orders is used (partially in kernel and user space). + +7.3 KVM_CAP_S390_VECTOR_REGISTERS + +Architectures: s390 +Parameters: none +Returns: 0 on success, negative value on error + +Allows use of the vector registers introduced with z13 processor, and +provides for the synchronization between host and user space. Will +return -EINVAL if the machine does not support vectors. + +7.4 KVM_CAP_S390_USER_STSI + +Architectures: s390 +Parameters: none + +This capability allows post-handlers for the STSI instruction. After +initial handling in the kernel, KVM exits to user space with +KVM_EXIT_S390_STSI to allow user space to insert further data. + +Before exiting to userspace, kvm handlers should fill in s390_stsi field of +vcpu->run: +struct { + __u64 addr; + __u8 ar; + __u8 reserved; + __u8 fc; + __u8 sel1; + __u16 sel2; +} s390_stsi; + +@addr - guest address of STSI SYSIB +@fc - function code +@sel1 - selector 1 +@sel2 - selector 2 +@ar - access register number + +KVM handlers should exit to userspace with rc = -EREMOTE. + +7.5 KVM_CAP_SPLIT_IRQCHIP + +Architectures: x86 +Parameters: args[0] - number of routes reserved for userspace IOAPICs +Returns: 0 on success, -1 on error + +Create a local apic for each processor in the kernel. This can be used +instead of KVM_CREATE_IRQCHIP if the userspace VMM wishes to emulate the +IOAPIC and PIC (and also the PIT, even though this has to be enabled +separately). + +This capability also enables in kernel routing of interrupt requests; +when KVM_CAP_SPLIT_IRQCHIP only routes of KVM_IRQ_ROUTING_MSI type are +used in the IRQ routing table. The first args[0] MSI routes are reserved +for the IOAPIC pins. Whenever the LAPIC receives an EOI for these routes, +a KVM_EXIT_IOAPIC_EOI vmexit will be reported to userspace. + +Fails if VCPU has already been created, or if the irqchip is already in the +kernel (i.e. KVM_CREATE_IRQCHIP has already been called). + + +8. Other capabilities. +---------------------- + +This section lists capabilities that give information about other +features of the KVM implementation. + +8.1 KVM_CAP_PPC_HWRNG + +Architectures: ppc + +This capability, if KVM_CHECK_EXTENSION indicates that it is +available, means that that the kernel has an implementation of the +H_RANDOM hypercall backed by a hardware random-number generator. +If present, the kernel H_RANDOM handler can be enabled for guest use +with the KVM_CAP_PPC_ENABLE_HCALL capability.