Mailing List Archive

[BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
Hello all,

After attempting to install QubesOS on a new laptop, I've stumbled upon
a group of people with an assortment of laptops but all the same
problem: a Xen panic stating "IO-APIC + timer doesn't work!"

Many of us are at different stages of debugging this, so I'll cite all
our efforts here.

# Affected Xen versions
- 4.0.5-14.fc25
- 4.13.0
- probably other versions

# Affected hardware

- Dell XPS 7390 13" (i7-10710U) [1] [2] [3] [4]
- Dell XPS 7390 2-in-1 [1]
- Surface Laptop 3 Business Edition (i7-1065G7) [5] [6]
- Lenovo ThinkBook 13s (i7-8565U) [7]
- Mini-PcBarebone [8] [9]

[1]: https://www.reddit.com/r/Qubes/comments/edqrab/qubes_and_ice_lake/

[2]: https://www.reddit.com/r/Qubes/comments/dfv6jx/panic_on_cpu_0_ioapic_timer_doesnt_work_on_ice/
[3]: https://groups.google.com/forum/#!topic/qubes-users/W8mX-07xNZU
[4]: https://lists.xenproject.org/archives/html/xen-users/2019-12/msg00031.html

[5]: https://lists.xenproject.org/archives/html/xen-users/2019-12/msg00017.html
[6]: https://groups.google.com/forum/#!topic/qubes-users/4iswU7cfJHY

[7]: https://www.reddit.com/r/Qubes/comments/clk0eu/help_install_qubes_4_on_lenovo_thinkbook_13s/

[8]: https://groups.google.com/forum/#!topic/qubes-users/PIyz7BEV1mg
[9]: https://archive.is/RuiAD

# Excerpts from boot logs

Qubes on my XPS 7390 13"

(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ...trying to set up timer (IRQ0) through 8259A ,,,
(XEN) ...trying to set up timer as Virtual Wire IRQ... failed.
(XEN) ...trying to set up timer as ExtINT IRQ...spurious 8259A interrupt: IRQ7.
(XEN) CPU0: no irq handler for vector e7 (IRQ -8)
(XEN) IRQ7 a=0001[0001,0000] v=60[ffffffff] t=IO-APIC-edge s=00000002
(XEN) failed :(.
(XEN)
(XEN) ***************************************
(XEN) Panic on CPU 0:
(XEN) IO-APIC + timer doesn't work! Boot with apic_verbosity=debug and send a report. Then try booting with the 'noapic' option
(XEN) ***************************************

Ubuntu (w/out Xen) on my XPS 7390 13"
(full logs: https://pastebin.com/SdRg87F8 https://pastebin.com/E3zCfb35)

[ 0.000000] microcode: microcode updated early to revision 0xca, date = 2019-10-03
[ 0.000000] Linux version 5.3.0-24-generic (buildd@lgw01-amd64-035) (gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)) #26-Ubuntu SMP Thu Nov 14 01:33:18 UTC 2019 (Ubuntu 5.3.0-24.26-generic 5.3.10)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-24-generic root=UUID=062d0c69-31dc-4c7f-9915-731505fea81b ro quiet splash apic=debug vt.handoff=7
[ .... ]
[ 0.188468] x2apic enabled
[ 0.188492] Switched APIC routing to cluster x2apic.
[ 0.188493] masked ExtINT on CPU#0
[ 0.195266] ENABLING IO-APIC IRQs
[ 0.195267] init IO_APIC IRQs
[ .... ]
[ 0.195633] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.213854] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x170fff30cc4, max_idle_ns: 440795237869 ns
[ 0.213857] Calibrating delay loop (skipped), value calculated using timer frequency.. 3199.92 BogoMIPS (lpj=6399840)
[ 0.213859] pid_max: default: 32768 minimum: 301
[ 0.215117] LSM: Security Framework initializing
[ 0.215124] Yama: becoming mindful.
[ 0.215172] AppArmor: AppArmor initialized

I'd like to note that Ubuntu, unlike Qubes, doesn't need to try
any `MP-BIOS bug` fallbacks.
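
To make that comparison repeatable across other people's logs, here is a minimal sketch that flags the fallback lines in a boot log. The marker strings are copied from the Xen excerpt above; the function name and the shortened sample logs are only illustrative assumptions.

```python
# Marker strings taken from the Xen excerpt; everything else is illustrative.
FALLBACK_MARKERS = (
    "MP-BIOS bug: 8254 timer not connected to IO-APIC",
    "trying to set up timer",
)


def timer_fallback_lines(log_text):
    """Return the log lines that show the IO-APIC timer fallback path."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in FALLBACK_MARKERS)
    ]


xen_log = """\
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ...trying to set up timer (IRQ0) through 8259A ,,,
(XEN) ...trying to set up timer as Virtual Wire IRQ... failed.
"""
linux_log = """\
[ 0.195633] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.213854] clocksource: tsc-early: mask: 0xffffffffffffffff
"""

print(len(timer_fallback_lines(xen_log)))    # 3
print(len(timer_fallback_lines(linux_log)))  # 0
```

An empty result only says that the fallback messages were not printed; it does not prove the timer interrupt is healthy.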

# Relevant source code

Xen: https://github.com/xen-project/xen/blob/0cd791c499bdc698d14a24050ec56d60b45732e0/xen/arch/x86/io_apic.c#L1923-L1933
Linux: https://github.com/torvalds/linux/blob/fd6988496e79a6a4bdb514a4655d2920209eb85d/arch/x86/kernel/apic/io_apic.c#L2185-L2211

# Things that have been tried

Disabling APIC entirely (`noapic x2apic=off`)
- This is avoiding the problem, not fixing it
- QubesOS requires APIC anyway, so this is not an option for many of us

Switching the timer to HPET (via the `clocksource` flag)
- This didn't fix the panic (I've tried `acpi` and `pit`)
- On my XPS 13, this doesn't change any of the timer error output
- Ubuntu works on my laptop using HPET
- http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#clocksource-x86

Updating to 5.4 Linux kernel [77]
- This didn't fix the panic
- https://www.reddit.com/r/Qubes/comments/edqrab/qubes_and_ice_lake/fcak799/

Reproducing the problem on Ubuntu
- I'm able to reproduce Xen crashing, but I'm unable to enable verbose
logging
- https://lists.xenproject.org/archives/html/xen-users/2019-12/msg00031.html

# More verbose boot logs

I had to type these up by hand so please excuse the lack of detail.

(XEN) [...]
(XEN) CPU0: No irq handler for vector 40 (IRQ -2147483648, LAPIC)
(XEN) PCI: MCFG configuration 0: base e0000000 segment 0000 buses 00 - ff
(XEN) PCI: Not using MCFG for segment 0000 bus 00-ff
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualization enabled.
(XEN) - Dom0 mode: Relaxed

(XEN) Interrupt remapping enabled
(XEN) nr_sockets: 2
(XEN) Getting VERSION: 1060015
(XEN) Getting VERSION: 1060015

(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) Getting ID: 0
(XEN) Getting LVT0: 700
(XEN) Getting LVT1: 400
(XEN) Suppress EOI broadcast on CPU#0
(XEN) enabled ExtINT on CPU#0

(XEN) ENABLING IO_APIC IRQs
(XEN) -> Using old ACK method
(XEN) init IO_APIC IRQs
(XEN) IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23, 2-24, 2-25, [...], 2-115, 2-116, 2-117, 2-118, 2-119 not connected.

(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ...trying to set up timer (IRQ0) through 8259A ,,,
(XEN) ...trying to set up timer as Virtual Wire IRQ... failed.
(XEN) ...trying to set up timer as ExtINT IRQ...spurious 8259A interrupt: IRQ7.
(XEN) CPU0: no irq handler for vector e7 (IRQ -8)
(XEN) IRQ7 a=0001[0001,0000] v=60[ffffffff] t=IO-APIC-edge s=00000002
(XEN) failed :(.
(XEN)
(XEN) ***************************************
(XEN) Panic on CPU 0:
(XEN) IO-APIC + timer doesn't work! Boot with apic_verbosity=debug and send a report. Then try booting with the 'noapic' option
(XEN) ***************************************
(XEN)
(XEN) Reboot in five seconds...


Cheers,
Aaron Janse

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On 31/12/2019 07:52, Aaron Janse wrote:
> Hello all,
>
> After attempting to install QubesOS on a new laptop, I've stumbled upon
> a group of people with an assortment of laptops but all the same
> problem: a Xen panic stating "IO-APIC + timer doesn't work!"
>
> Many of us are on different stages of debugging this, so I'll cite all
> our efforts here.
>

<snip>

> # Excerpts from boot logs
>
> Qubes on my XPS 7390 13"
>
> (XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
> (XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> (XEN) ...trying to set up timer (IRQ0) through 8259A ,,,
> (XEN) ...trying to set up timer as Virtual Wire IRQ... failed.
> (XEN) ...trying to set up timer as ExtINT IRQ...spurious 8259A interrupt: IRQ7.
> (XEN) CPU0: no irq handler for vector e7 (IRQ -8)
> (XEN) IRQ7 a=0001[0001,0000] v=60[ffffffff] t=IO-APIC-edge s=00000002
> (XEN) failed :(.
> (XEN)
> (XEN) ***************************************
> (XEN) Panic on CPU 0:
> (XEN) IO-APIC + timer doesn't work! Boot with apic_verbosity=debug and send a report. Then try booting with the 'noapic' option
> (XEN) ***************************************

Is there any full boot log in the bad case?  Debugging via divination
isn't an effective way to get things done.

(Irrespective, I'm pretty sure this is a Grub2+EFI issue failing to pass
the ACPI tables to Xen, and this eventual panic is just cascade fallout.)

~Andrew

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On 31.12.2019 08:52, Aaron Janse wrote:
> I'd like to note that Ubuntu, unlike Qubes, doesn't need to try
> any `MP-BIOS bug` fallbacks.

"Doesn't need to try" is supposed to mean what? That it gets past
the timer interrupt initialization, meaning if it crashes another
way, it's a different problem? Or instead meaning it works
(contrary to information found elsewhere), suggesting there's a
Qubes side change involved?

> # Things that have been tried
>
> Disabling APIC entirely (`noapic x2apic=off`)
> - This is avoiding the problem, not fixing it
> - QubesOS requires APIC anyway, so this is not an option for many of us
>
> Switching the timer to HPET (via the `clocksource` flag)
> - This didn't fix the panic (I've tried `acpi` and `pit`)
> - On my XPS 13, this doesn't change any of the timer error output
> - Ubuntu works on my laptop using HPET
> - http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#clocksource-x86

Did you try disabling use of the IOMMU ("iommu=0" on the Xen
command line)?

> Updating to 5.4 Linux kernel [77]
> - This didn't fix the panic

As long as you don't even reach Dom0 initialization, no change
whatsoever to the Dom0 kernel will possibly help.

If this is as common a problem as you say, it's hard to believe
this has never worked on any of these systems. Hence it would be
helpful to know starting from which version this has been
regressed.

Jan

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On Fri, Jan 3, 2020, at 4:51 AM, Jan Beulich wrote:
> On 31.12.2019 08:52, Aaron Janse wrote:
> > I'd like to note that Ubuntu, unlike Qubes, doesn't need to try
> > any `MP-BIOS bug` fallbacks.
>
> "Doesn't need to try" is supposed to mean what? That it gets past
> the timer interrupt initialization, meaning if it crashes another
> way, it's a different problem? Or instead meaning it works
> (contrary to information found elsewhere), suggesting there's a
> Qubes side change involved?

I originally thought that the problem was the timer specifically, but
based on what you and Andrew Cooper have said, it sounds like the root
cause is somewhere else.

Andrew Cooper wrote:
> (Irrespective, I'm pretty sure this is a Grub2+EFI issue failing to pass
> the ACPI tables to Xen, and this eventual panic is just cascade fallout.)

I tried to get Xen working via legacy boot, but I haven't been able to get
my laptop to boot anything but UEFI. The BIOS even states "UEFI only."

> Did you try disabling use of the IOMMU ("iommu=0" on the Xen
> command line)?

Unfortunately, Qubes requires iommu. Setting "iommu=0" results in a panic:

```
Couldn't enable IOMMU and iommu=required/force
```

I also (unsuccessfully) tried iommu=no-igfx and iommu=soft (both resulted
in the timer panic).

I couldn't find where to turn off the IOMMU requirement (turning it off
would break Qubes, but it could at least help narrow down the cause of
the timer crash).

I installed Xen on Arch Linux in order to test this flag, but I'm having
the same problem I had on Ubuntu: booting to Xen hangs on loading
initramfs. [1]

> If this is as common a problem as you say, it's hard to believe
> this has never worked on any of these systems. Hence it would be
> helpful to know starting from which version this has been
> regressed.

That makes sense. I've tried to reproduce the problem on both Arch and
Ubuntu (both hang, and I'm not sure why or how to debug that). Because
Qubes is the only OS I've been able to boot verbose Xen from, I installed
a 2016 release to try out. However, I couldn't get past a hang showing
the Dell logo. I had this same issue on NixOS, described by someone else
with the same laptop on the NixOS Discourse [4]. The solution for NixOS
was to use a newer version of the distro.

If I can get past the boot hang on Ubuntu or Arch, I'd be happy to go about
bisecting the issue, comparing Ubuntu/Arch vs Qubes, compiling with new
printf statements, etc.

As a side note, the XPS 7390 2-in-1 user was able to get Xen to boot
using the acpi=noirq flag [2]. My understanding is that needing this flag
indicates that something's still wrong [3].

[1] https://lists.xenproject.org/archives/html/xen-users/2019-12/msg00031.html
[2] https://www.reddit.com/r/Qubes/comments/edqrab/qubes_and_ice_lake/fcresld/
[3] http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#acpi
[4] https://discourse.nixos.org/t/nixos-stable-wont-boot-from-usb-on-xps-7390/4776

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On 06.01.2020 01:35, Aaron Janse wrote:
> On Fri, Jan 3, 2020, at 4:51 AM, Jan Beulich wrote:
>> Did you try disabling use of the IOMMU ("iommu=0" on the Xen
>> command line)?
>
> Unfortunately, Qubes requires iommu. Setting "iommu=0" results in a panic:
>
> ```
> Couldn't enable IOMMU and iommu=required/force
> ```

Since this isn't the upstream default, there are two options: Either
you simply have e.g. "iommu=force" elsewhere on the command line - in
this case simply delete the option for this experimenting. Or they've
patched their sources to this effect, in which case you'll want to
undo that source change.
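
A minimal sketch of the first option, assuming the Xen options are available as one space-separated string. The helper name and the example command line are hypothetical, and the sketch does not handle quoted values that contain spaces.

```python
def drop_option(cmdline, name):
    """Remove every `name` or `name=value` token from a command line string."""
    kept = [
        tok
        for tok in cmdline.split()
        if tok != name and not tok.startswith(name + "=")
    ]
    return " ".join(kept)


# Hypothetical command line, used only to show the effect.
example = "console=vga iommu=force apic_verbosity=debug"
print(drop_option(example, "iommu"))  # console=vga apic_verbosity=debug
```

With the forcing option gone, "iommu=0" can then be appended for the experiment.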

> I couldn't find anywhere to disable the flag (even though it would break
> Qubes, at least the flag could help minimize the scope of the cause of the
> timer crash).
>
> I installed Xen on Arch Linux in order to test this flag, but I'm having
> the same problem I had on Ubuntu: booting to Xen hangs on loading
> initramfs. [1]

Booting which exact version of Xen? Iirc these initramfs issues
(with LZ4 compression) have been fixed on 4.13.0 as well as
4.12.2. Also you may not have realized that _any_ initramfs (or
Dom0 kernel image) issue could be avoided for the purposes here
by simply omitting them from the directives issued to grub -
after all you don't get as far as booting Dom0.

Jan

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On Tue, Dec 31, 2019 at 5:43 AM Aaron Janse <aaron@ajanse.me> wrote:
>
> On Tue, Dec 31, 2019, at 12:27 AM, Andrew Cooper wrote:
> > Is there any full boot log in the bad case? Debugging via divination
> > isn't an effective way to get things done.
>
> Agreed. I included some more verbose logs towards the end of the email (typed up by hand).
>
> Attached are pictures from a slow-motion video of my laptop booting. Note that I also included a picture of a stack trace that happens immediately before reboot. It doesn't look related, but I wanted to include it anyway.
>
> I think the original email should have said "4.8.5" instead of "4.0.5." Regardless, everyone on this mailing list can now see all the boot logs that I've seen.
>
> Attaching a serial console seems like it would be difficult to do on this laptop, otherwise I would have sent the logs as a txt file.

I'm seeing Xen panic: "IO-APIC + timer doesn't work" on a Dell
Latitude 7200 2-in-1. Fedora 31 Live USB image boots successfully.
No way to get serial output. I manually recreated the output below
from the VGA display.

Comparing Linux and Xen, Xen does:
(XEN) I/O Virtualisation enabled
(XEN) - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) nr_sockets: 1
(XEN) Getting VERSION: 1060015
(XEN) Getting VERSION: 1060015
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) Getting ID: 0
(XEN) Getting LVT0: 700
(XEN) Getting LVT1: 400
(XEN) Suppress EOI broadcast on CPU#0
(XEN) enabled ExtINT on CPU#0
(XEN) ESR value before enabling vector: 0x40 after: 0
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using old ACK method
(XEN) init IO_APIC IRQs
(XEN) IO-APIC (apicid-pin) 2-0, 2-16, 2-17, ...<snip>... 2-119 not connected.
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ...trying to set up timer (IRQ0) through 8259A ... failed
(XEN) ...trying to set up timer as Virtual Wire IRQ... failed.
(XEN) ...trying to set up timer as ExtINT IRQ...spurious 8259A interrupt: IRQ7.
(XEN) CPU0: no irq handler for vector e7 (IRQ -8)
(XEN) IRQ7 a=0001[0001,0000] v=60[ffffffff] t=IO-APIC-edge s=00000002
(XEN) failed :(.

while Linux with apic=debug does:
kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
kernel: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1e44fb6c2ab, max_idle_ns: 440795206594 ns
kernel: Calibrating delay loop (skipped), value calculated using timer frequency.. 4199.88 BogoMIPS (lpj=2099944)
... and continues onward

Since Linux doesn't print "...trying to set up timer (IRQ0) through
the 8259A ...", that seems to indicate Linux is seeing the timer
interrupt properly.
https://elixir.bootlin.com/linux/v5.3.7/source/arch/x86/kernel/apic/io_apic.c#L2198
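
A small sketch that parses the two "..TIMER:" lines quoted above and reports which fields differ. The regular expression and function name are my own assumptions about the line format, based only on these two samples.

```python
import re

# Pattern inferred from the two log lines in this message.
TIMER_RE = re.compile(
    r"\.\.TIMER: vector=(0x[0-9A-Fa-f]+) apic1=(-?\d+) pin1=(-?\d+) "
    r"apic2=(-?\d+) pin2=(-?\d+)"
)


def parse_timer_line(line):
    """Parse a '..TIMER:' log line into a dict of ints, or None if no match."""
    m = TIMER_RE.search(line)
    if m is None:
        return None
    vector, apic1, pin1, apic2, pin2 = m.groups()
    return {
        "vector": int(vector, 16),
        "apic1": int(apic1),
        "pin1": int(pin1),
        "apic2": int(apic2),
        "pin2": int(pin2),
    }


xen = parse_timer_line("(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1")
linux = parse_timer_line("kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1")
differing = sorted(key for key in xen if xen[key] != linux[key])
print(differing)  # ['vector']
```

So both pick pin 2 of IO-APIC 0 for the timer and differ only in the vector they assign.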

I tested Linux with intel_iommu=on and that booted successfully.
Under Xen, this system sets iommu_x2apic_enabled = true, so
force_iommu is set and iommu=0 alone cannot disable the IOMMU.
I can, however, disable x2apic and then disable the IOMMU.

x2apic=1 -> failure above
x2apic=0 iommu=0 -> failure above
clocksource=acpi -> doesn't help
clocksource=pit -> hangs after "load tracking window length 1073741824 ns"
noapic -> BUG in init_bsp_APIC

One other thing that might be noteworthy: Linux only prints ACPI IRQ0
and IRQ9 used by override, whereas Xen lists IRQ0, IRQ2 and IRQ9.
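
A rough sketch that extracts the interrupt source overrides from the INT_SRC_OVR lines in the reconstructed Xen console output. The regular expression is an assumption based on those two lines. One reading that would fit the numbers, though I have not verified it against either code base, is that one side reports only the source IRQs and the other also reports the target GSI.

```python
import re

# Pattern inferred from the two INT_SRC_OVR lines in the Xen log.
OVERRIDE_RE = re.compile(
    r"INT_SRC_OVR \(bus (\d+) bus_irq (\d+) global_irq (\d+)"
)


def parse_overrides(log_text):
    """Return (bus_irq, global_irq) pairs from ACPI INT_SRC_OVR log lines."""
    return [
        (int(bus_irq), int(gsi))
        for _bus, bus_irq, gsi in OVERRIDE_RE.findall(log_text)
    ]


log = """\
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
"""
pairs = parse_overrides(log)
sources = sorted({src for src, _gsi in pairs})
touched = sorted({irq for pair in pairs for irq in pair})
print(sources)  # [0, 9]
print(touched)  # [0, 2, 9]
```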

Below is the re-constructed Xen console output. The SMBIOS line is
the first thing displayed on the VGA output. I skipped the full EFI
memory map dump since it is quite long.

I've also attached the Linux dmesg output. Any pointers or
suggestions are most welcome.

Thanks,
Jason

(XEN) SMBIOS 3.2 present.
(XEN) APIC boot state is `xapic`
(XEN) Using APIC driver default
(XEN) XSM Framework v1.0.0 initialized
(XEN) Flask: 128 avtab hash slots, 283 rules.
(XEN) Flask: 128 avtab hash slots, 283 rules.
(XEN) Flask: 4 users, 3 roles, 38 types, 2 bools
(XEN) Flask: 13 classes, 283 rules
(XEN) Flask: Starting in enforcing mode.
(XEN) ACPI: PM-Timer IO Port: 0x1808 (32 bits)
(XEN) ACPI: v5 SLEEP INFO: control[1:1804], status[1:1800]
(XEN) ACPI: Invalid sleep control/status register data: 0:0x8:0x3 0:0x8:0x3
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:1804,1:0], pm1x_evt[1:1800,1:0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT - 38c80c00/0000000000000000, using 32
(XEN) ACPI: wakeup_vec[38c80c0c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x05] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled)
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1])
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a201 base: 0xfed00000
(XEN) PCI: MCFG configuration 0: base e0000000 segment 0000 buses 00 - ff
(XEN) PCI: MCFG area at e0000000 reserved in E820
(XEN) PCI: Using MCFG for segment 0000 bus 00-ff
(XEN) [VT-D]dmar.c:563: Non-existent device (0000:00:16.7) is reported in RMRR (386fa000, 38779fff)'s scope!
(XEN) [VT-D]dmar.c:579: Ignore the RMRR (386fa000, 38779fff) due to device under its scope are not PCI discoverable
(XEN) ACPI: BGRT: invalidating v1 image at 0x3329c018
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 8 CPUs (0 hotplug CPUs)
(XEN) mapped APIC to ffff82cfffffb000 (fee00000)
(XEN) mapped IOAPIC to ffff82cfffffa000 (fec00000)
(XEN) IRQ limits: 120 GSI, 1544 MSI/MSI-X
(XEN) Switched to APIC driver x2apic_cluster
(XEN) xstate: size: 0x440 and states: 0x1f
(XEN) mce_intel.c:773: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
(XEN) CPU0: Intel machine check reporting enabled
(XEN) Speculative mitigation facilities:
(XEN) Hardware features: IBRS/IBPB STIBP L1D_FLUSH SSBD MD_CLEAR IBRS_ALL RDCL_NO SKIP_L1DFL MDS_NO
(XEN) Compiled-in support: INDIRECT_THUNK SHADOW_PAGING
(XEN) Xen settings: BTI-Thunk JMP, SPEC_CTRL: IBRS+ SSBD-, Other: IBPB BRANCH_HARDEN
(XEN) L1TF: believed vulnerable, maxphysaddr L1D 46, CPUID 39, Safe address 8000000000
(XEN) Support for HVM VMs: MSR_SPEC_CTRL RSB EAGER_FPU MD_CLEAR
(XEN) Support for PV VMs: MSR_SPEC_CTRL RSB EAGER_FPU MD_CLEAR
(XEN) XPTI (64-bit PV only): Dom0 disabled, DomU enabled (with PCID)
(XEN) PV L1TF shadowing: Dom0 disabled, DomU disabled
(XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Initializing Credit2 scheduler
(XEN) load_precision_shift: 18
(XEN) load_window_shift: 30
(XEN) underload_balance_tolerance: 0
(XEN) overload_balance_tolerance: -3
(XEN) runqueues arrangement: socket
(XEN) cap enforcement granularity: 10ms
(XEN) load tracking window length 1073741824 ns
(XEN) Adding cpu 0 to runqueue 0
(XEN) First cpu on runqueue, activating
(XEN) Platform timer is 23.999MHz HPET
(XEN) Detected 2111.997 MHz processor.
(XEN) EFI memory map:
(XEN) 0000000000000-0000000003fff type=2 attr=000000000000000f
(XEN) 0000000004000-000000008dfff type=7 attr=000000000000000f
(XEN) 000000008e000-000000009dfff type=2 attr=000000000000000f
(XEN) 000000009e000-000000009efff type=0 attr=000000000000000f
(XEN) 000000009f000-000000009ffff type=3 attr=000000000000000f
<snip>
(XEN) 0000037d0c000-000038779ffff type=0 attr=000000000000000f
(XEN) 000003877a000-0000387f6ffff type=9 attr=000000000000000f
(XEN) 00000387f7000-000038c81ffff type=10 attr=000000000000000f
<snip>
(XEN) 00000489f4000-00000489fffff type=7 attr=000000000000000f
(XEN) 0000100000000-00004ac7fffff type=7 attr=000000000000000f
(XEN) 00000000a0000-00000000fffff type=0 attr=0000000000000000
(XEN) 0000048a00000-000004f7fffff type=0 attr=0000000000000000
(XEN) 00000e0000000-00000efffffff type=11 attr=800000000000100d
(XEN) 00000fe000000-00000fe010fff type=11 attr=8000000000000001
(XEN) 00000fec00000-00000fec00fff type=11 attr=8000000000000001
(XEN) 00000fed20000-00000fed7ffff type=0 attr=0000000000000000
(XEN) 00000fee00000-00000fee00fff type=11 attr=8000000000000001
(XEN) 00000ff000000-00000ffffffff type=11 attr=800000000000100d
(XEN) alt table ffff82d080483030 -> ffff82d0804910d8
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN) - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) nr_sockets: 1
(XEN) Getting VERSION: 1060015
(XEN) Getting VERSION: 1060015
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) Getting ID: 0
(XEN) Getting LVT0: 700
(XEN) Getting LVT1: 400
(XEN) Suppress EOI broadcast on CPU#0
(XEN) enabled ExtINT on CPU#0
(XEN) ESR value before enabling vector: 0x40 after: 0
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using old ACK method
(XEN) init IO_APIC IRQs
(XEN) IO-APIC (apicid-pin) 2-0, 2-16, 2-17, ...<snip>... 2-119 not connected.
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ...trying to set up timer (IRQ0) through 8259A ... failed.
(XEN) ...trying to set up timer as Virtual Wire IRQ... failed.
(XEN) ...trying to set up timer as ExtINT IRQ...spurious 8259A interrupt: IRQ7.
(XEN) CPU0: no irq handler for vector 27 (IRQ -8)
(XEN) IRQ7 a=ffffffffffffffff[0001,0000] v=68[ffffffff] t=IO-APIC-edge s=00000002
(XEN) failed :(.
Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On 17/02/2020 19:19, Jason Andryuk wrote:
> On Tue, Dec 31, 2019 at 5:43 AM Aaron Janse <aaron@ajanse.me> wrote:
>> On Tue, Dec 31, 2019, at 12:27 AM, Andrew Cooper wrote:
>>> Is there any full boot log in the bad case? Debugging via divination
>>> isn't an effective way to get things done.
>> Agreed. I included some more verbose logs towards the end of the email (typed up by hand).
>>
>> Attached are pictures from a slow-motion video of my laptop booting. Note that I also included a picture of a stack trace that happens immediately before reboot. It doesn't look related, but I wanted to include it anyway.
>>
>> I think the original email should have said "4.8.5" instead of "4.0.5." Regardless, everyone on this mailing list can now see all the boot logs that I've seen.
>>
>> Attaching a serial console seems like it would be difficult to do on this laptop, otherwise I would have sent the logs as a txt file.
> I'm seeing Xen panic: "IO-APIC + timer doesn't work" on a Dell
> Latitude 7200 2-in-1. Fedora 31 Live USB image boots successfully.
> No way to get serial output. I manually recreated the output before
> from the vga display.

We have multiple bugs.

First and foremost, Xen seems totally broken when running in ExtINT
mode.  This needs addressing, and ought to be sufficient to let Xen
boot, at which point we can try to figure out why it is trying to fall
back into 486(ish) compatibility mode.

> I tested Linux with intel_iommu=on and that booted successfully.
> Under Xen, this system sets iommu_x2apic_enabled = true, so
> force_iommu is set and iommu=0 cannot disable the iommu.
> fails. Oh, I can disable x2apic and then disable iommu
>
> x2apic=1 -> failure above
> x2apic=0 iommu=0 -> failure above
> clocksource=acpi -> doesn't help
> clocksource=pit -> hangs after "load tracking window length 1073741824 ns"

None of these are surprising, given that Xen can't make any interrupts
work at all.

> noapic -> BUG in init_bsp_APIC

This is a surprise. It's clearly a bug in Xen. (OTOH, I've been
threatening to rip all of that logic out, because there is no such thing
as a 64bit capable system without an integrated APIC.)

> One other thing that might be noteworthy. Linux only prints ACPI IRQ0
> and IRQ9 used by override where Xen lists IRQ 0, 2 & 9.

Huh - this is supposed to come directly from the ACPI tables, so Linux
and Xen should be using the same source of information.

>
> Below is the re-constructed Xen console output. The SMBIOS line is
> the first thing displayed on the VGA output.

Yes - it is the first thing printed after vesa_init() which I think is a
manifestation of a previous EFI bug I've reported.  Does booting with
-basevideo help?  (No need to transcribe the output, manually.  Just
need to know if it lets you see the full log.)

> I skipped the full EFI
> memory map dump since it is quite long.
>
> I've also attached the Linux dmesg output. Any pointers or
> suggestions are most welcome.

Let's start with getting Xen able to limp along to a full boot. After
that, we can figure out how to stop it making silly decisions during boot.

~Andrew

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On Mon, Feb 17, 2020 at 2:46 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 17/02/2020 19:19, Jason Andryuk wrote:
> > On Tue, Dec 31, 2019 at 5:43 AM Aaron Janse <aaron@ajanse.me> wrote:
> >> On Tue, Dec 31, 2019, at 12:27 AM, Andrew Cooper wrote:
> >>> Is there any full boot log in the bad case? Debugging via divination
> >>> isn't an effective way to get things done.
> >> Agreed. I included some more verbose logs towards the end of the email (typed up by hand).
> >>
> >> Attached are pictures from a slow-motion video of my laptop booting. Note that I also included a picture of a stack trace that happens immediately before reboot. It doesn't look related, but I wanted to include it anyway.
> >>
> >> I think the original email should have said "4.8.5" instead of "4.0.5." Regardless, everyone on this mailing list can now see all the boot logs that I've seen.
> >>
> >> Attaching a serial console seems like it would be difficult to do on this laptop, otherwise I would have sent the logs as a txt file.
> > I'm seeing Xen panic: "IO-APIC + timer doesn't work" on a Dell
> > Latitude 7200 2-in-1. Fedora 31 Live USB image boots successfully.
> > No way to get serial output. I manually recreated the output before
> > from the vga display.
>
> We have multiple bugs.
>
> First and foremost, Xen seems totally broken when running in ExtINT
> mode. This needs addressing, and ought to be sufficient to let Xen
> boot, at which point we can try to figure out why it is trying to fall
> back into 486(ish) compatibility mode.
>
> > I tested Linux with intel_iommu=on and that booted successfully.
> > Under Xen, this system sets iommu_x2apic_enabled = true, so
> > force_iommu is set and iommu=0 cannot disable the iommu.
> > fails. Oh, I can disable x2apic and then disable iommu
> >
> > x2apic=1 -> failure above
> > x2apic=0 iommu=0 -> failure above
> > clocksource=acpi -> doesn't help
> > clocksource=pit -> hangs after "load tracking window length 1073741824 ns"
>
> None of these are surprising, given that Xen can't make any interrupts
> work at all.
>
> > noapic -> BUG in init_bsp_APIC
>
> This is a surprise. It's clearly a bug in Xen. (OTOH, I've been
> threatening to rip all of that logic out, because there is no such thing
> as a 64bit capable system without an integrated APIC.)

It's a GPF [error_code=0000] at init_bsp_APIC+0x53 which is
0xffff82d080428f86 <+64>: je 0xffff82d080428fc9 <init_bsp_APIC+131>
0xffff82d080428f88 <+66>: or $0xff,%al
0xffff82d080428f8a <+68>: test %sil,%sil
0xffff82d080428f8d <+71>: je 0xffff82d080428fd8 <init_bsp_APIC+146>
0xffff82d080428f8f <+73>: mov $0x80f,%ecx
0xffff82d080428f94 <+78>: mov $0x0,%edx
0xffff82d080428f99 <+83>: wrmsr

RAX is 0x3ff

This is immediately after Xen prints "Switched to APIC driver x2apic_cluster"
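
For reference, a small sketch that decodes the register write at the faulting wrmsr (ECX=0x80f, RAX=0x3ff). It assumes the x2APIC MSR layout from the Intel SDM (MSR index = 0x800 + xAPIC offset >> 4) and the usual bit positions of the spurious-interrupt vector register. The function names are made up for illustration, and which bit, if any, triggers the GPF is not something I have established.

```python
X2APIC_MSR_BASE = 0x800  # first x2APIC MSR index (assumed from the Intel SDM)


def x2apic_msr_to_xapic_offset(msr):
    """Map an x2APIC MSR index to the equivalent xAPIC MMIO register offset."""
    return (msr - X2APIC_MSR_BASE) << 4


def decode_svr(value):
    """Split a spurious-interrupt vector register value into its bit fields."""
    return {
        "vector": value & 0xFF,
        "apic_enabled": bool(value & (1 << 8)),
        "focus_check_disabled": bool(value & (1 << 9)),
        "eoi_broadcast_suppressed": bool(value & (1 << 12)),
    }


print(hex(x2apic_msr_to_xapic_offset(0x80F)))  # 0xf0
print(decode_svr(0x3FF))
```

Offset 0xf0 is the spurious-interrupt vector register, and 0x3ff decodes to vector 0xff with bits 8 and 9 both set.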

> > One other thing that might be noteworthy. Linux only prints ACPI IRQ0
> > and IRQ9 used by override where Xen lists IRQ 0, 2 & 9.
>
> Huh - this is supposed to come directly from the ACPI tables, so Linux
> and Xen should be using the same source of information.
>
> >
> > Below is the re-constructed Xen console output. The SMBIOS line is
> > the first thing displayed on the VGA output.
>
> Yes - it is the first thing printed after vesa_init() which I think is a
> manifestation of a previous EFI bug I've reported. Does booting with
> -basevideo help? (No need to transcribe the output, manually. Just
> need to know if it lets you see the full log.)

I'm booting grub->xen.gz so -basevideo isn't directly applicable. My
attempt at setting a boot entry failed, so I'll have to try that
again.

> > I skipped the full EFI
> > memory map dump since it is quite long.
> >
> > I've also attached the Linux dmesg output. Any pointers or
> > suggestions are most welcome.
>
> Let's start with getting Xen able to limp along to a full boot. After
> that, we can figure out how to stop it making silly decisions during boot.
>
> ~Andrew

Thanks for taking a look.

-Jason

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On 17/02/2020 20:41, Jason Andryuk wrote:
> On Mon, Feb 17, 2020 at 2:46 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 17/02/2020 19:19, Jason Andryuk wrote:
>>> On Tue, Dec 31, 2019 at 5:43 AM Aaron Janse <aaron@ajanse.me> wrote:
>>>> On Tue, Dec 31, 2019, at 12:27 AM, Andrew Cooper wrote:
>>>>> Is there any full boot log in the bad case? Debugging via divination
>>>>> isn't an effective way to get things done.
>>>> Agreed. I included some more verbose logs towards the end of the email (typed up by hand).
>>>>
>>>> Attached are pictures from a slow-motion video of my laptop booting. Note that I also included a picture of a stack trace that happens immediately before reboot. It doesn't look related, but I wanted to include it anyway.
>>>>
>>>> I think the original email should have said "4.8.5" instead of "4.0.5." Regardless, everyone on this mailing list can now see all the boot logs that I've seen.
>>>>
>>>> Attaching a serial console seems like it would be difficult to do on this laptop, otherwise I would have sent the logs as a txt file.
>>> I'm seeing Xen panic: "IO-APIC + timer doesn't work" on a Dell
>>> Latitude 7200 2-in-1. Fedora 31 Live USB image boots successfully.
>>> No way to get serial output. I manually recreated the output before
>>> from the vga display.
>> We have multiple bugs.
>>
>> First and foremost, Xen seems totally broken when running in ExtINT
>> mode. This needs addressing, and ought to be sufficient to let Xen
>> boot, at which point we can try to figure out why it is trying to fall
>> back into 486(ish) compatibility mode.
>>
>>> I tested Linux with intel_iommu=on and that booted successfully.
>>> Under Xen, this system sets iommu_x2apic_enabled = true, so
>>> force_iommu is set and iommu=0 cannot disable the iommu.
>>> Oh, I can disable x2apic and then disable the iommu:
>>>
>>> x2apic=1 -> failure above
>>> x2apic=0 iommu=0 -> failure above
>>> clocksource=acpi -> doesn't help
>>> clocksource=pit -> hangs after "load tracking window length 1073741824 ns"
>> None of these are surprising, given that Xen can't make any interrupts
>> work at all.
>>
>>> noapic -> BUG in init_bsp_APIC
>> This is a surprise. It's clearly a bug in Xen. (OTOH, I've been
>> threatening to rip all of that logic out, because there is no such thing
>> as a 64bit capable system without an integrated APIC.)
> It's a GPF [error_code=0000] at init_bsp_APIC+0x53 which is
> 0xffff82d080428f86 <+64>: je 0xffff82d080428fc9 <init_bsp_APIC+131>
> 0xffff82d080428f88 <+66>: or $0xff,%al
> 0xffff82d080428f8a <+68>: test %sil,%sil
> 0xffff82d080428f8d <+71>: je 0xffff82d080428fd8 <init_bsp_APIC+146>
> 0xffff82d080428f8f <+73>: mov $0x80f,%ecx
> 0xffff82d080428f94 <+78>: mov $0x0,%edx
> 0xffff82d080428f99 <+83>: wrmsr
>
> RAX is 0x3ff
>
> This is immediately after Xen prints "Switched to APIC driver x2apic_cluster"

Hmm, in which case it isn't a BUG specifically, but merely a crash.
0x3ff to SPIV is trying to set reserved bits, so it is no surprise that
there is a #GP.
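
For anyone following along, here is a minimal sketch of why that wrmsr faults. It relies on the x2APIC rule that the xAPIC register at MMIO offset `off` lives at MSR 0x800 + (off >> 4), which is how 0x80f in the disassembly corresponds to SPIV (offset 0xF0). The writable-bit mask is an assumption made for illustration (vector, software enable, EOI-broadcast suppression), and the helper names are hypothetical, not Xen's.

```c
#include <assert.h>
#include <stdint.h>

/* x2APIC maps the xAPIC MMIO register at `off` to MSR 0x800 + (off >> 4). */
#define X2APIC_MSR_BASE   0x800u
#define APIC_SPIV_OFFSET  0x0F0u   /* Spurious-Interrupt Vector Register */

/*
 * ASSUMPTION (illustrative): bits taken to be writable in SPIV in x2APIC
 * mode are the vector (7:0), APIC software enable (8) and EOI-broadcast
 * suppression (12). Bit 9 (focus processor checking) is treated as reserved.
 */
#define SPIV_X2APIC_WRITABLE_ASSUMED 0x000011FFu

/* Hypothetical helper: translate an xAPIC MMIO offset to its x2APIC MSR. */
static uint32_t x2apic_msr_for(uint32_t mmio_offset)
{
    assert((mmio_offset & 0xFu) == 0);   /* registers are 16-byte aligned */
    return X2APIC_MSR_BASE + (mmio_offset >> 4);
}

/* Hypothetical helper: which bits of `value` fall outside the assumed mask. */
static uint32_t spiv_reserved_bits_set(uint32_t value)
{
    return value & ~SPIV_X2APIC_WRITABLE_ASSUMED;
}
```

Under that assumed mask, `spiv_reserved_bits_set(0x3ff)` leaves only bit 9 (0x200), which would be the reserved bit behind the #GP.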

In which case this can safely be filed under "even more collateral
damage from failing to set up any kind of interrupt handling".

>>> One other thing that might be noteworthy. Linux only prints ACPI IRQ0
>>> and IRQ9 used by override where Xen lists IRQ 0, 2 & 9.
>> Huh - this is supposed to come directly from the ACPI tables, so Linux
>> and Xen should be using the same source of information.
>>
>>> Below is the re-constructed Xen console output. The SMBIOS line is
>>> the first thing displayed on the VGA output.
>> Yes - it is the first thing printed after vesa_init() which I think is a
>> manifestation of a previous EFI bug I've reported. Does booting with
>> -basevideo help? (No need to transcribe the output, manually. Just
>> need to know if it lets you see the full log.)
> I'm booting grub->xen.gz so -basevideo isn't directly applicable. My
> attempt at setting a boot entry failed, so I'll have to try that
> again.

Ah ok.  One thing which Xen(.gz) needs to do is to take video details
from the bootloader rather than trying to figure them out itself.

By default, Xen.gz will try and write into the legacy vga range which
most likely isn't working in an EFI system.

(As a slight tangent, it is possible to test xen.efi via grub with a
suitable chainloader stanza, but xen.efi is deficient in enough
important ways that I'd avoid it unless absolutely necessary.)

~Andrew

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On Mon, Feb 17, 2020, 8:22 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 17/02/2020 20:41, Jason Andryuk wrote:
> > On Mon, Feb 17, 2020 at 2:46 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >> On 17/02/2020 19:19, Jason Andryuk wrote:
> >>> On Tue, Dec 31, 2019 at 5:43 AM Aaron Janse <aaron@ajanse.me> wrote:
> >>>> On Tue, Dec 31, 2019, at 12:27 AM, Andrew Cooper wrote:
> >>>>> Is there any full boot log in the bad case? Debugging via divination
> >>>>> isn't an effective way to get things done.
> >>>> Agreed. I included some more verbose logs towards the end of the email (typed up by hand).
> >>>>
> >>>> Attached are pictures from a slow-motion video of my laptop booting. Note that I also included a picture of a stack trace that happens immediately before reboot. It doesn't look related, but I wanted to include it anyway.
> >>>>
> >>>> I think the original email should have said "4.8.5" instead of "4.0.5." Regardless, everyone on this mailing list can now see all the boot logs that I've seen.
> >>>>
> >>>> Attaching a serial console seems like it would be difficult to do on this laptop, otherwise I would have sent the logs as a txt file.
> >>> I'm seeing Xen panic: "IO-APIC + timer doesn't work" on a Dell
> >>> Latitude 7200 2-in-1. Fedora 31 Live USB image boots successfully.
> >>> No way to get serial output. I manually recreated the output before
> >>> from the vga display.
> >> We have multiple bugs.
> >>
> >> First and foremost, Xen seems totally broken when running in ExtINT
> >> mode. This needs addressing, and ought to be sufficient to let Xen
> >> boot, at which point we can try to figure out why it is trying to fall
> >> back into 486(ish) compatibility mode.

Xen has "enabled ExtINT on CPU#0" while linux has "masked ExtINT on
CPU#0" so linux isn't using ExtINT?

I copied and pasted the Linux setup_local_APIC() into Xen and massaged
it until it compiled. Now Xen reports masked ExtINT, but it still
fails to enable the timer.

> >>> I tested Linux with intel_iommu=on and that booted successfully.
> >>> Under Xen, this system sets iommu_x2apic_enabled = true, so
> >>> force_iommu is set and iommu=0 cannot disable the iommu.
> >>> Oh, I can disable x2apic and then disable the iommu:
> >>>
> >>> x2apic=1 -> failure above
> >>> x2apic=0 iommu=0 -> failure above
> >>> clocksource=acpi -> doesn't help
> >>> clocksource=pit -> hangs after "load tracking window length 1073741824 ns"
> >> None of these are surprising, given that Xen can't make any interrupts
> >> work at all.
> >>
> >>> noapic -> BUG in init_bsp_APIC
> >> This is a surprise. It's clearly a bug in Xen. (OTOH, I've been
> >> threatening to rip all of that logic out, because there is no such thing
> >> as a 64bit capable system without an integrated APIC.)
> > It's a GPF [error_code=0000] at init_bsp_APIC+0x53 which is
> > 0xffff82d080428f86 <+64>: je 0xffff82d080428fc9 <init_bsp_APIC+131>
> > 0xffff82d080428f88 <+66>: or $0xff,%al
> > 0xffff82d080428f8a <+68>: test %sil,%sil
> > 0xffff82d080428f8d <+71>: je 0xffff82d080428fd8 <init_bsp_APIC+146>
> > 0xffff82d080428f8f <+73>: mov $0x80f,%ecx
> > 0xffff82d080428f94 <+78>: mov $0x0,%edx
> > 0xffff82d080428f99 <+83>: wrmsr
> >
> > RAX is 0x3ff
> >
> > This is immediately after Xen prints "Switched to APIC driver x2apic_cluster"
>
> Hmm, in which case it isn't a BUG specifically, but merely a crash.
> 0x3ff to SPIV is trying to set reserved bits, so it is no surprise that
> there is a #GP.

Yeah, I used the wrong word. There was a backtrace and it rebooted
quickly, so I didn't have details when I wrote the first email. I
re-ran afterward to capture the info.

> In which case this can safely be filed under "even more collateral
> damage from failing to set up any kind of interrupt handling".
>
> >>> One other thing that might be noteworthy. Linux only prints ACPI IRQ0
> >>> and IRQ9 used by override where Xen lists IRQ 0, 2 & 9.
> >> Huh - this is supposed to come directly from the ACPI tables, so Linux
> >> and Xen should be using the same source of information.
> >>
> >>> Below is the re-constructed Xen console output. The SMBIOS line is
> >>> the first thing displayed on the VGA output.
> >> Yes - it is the first thing printed after vesa_init() which I think is a
> >> manifestation of a previous EFI bug I've reported. Does booting with
> >> -basevideo help? (No need to transcribe the output, manually. Just
> >> need to know if it lets you see the full log.)
> > I'm booting grub->xen.gz so -basevideo isn't directly applicable. My
> > attempt at setting a boot entry failed, so I'll have to try that
> > again.
>
> Ah ok. One thing which Xen(.gz) needs to do is to take video details
> from the bootloader rather than trying to figure them out itself.
>
> By default, Xen.gz will try and write into the legacy vga range which
> most likely isn't working in an EFI system.
>
> (As a slight tangent, it is possible to test xen.efi via grub with a
> suitable chainloader stanza, but xen.efi is deficient in enough
> important ways that I'd avoid it unless absolutely necessary.)

I think I tried chainloader at some point and received an "Unsupported
relocation type" error.

This Dell doesn't want to boot my xen.efi. After selecting a boot
entry, there is a 3-4 second pause and then EFI prints "Press
F1/VolumeUp key to retry boot."

-Jason

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On 18/02/2020 18:43, Jason Andryuk wrote:
> On Mon, Feb 17, 2020, 8:22 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 17/02/2020 20:41, Jason Andryuk wrote:
>>> On Mon, Feb 17, 2020 at 2:46 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> On 17/02/2020 19:19, Jason Andryuk wrote:
>>>>> On Tue, Dec 31, 2019 at 5:43 AM Aaron Janse <aaron@ajanse.me> wrote:
>>>>>> On Tue, Dec 31, 2019, at 12:27 AM, Andrew Cooper wrote:
>>>>>>> Is there any full boot log in the bad case? Debugging via divination
>>>>>>> isn't an effective way to get things done.
>>>>>> Agreed. I included some more verbose logs towards the end of the email (typed up by hand).
>>>>>>
>>>>>> Attached are pictures from a slow-motion video of my laptop booting. Note that I also included a picture of a stack trace that happens immediately before reboot. It doesn't look related, but I wanted to include it anyway.
>>>>>>
>>>>>> I think the original email should have said "4.8.5" instead of "4.0.5." Regardless, everyone on this mailing list can now see all the boot logs that I've seen.
>>>>>>
>>>>>> Attaching a serial console seems like it would be difficult to do on this laptop, otherwise I would have sent the logs as a txt file.
>>>>> I'm seeing Xen panic: "IO-APIC + timer doesn't work" on a Dell
>>>>> Latitude 7200 2-in-1. Fedora 31 Live USB image boots successfully.
>>>>> No way to get serial output. I manually recreated the output before
>>>>> from the vga display.
>>>> We have multiple bugs.
>>>>
>>>> First and foremost, Xen seems totally broken when running in ExtINT
>>>> mode. This needs addressing, and ought to be sufficient to let Xen
>>>> boot, at which point we can try to figure out why it is trying to fall
>>>> back into 486(ish) compatibility mode.
> Xen has "enabled ExtINT on CPU#0" while linux has "masked ExtINT on
> CPU#0" so linux isn't using ExtINT?

It would appear not.  Even more concerningly, on my Kabylake box,

# xl dmesg | grep ExtINT
(XEN) enabled ExtINT on CPU#0
(XEN) masked ExtINT on CPU#1
(XEN) masked ExtINT on CPU#2
(XEN) masked ExtINT on CPU#3
(XEN) masked ExtINT on CPU#4
(XEN) masked ExtINT on CPU#5
(XEN) masked ExtINT on CPU#6
(XEN) masked ExtINT on CPU#7

which at first glance suggests that we have something asymmetric being
set up.

~Andrew

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On 18.02.2020 22:45, Andrew Cooper wrote:
> On 18/02/2020 18:43, Jason Andryuk wrote:
>> On Mon, Feb 17, 2020, 8:22 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 17/02/2020 20:41, Jason Andryuk wrote:
>>>> On Mon, Feb 17, 2020 at 2:46 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>> On 17/02/2020 19:19, Jason Andryuk wrote:
>>>>>> On Tue, Dec 31, 2019 at 5:43 AM Aaron Janse <aaron@ajanse.me> wrote:
>>>>>>> On Tue, Dec 31, 2019, at 12:27 AM, Andrew Cooper wrote:
>>>>>>>> Is there any full boot log in the bad case? Debugging via divination
>>>>>>>> isn't an effective way to get things done.
>>>>>>> Agreed. I included some more verbose logs towards the end of the email (typed up by hand).
>>>>>>>
>>>>>>> Attached are pictures from a slow-motion video of my laptop booting. Note that I also included a picture of a stack trace that happens immediately before reboot. It doesn't look related, but I wanted to include it anyway.
>>>>>>>
>>>>>>> I think the original email should have said "4.8.5" instead of "4.0.5." Regardless, everyone on this mailing list can now see all the boot logs that I've seen.
>>>>>>>
>>>>>>> Attaching a serial console seems like it would be difficult to do on this laptop, otherwise I would have sent the logs as a txt file.
>>>>>> I'm seeing Xen panic: "IO-APIC + timer doesn't work" on a Dell
>>>>>> Latitude 7200 2-in-1. Fedora 31 Live USB image boots successfully.
>>>>>> No way to get serial output. I manually recreated the output before
>>>>>> from the vga display.
>>>>> We have multiple bugs.
>>>>>
>>>>> First and foremost, Xen seems totally broken when running in ExtINT
>>>>> mode. This needs addressing, and ought to be sufficient to let Xen
>>>>> boot, at which point we can try to figure out why it is trying to fall
>>>>> back into 486(ish) compatibility mode.
>> Xen has "enabled ExtINT on CPU#0" while linux has "masked ExtINT on
>> CPU#0" so linux isn't using ExtINT?
>
> It would appear not.  Even more concerningly, on my Kabylake box,
>
> # xl dmesg | grep ExtINT
> (XEN) enabled ExtINT on CPU#0
> (XEN) masked ExtINT on CPU#1
> (XEN) masked ExtINT on CPU#2
> (XEN) masked ExtINT on CPU#3
> (XEN) masked ExtINT on CPU#4
> (XEN) masked ExtINT on CPU#5
> (XEN) masked ExtINT on CPU#6
> (XEN) masked ExtINT on CPU#7
>
> which at first glance suggests that we have something asymmetric being
> set up.

That's perfectly normal - ExtINT may be enabled on just one CPU,
and that's CPU0 in our case (until such time that we would want
to be able to offline CPU0).
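
A simplified model of that per-CPU choice, for readers comparing the "enabled" and "masked" lines. This is a sketch under assumptions, not the Xen source: the function name and parameters are hypothetical, and the rule it encodes (only CPU0 may keep ExtINT unmasked, and only if the system is in PIC mode or LVT0 was not already masked on entry) is recalled from the Linux-derived setup_local_APIC() logic. The two constants are the architectural LVT delivery-mode and mask encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_DM_EXTINT   0x00700u     /* LVT delivery mode: ExtINT */
#define APIC_LVT_MASKED  (1u << 16)   /* LVT mask bit */

/*
 * Hypothetical helper: pick the LVT0 value for a CPU.
 * ASSUMPTION: ExtINT stays unmasked only on CPU0, and only when the
 * platform is in PIC mode or LVT0 was found unmasked on entry.
 */
static uint32_t lvt0_for_cpu(unsigned int cpu, bool pic_mode,
                             uint32_t lvt0_at_entry)
{
    bool was_masked = (lvt0_at_entry & APIC_LVT_MASKED) != 0;

    assert((lvt0_at_entry >> 17) == 0);   /* no LVT0 field above bit 16 */

    if (cpu == 0 && (pic_mode || !was_masked))
        return APIC_DM_EXTINT;                    /* "enabled ExtINT" */

    return APIC_DM_EXTINT | APIC_LVT_MASKED;      /* "masked ExtINT" */
}
```

In this model every CPU other than 0 ends up masked, and CPU0 ends up masked as well if LVT0 arrives masked and the system is not in PIC mode.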

Jan

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
> > >>> One other thing that might be noteworthy. Linux only prints ACPI IRQ0
> > >>> and IRQ9 used by override where Xen lists IRQ 0, 2 & 9.
> > >> Huh - this is supposed to come directly from the ACPI tables, so Linux
> > >> and Xen should be using the same source of information.

Both Xen and Linux see only two ACPI overrides (0 & 9) in the
tables. However, the Xen logic in mp_config_acpi_legacy_irqs() thinks
IRQ2 is an override:
  irq 2: irq->mpc_srcbus 0, irq->mpc_srcbusirq 0, irq->mpc_dstapic 2,
  intsrc.mpc_dstapic 2
It matches this condition:
  ((irq->mpc_dstapic == intsrc.mpc_dstapic) &&
   (irq->mpc_dstirq == i))

i is 2, so irq->mpc_dstirq must be as well.
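
To make the matching concrete, here is a small stand-alone sketch of that check. It is an illustration under assumptions, not the Xen code: the struct is a cut-down stand-in for the MP interrupt source entry, the function name is hypothetical, and the first condition (same ISA source IRQ) is my assumption about the other half of the test. The example table models the two overrides, with IRQ0 routed to pin 2 and IRQ9 to pin 9 on APIC ID 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down, hypothetical stand-in for an MP interrupt source entry. */
struct intsrc {
    unsigned int srcbus;     /* 0 = ISA bus in this sketch */
    unsigned int srcbusirq;  /* legacy IRQ number on that bus */
    unsigned int dstapic;    /* destination IO-APIC ID */
    unsigned int dstirq;     /* destination IO-APIC pin */
};

/* Illustrative table: override IRQ0 -> pin 2, override IRQ9 -> pin 9. */
static const struct intsrc example_overrides[] = {
    { 0, 0, 2, 2 },
    { 0, 9, 2, 9 },
};

/*
 * Hypothetical helper: is legacy IRQ `i` already claimed, either because an
 * entry has it as its source IRQ (ASSUMED condition) or because an entry
 * already occupies pin `i` on the same IO-APIC (the condition quoted above)?
 */
static bool legacy_irq_is_claimed(const struct intsrc *tbl, size_t n,
                                  unsigned int dstapic, unsigned int i)
{
    assert(tbl != NULL || n == 0);

    for (size_t k = 0; k < n; k++) {
        if (tbl[k].srcbus == 0 && tbl[k].srcbusirq == i)
            return true;
        if (tbl[k].dstapic == dstapic && tbl[k].dstirq == i)
            return true;
    }
    return false;
}
```

With that table, i = 2 is reported as claimed purely because the IRQ0 override occupies pin 2, while i = 1 is not claimed by anything.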

Regards,
Jason

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On Wed, Feb 19, 2020 at 3:25 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 18.02.2020 22:45, Andrew Cooper wrote:
> > On 18/02/2020 18:43, Jason Andryuk wrote:
> >> On Mon, Feb 17, 2020, 8:22 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >>> On 17/02/2020 20:41, Jason Andryuk wrote:
> >>>> On Mon, Feb 17, 2020 at 2:46 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >>>>> We have multiple bugs.
> >>>>>
> >>>>> First and foremost, Xen seems totally broken when running in ExtINT
> >>>>> mode. This needs addressing, and ought to be sufficient to let Xen
> >>>>> boot, at which point we can try to figure out why it is trying to fall
> >>>>> back into 486(ish) compatibility mode.
> >> Xen has "enabled ExtINT on CPU#0" while linux has "masked ExtINT on
> >> CPU#0" so linux isn't using ExtINT?
> >
> > It would appear not. Even more concerningly, on my Kabylake box,
> >
> > # xl dmesg | grep ExtINT
> > (XEN) enabled ExtINT on CPU#0
> > (XEN) masked ExtINT on CPU#1
> > (XEN) masked ExtINT on CPU#2
> > (XEN) masked ExtINT on CPU#3
> > (XEN) masked ExtINT on CPU#4
> > (XEN) masked ExtINT on CPU#5
> > (XEN) masked ExtINT on CPU#6
> > (XEN) masked ExtINT on CPU#7
> >
> > which at first glance suggests that we have something asymmetric being
> > set up.
>
> That's perfectly normal - ExtINT may be enabled on just one CPU,
> and that's CPU0 in our case (until such time that we would want
> to be able to offline CPU0).

Thanks, Jan. Linux prints masked ExtINT for all 8 CPU threads.

I inserted __print_IO_APIC() before the "IO-APIC + timer doesn't work" panic.

Using vector-based indexing
IRQ to pin mappings:
IRQ240 -> 0:2

where Linux prints
IRQ0 -> 0:2

That may just be the difference between Xen printing the Vector vs.
Linux printing the IRQ number.

Any pointers to what I should investigate?

Regards,
Jason

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On Wed, Mar 4, 2020 at 11:06 AM Jason Andryuk <jandryuk@gmail.com> wrote:
>
> On Wed, Feb 19, 2020 at 3:25 AM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 18.02.2020 22:45, Andrew Cooper wrote:
> > > On 18/02/2020 18:43, Jason Andryuk wrote:
> > >> On Mon, Feb 17, 2020, 8:22 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > >>> On 17/02/2020 20:41, Jason Andryuk wrote:
> > >>>> On Mon, Feb 17, 2020 at 2:46 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > >>>>> We have multiple bugs.
> > >>>>>
> > >>>>> First and foremost, Xen seems totally broken when running in ExtINT
> > >>>>> mode. This needs addressing, and ought to be sufficient to let Xen
> > >>>>> boot, at which point we can try to figure out why it is trying to fall
> > >>>>> back into 486(ish) compatibility mode.
> > >> Xen has "enabled ExtINT on CPU#0" while linux has "masked ExtINT on
> > >> CPU#0" so linux isn't using ExtINT?
> > >
> > > It would appear not. Even more concerningly, on my Kabylake box,
> > >
> > > # xl dmesg | grep ExtINT
> > > (XEN) enabled ExtINT on CPU#0
> > > (XEN) masked ExtINT on CPU#1
> > > (XEN) masked ExtINT on CPU#2
> > > (XEN) masked ExtINT on CPU#3
> > > (XEN) masked ExtINT on CPU#4
> > > (XEN) masked ExtINT on CPU#5
> > > (XEN) masked ExtINT on CPU#6
> > > (XEN) masked ExtINT on CPU#7
> > >
> > > which at first glance suggests that we have something asymmetric being
> > > set up.
> >
> > That's perfectly normal - ExtINT may be enabled on just one CPU,
> > and that's CPU0 in our case (until such time that we would want
> > to be able to offline CPU0).
>
> Thanks, Jan. Linux prints masked ExtINT for all 8 CPU threads.
>
> I inserted __print_IO_APIC() before the "IO-APIC + timer doesn't work" panic.
>
> Using vector-based indexing
> IRQ to pin mappings:
> IRQ240 -> 0:2
>
> where Linux prints
> IRQ0 -> 0:2
>
> That may just be the difference between Xen printing the Vector vs.
> Linux printing the IRQ number.
>
> Any pointers to what I should investigate?

I got it to boot past "IO-APIC + timer doesn't work". I programmed
the HPET to provide a periodic timer in hpet_resume() on T0. When I
actually got it programmed properly, it worked to increment
pit0_ticks. I also made timer_interrupt() unconditionally
pit0_ticks++ though that may not matter.

Now it panics in pv_destroy_gdt() when it fails "ASSERT(v == current
|| !vcpu_cpu_dirty(v));" when building dom0. I haven't investigated
that yet.

Regards,
Jason

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On Tue, Mar 17, 2020 at 9:48 AM Jason Andryuk <jandryuk@gmail.com> wrote:
> I got it to boot past "IO-APIC + timer doesn't work". I programmed
> the HPET to provide a periodic timer in hpet_resume() on T0. When I
> actually got it programmed properly, it worked to increment
> pit0_ticks. I also made timer_interrupt() unconditionally
> pit0_ticks++ though that may not matter.

Also, HPET_CFG_LEGACY is enabled for the HPET.

Regards,
Jason

> Now it panics in pv_destroy_gdt() when it fails "ASSERT(v == current
> || !vcpu_cpu_dirty(v));" when building dom0. I haven't investigated
> that yet.

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On 17.03.2020 14:48, Jason Andryuk wrote:
> On Wed, Mar 4, 2020 at 11:06 AM Jason Andryuk <jandryuk@gmail.com> wrote:
>>
>> On Wed, Feb 19, 2020 at 3:25 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>
>>> On 18.02.2020 22:45, Andrew Cooper wrote:
>>>> On 18/02/2020 18:43, Jason Andryuk wrote:
>>>>> On Mon, Feb 17, 2020, 8:22 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>> On 17/02/2020 20:41, Jason Andryuk wrote:
>>>>>>> On Mon, Feb 17, 2020 at 2:46 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>>>> We have multiple bugs.
>>>>>>>>
>>>>>>>> First and foremost, Xen seems totally broken when running in ExtINT
>>>>>>>> mode. This needs addressing, and ought to be sufficient to let Xen
>>>>>>>> boot, at which point we can try to figure out why it is trying to fall
>>>>>>>> back into 486(ish) compatibility mode.
>>>>> Xen has "enabled ExtINT on CPU#0" while linux has "masked ExtINT on
>>>>> CPU#0" so linux isn't using ExtINT?
>>>>
>>>> It would appear not. Even more concerningly, on my Kabylake box,
>>>>
>>>> # xl dmesg | grep ExtINT
>>>> (XEN) enabled ExtINT on CPU#0
>>>> (XEN) masked ExtINT on CPU#1
>>>> (XEN) masked ExtINT on CPU#2
>>>> (XEN) masked ExtINT on CPU#3
>>>> (XEN) masked ExtINT on CPU#4
>>>> (XEN) masked ExtINT on CPU#5
>>>> (XEN) masked ExtINT on CPU#6
>>>> (XEN) masked ExtINT on CPU#7
>>>>
>>>> which at first glance suggests that we have something asymmetric being
>>>> set up.
>>>
>>> That's perfectly normal - ExtINT may be enabled on just one CPU,
>>> and that's CPU0 in our case (until such time that we would want
>>> to be able to offline CPU0).
>>
>> Thanks, Jan. Linux prints masked ExtINT for all 8 CPU threads.
>>
>> I inserted __print_IO_APIC() before the "IO-APIC + timer doesn't work" panic.
>>
>> Using vector-based indexing
>> IRQ to pin mappings:
>> IRQ240 -> 0:2
>>
>> where Linux prints
>> IRQ0 -> 0:2
>>
>> That may just be the difference between Xen printing the Vector vs.
>> Linux printing the IRQ number.
>>
>> Any pointers to what I should investigate?
>
> I got it to boot past "IO-APIC + timer doesn't work". I programmed
> the HPET to provide a periodic timer in hpet_resume() on T0. When I
> actually got it programmed properly, it worked to increment
> pit0_ticks. I also made timer_interrupt() unconditionally
> pit0_ticks++ though that may not matter.

Hmm, at first glance I would infer that the system gets handed to Xen
with an HPET state that we don't (and probably also shouldn't) expect.
Could you provide HPET_CFG as well as all HPET_Tn_CFG and
HPET_Tn_ROUTE values as hpet_resume() finds them before doing any
adjustments to them? What are the components / parties involved in
getting Xen loaded and started?

> Now it panics in pv_destroy_gdt() when it fails "ASSERT(v == current
> || !vcpu_cpu_dirty(v));" when building dom0. I haven't investigated
> that yet.

This would seem entirely unrelated to me.

Jan

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On 17.03.2020 15:08, Jan Beulich wrote:
> On 17.03.2020 14:48, Jason Andryuk wrote:
>> I got it to boot past "IO-APIC + timer doesn't work". I programmed
>> the HPET to provide a periodic timer in hpet_resume() on T0. When I
>> actually got it programmed properly, it worked to increment
>> pit0_ticks. I also made timer_interrupt() unconditionally
>> pit0_ticks++ though that may not matter.
>
> Hmm, at first glance I would infer that the system gets handed to Xen
> with an HPET state that we don't (and probably also shouldn't) expect.
> Could you provide HPET_CFG as well as all HPET_Tn_CFG and
> HPET_Tn_ROUTE values as hpet_resume() finds them before doing any
> adjustments to them? What are the components / parties involved in
> getting Xen loaded and started?

Of course much depends on what exactly you mean you've done to
the HPET by saying "I programmed the HPET to provide ...".

Jan

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On 17.03.2020 15:08, Jason Andryuk wrote:
> On Tue, Mar 17, 2020 at 9:48 AM Jason Andryuk <jandryuk@gmail.com> wrote:
>> I got it to boot past "IO-APIC + timer doesn't work". I programmed
>> the HPET to provide a periodic timer in hpet_resume() on T0. When I
>> actually got it programmed properly, it worked to increment
>> pit0_ticks. I also made timer_interrupt() unconditionally
>> pit0_ticks++ though that may not matter.
>
> Also, HPET_CFG_LEGACY is enabled for the HPET.

Which we clear in hpet_resume(), much like Linux does in
hpet_enable().

Jan

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced
On 17.03.2020 15:08, Jan Beulich wrote:
>On 17.03.2020 15:08, Jan Beulich wrote:
>> On 17.03.2020 14:48, Jason Andryuk wrote:
>>> I got it to boot past "IO-APIC + timer doesn't work". I programmed
>>> the HPET to provide a periodic timer in hpet_resume() on T0. When I
>>> actually got it programmed properly, it worked to increment
>>> pit0_ticks. I also made timer_interrupt() unconditionally
>>> pit0_ticks++ though that may not matter.
>>
>> Hmm, at first glance I would infer that the system gets handed to Xen
>> with an HPET state that we don't (and probably also shouldn't) expect.
>> Could you provide HPET_CFG as well as all HPET_Tn_CFG and
>> HPET_Tn_ROUTE values as hpet_resume() finds them before doing any
>> adjustments to them? What are the components / parties involved in
>> getting Xen loaded and started?
>
>Of course much depends on what exactly you mean you've done to
>the HPET by saying "I programmed the HPET to provide ...".

Below is the diff. It was messier and I tidied it up some.

It's mainly the change to hpet_resume() to mimic Linux's legacy HPET
setup on T0. It turns on HPET_CFG_LEGACY to ensure the timer interrupt
is running. And it also includes the printing of the initial HPET
config:
HPET_CFG 00000001
HPET_T0_CFG 00008030
HPET_T0_ROUTE 0000016c
HPET_T1_CFG 00008000
HPET_T1_ROUTE 00000000
HPET_T2_CFG 00008000
HPET_T2_ROUTE 00000000
HPET_T3_CFG 00008000
HPET_T3_ROUTE 00000000
HPET_T4_CFG 0000c000
HPET_T4_ROUTE 00000000
HPET_T5_CFG 0000c000
HPET_T5_ROUTE 00000000
HPET_T6_CFG 0000c000
HPET_T6_ROUTE 00000000
HPET_T7_CFG 0000c000
HPET_T7_ROUTE 00000000
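
For reading the HPET_Tn_CFG values above, a small decoder sketch may help. The bit masks follow the usual HPET timer configuration layout as I understand it and should be checked against the headers in the tree; the flag table and the function name are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* HPET timer N configuration bits (assumed to match the usual layout). */
#define HPET_TN_LEVEL         0x0002u
#define HPET_TN_ENABLE        0x0004u
#define HPET_TN_PERIODIC      0x0008u
#define HPET_TN_PERIODIC_CAP  0x0010u
#define HPET_TN_64BIT_CAP     0x0020u
#define HPET_TN_SETVAL        0x0040u
#define HPET_TN_32BIT         0x0100u
#define HPET_TN_FSB           0x4000u
#define HPET_TN_FSB_CAP       0x8000u

struct tn_flag { uint32_t mask; const char *name; };

static const struct tn_flag tn_flags[] = {
    { HPET_TN_LEVEL, "LEVEL" },   { HPET_TN_ENABLE, "ENABLE" },
    { HPET_TN_PERIODIC, "PERIODIC" },
    { HPET_TN_PERIODIC_CAP, "PERIODIC_CAP" },
    { HPET_TN_64BIT_CAP, "64BIT_CAP" }, { HPET_TN_SETVAL, "SETVAL" },
    { HPET_TN_32BIT, "32BIT" },   { HPET_TN_FSB, "FSB" },
    { HPET_TN_FSB_CAP, "FSB_CAP" },
};

/* Hypothetical helper: write space-separated names of the set flags. */
static size_t hpet_tn_cfg_describe(uint32_t cfg, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(tn_flags) / sizeof(tn_flags[0]); i++) {
        int n;

        if (!(cfg & tn_flags[i].mask))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", tn_flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                 /* out of room: stop, output truncated */
        used += (size_t)n;
    }
    return used;
}
```

Under these definitions, 00008030 reads as capability bits only (periodic, 64-bit and FSB capable) with the timer neither enabled nor periodic, and 0000c000 adds FSB delivery on top of the FSB capability.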

Other changes are to try to prevent Xen from clobbering T0 as a periodic
timer. I had some printks and didn't see Xen call any of them though.
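
The period arithmetic in the hpet_resume() change can be tried out in isolation. The sketch below reproduces the mult/shift computation from the diff (a scaled divide, then NSEC_PER_SEC / HZ multiplied by it and shifted back down) as plain C. The function names are hypothetical, the scaled divide is my reading of what a div_sc()-style helper computes, and the 24 MHz rate and HZ of 100 in the note are example inputs, not values measured on this laptop.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Scaled divide: (ticks << shift) / nsec, in the style of div_sc(). */
static uint32_t scaled_mult(uint64_t ticks_per_sec, uint64_t nsec,
                            unsigned int shift)
{
    assert(nsec != 0 && shift < 64);
    assert(ticks_per_sec <= (UINT64_MAX >> shift));  /* shift cannot overflow */
    return (uint32_t)((ticks_per_sec << shift) / nsec);
}

/* Hypothetical helper: HPET ticks in one 1/hz period, via mult and shift. */
static uint32_t hpet_ticks_per_period(uint64_t hpet_rate_hz, unsigned int hz)
{
    const unsigned int shift = 32;
    uint32_t mult;
    uint64_t delta;

    assert(hz != 0);

    mult = scaled_mult(hpet_rate_hz, NSEC_PER_SEC, shift);
    delta = (NSEC_PER_SEC / hz) * mult;   /* ns per period, scaled by mult */
    return (uint32_t)(delta >> shift);    /* drop the scale factor again */
}
```

With the example inputs the result lands within one tick of 240000. The truncation in the scaled divide can round it down by one, which is harmless for a periodic tick.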

Regards,
Jason

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 86929b9ba1..f39aafda7d 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -585,16 +585,27 @@ void __init hpet_broadcast_init(void)
pv_rtc_handler = handle_rtc_once;
}

+ printk(XENLOG_INFO "%s cfg %d\n", __func__, cfg);
hpet_write32(cfg, HPET_CFG);

for ( i = 0; i < n; i++ )
{
- if ( i == 0 && (cfg & HPET_CFG_LEGACY) )
+ printk(XENLOG_INFO "hpet cfg %d legacy %d\n", i, cfg & HPET_CFG_LEGACY);
+ if ( i == 1 && (cfg & HPET_CFG_LEGACY) )
{
/* set HPET T0 as oneshot */
- cfg = hpet_read32(HPET_Tn_CFG(0));
+ cfg = hpet_read32(HPET_Tn_CFG(1));
cfg &= ~(HPET_TN_LEVEL | HPET_TN_PERIODIC);
cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
+ hpet_write32(cfg, HPET_Tn_CFG(1));
+ }
+
+ if ( i == 0 && (cfg & HPET_CFG_LEGACY) )
+ {
+ /* set HPET T0 as periodic */
+ cfg = hpet_read32(HPET_Tn_CFG(0));
+ cfg |= (HPET_TN_LEVEL | HPET_TN_PERIODIC);
+ cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
hpet_write32(cfg, HPET_Tn_CFG(0));
}

@@ -645,6 +656,7 @@ void hpet_broadcast_resume(void)
n = 1;
}

+ printk(XENLOG_INFO "%s cfg %d\n", __func__, cfg);
hpet_write32(cfg, HPET_CFG);

for ( i = 0; i < n; i++ )
@@ -652,6 +664,7 @@ void hpet_broadcast_resume(void)
if ( hpet_events[i].msi.irq >= 0 )
__hpet_setup_msi_irq(irq_to_desc(hpet_events[i].msi.irq));

+ if (i != 0) {
/* set HPET Tn as oneshot */
cfg = hpet_read32(HPET_Tn_CFG(hpet_events[i].idx));
cfg &= ~(HPET_TN_LEVEL | HPET_TN_PERIODIC);
@@ -659,6 +672,7 @@ void hpet_broadcast_resume(void)
if ( !(hpet_events[i].flags & HPET_EVT_LEGACY) )
cfg |= HPET_TN_FSB;
hpet_write32(cfg, HPET_Tn_CFG(hpet_events[i].idx));
+ }

hpet_events[i].next_event = STIME_MAX;
}
@@ -684,6 +698,7 @@ void hpet_disable_legacy_broadcast(void)
/* Stop HPET legacy interrupts */
cfg = hpet_read32(HPET_CFG);
cfg &= ~HPET_CFG_LEGACY;
+ printk(XENLOG_INFO "%s cfg %d\n", __func__, cfg);
hpet_write32(cfg, HPET_CFG);

spin_unlock_irqrestore(&hpet_events->lock, flags);
@@ -759,6 +774,7 @@ int hpet_legacy_irq_tick(void)
(hpet_events->flags & (HPET_EVT_DISABLE|HPET_EVT_LEGACY)) !=
HPET_EVT_LEGACY )
return 0;
+
hpet_events->event_handler(hpet_events);
return 1;
}
@@ -804,6 +820,8 @@ u64 __init hpet_setup(void)
return hpet_rate + (last * 2 > hpet_period);
}

+#include <asm/delay.h>
+
void hpet_resume(u32 *boot_cfg)
{
static u32 system_reset_latch;
@@ -815,6 +833,7 @@ void hpet_resume(u32 *boot_cfg)
system_reset_latch = system_reset_counter;

cfg = hpet_read32(HPET_CFG);
+ printk(XENLOG_INFO "%s HPET_CFG %08x\n", __func__, cfg);
if ( boot_cfg )
*boot_cfg = cfg;
cfg &= ~(HPET_CFG_ENABLE | HPET_CFG_LEGACY);
@@ -825,13 +844,18 @@ void hpet_resume(u32 *boot_cfg)
cfg);
cfg = 0;
}
+ printk(XENLOG_INFO "%s cfg %d\n", __func__, cfg);
hpet_write32(cfg, HPET_CFG);

hpet_id = hpet_read32(HPET_ID);
last = (hpet_id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT;
for ( i = 0; i <= last; ++i )
{
+ u32 tmp;
cfg = hpet_read32(HPET_Tn_CFG(i));
+ printk(XENLOG_INFO "%s HPET_T%d_CFG %08x\n", __func__, i, cfg);
+ tmp = hpet_read32(HPET_Tn_ROUTE(i));
+ printk(XENLOG_INFO "%s HPET_T%d_ROUTE %08x\n", __func__, i, tmp);
if ( boot_cfg )
boot_cfg[i + 1] = cfg;
cfg &= ~HPET_TN_ENABLE;
@@ -842,11 +866,34 @@ void hpet_resume(u32 *boot_cfg)
cfg & HPET_TN_RESERVED, i);
cfg &= ~HPET_TN_RESERVED;
}
+ if (i == 0) {
+ cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
+ HPET_TN_32BIT;
+ }
hpet_write32(cfg, HPET_Tn_CFG(i));
+ if (i == 0) {
+#define NSEC_PER_SEC 1000000000L
+ uint64_t delta;
+ unsigned int now;
+ unsigned int cmp;
+ u64 hpet_rate = hpet_setup();
+ uint32_t mult = div_sc((unsigned long)hpet_rate,
+ 1000000000ul, 32);
+ uint32_t shift = 32;
+ printk(XENLOG_INFO "hpet mult %d shift %d\n", mult, shift);
+ delta = ((uint64_t)(NSEC_PER_SEC / HZ)) * mult;
+ delta >>= shift;
+ now = hpet_read32(HPET_COUNTER);
+ cmp = now + (unsigned int)delta;
+ hpet_write32(cmp, HPET_Tn_CMP(i));
+ udelay(1);
+ hpet_write32(delta, HPET_Tn_CMP(i));
+ }
}

cfg = hpet_read32(HPET_CFG);
- cfg |= HPET_CFG_ENABLE;
+ cfg |= HPET_CFG_ENABLE | HPET_CFG_LEGACY;
+ printk(XENLOG_INFO "%s cfg %d\n", __func__, cfg);
hpet_write32(cfg, HPET_CFG);
}

@@ -862,6 +909,7 @@ void hpet_disable(void)
return;
}

+ printk(XENLOG_INFO "%s cfg %d\n", __func__, *hpet_boot_cfg);
hpet_write32(*hpet_boot_cfg & ~HPET_CFG_ENABLE, HPET_CFG);

id = hpet_read32(HPET_ID);
@@ -869,5 +917,8 @@ void hpet_disable(void)
hpet_write32(hpet_boot_cfg[i + 1], HPET_Tn_CFG(i));

if ( *hpet_boot_cfg & HPET_CFG_ENABLE )
+ {
+ printk(XENLOG_INFO "%s cfg %d\n", __func__, *hpet_boot_cfg);
hpet_write32(*hpet_boot_cfg, HPET_CFG);
+ }
}
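
The comparator arithmetic in the hpet_resume() hunk can be checked in isolation. What follows is a minimal standalone sketch, not Xen code: scale_factor() is a hypothetical stand-in for what div_sc() appears to compute, and the 24 MHz HPET rate and HZ of 100 used in the note after the block are assumed values chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for div_sc(): (ticks << 32) / nsec, truncated. */
static uint32_t scale_factor(uint64_t ticks_per_sec, uint64_t nsec)
{
    assert(nsec != 0);
    assert(ticks_per_sec <= UINT32_MAX); /* keeps the shift from overflowing */
    return (uint32_t)((ticks_per_sec << 32) / nsec);
}

/* Number of HPET ticks in one 1/hz period, using mult/shift fixed point. */
static uint32_t period_ticks(uint64_t hpet_rate, unsigned int hz)
{
    uint32_t mult;
    uint64_t delta;

    assert(hz != 0);
    mult = scale_factor(hpet_rate, NSEC_PER_SEC);
    delta = (NSEC_PER_SEC / hz) * mult; /* nanoseconds scaled by 2^32 */
    return (uint32_t)(delta >> 32);
}

/* First comparator value: 32-bit counter plus period, wrapping modulo 2^32. */
static uint32_t first_compare(uint32_t now, uint32_t delta)
{
    return now + delta;
}
```

With the assumed values, period_ticks(24000000, 100) gives 239999 rather than the ideal 240000, because both the multiplier and the final shift truncate. The wrap in first_compare() is harmless as long as the timer runs in 32-bit mode, which is what HPET_TN_32BIT requests.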

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced [ In reply to ]
On Tue, Mar 17, 2020 at 11:23 AM Jason Andryuk <jandryuk@gmail.com> wrote:
>
> On 17.03.2020 15:08, Jan Beulich wrote:
> >On 17.03.2020 15:08, Jan Beulich wrote:
> >> On 17.03.2020 14:48, Jason Andryuk wrote:
> >>> I got it to boot past "IO-APIC + timer doesn't work". I programmed
> >>> the HPET to provide a periodic timer in hpet_resume() on T0. When I
> >>> actually got it programmed properly, it worked to increment
> >>> pit0_ticks. I also made timer_interrupt() unconditionally
> >>> pit0_ticks++ though that may not matter.
> >>
> >> Hmm, at the first glance I would imply the system gets handed to Xen
> >> with a HPET state that we don't (and probably also shouldn't) expect.
> >> Could you provide HPET_CFG as well as all HPET_Tn_CFG and
> >> HPET_Tn_ROUTE values as hpet_resume() finds them before doing any
> >> adjustments to them? What are the components / parties involved in
> >> getting Xen loaded and started?

I forgot to mention the boot sequence:
EFI -> grub2-efi -> xen.gz
grub2 is using multiboot2 & module2 commands.

Thanks for taking a look.

Regards,
Jason

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced [ In reply to ]
On 17.03.2020 16:23, Jason Andryuk wrote:
> On 17.03.2020 15:08, Jan Beulich wrote:
>> On 17.03.2020 15:08, Jan Beulich wrote:
>>> On 17.03.2020 14:48, Jason Andryuk wrote:
>>>> I got it to boot past "IO-APIC + timer doesn't work". I programmed
>>>> the HPET to provide a periodic timer in hpet_resume() on T0. When I
>>>> actually got it programmed properly, it worked to increment
>>>> pit0_ticks. I also made timer_interrupt() unconditionally
>>>> pit0_ticks++ though that may not matter.
>>>
>>> Hmm, at the first glance I would imply the system gets handed to Xen
>>> with a HPET state that we don't (and probably also shouldn't) expect.
>>> Could you provide HPET_CFG as well as all HPET_Tn_CFG and
>>> HPET_Tn_ROUTE values as hpet_resume() finds them before doing any
>>> adjustments to them? What are the components / parties involved in
>>> getting Xen loaded and started?
>>
>> Of course much depends on what exactly you mean you've done to
>> the HPET by saying "I programmed the HPET to provide ...".
>
> Below is the diff. It was messier and I tidied it up some.
>
> It's mainly the change to hpet_resume() to mimic Linux's legacy HPET
> setup on T0. It turns on HPET_CFG_LEGACY to ensure the timer interrupt
> is running. And it also includes the printing of the initial HPET
> config:
> HPET_CFG 00000001
> HPET_T0_CFG 00008030
> HPET_T0_ROUTE 0000016c
> HPET_T1_CFG 00008000
> HPET_T1_ROUTE 00000000
> HPET_T2_CFG 00008000
> HPET_T2_ROUTE 00000000
> HPET_T3_CFG 00008000
> HPET_T3_ROUTE 00000000
> HPET_T4_CFG 0000c000
> HPET_T4_ROUTE 00000000
> HPET_T5_CFG 0000c000
> HPET_T5_ROUTE 00000000
> HPET_T6_CFG 0000c000
> HPET_T6_ROUTE 00000000
> HPET_T7_CFG 0000c000
> HPET_T7_ROUTE 00000000
>
> Other changes are to try to prevent Xen from clobbering T0 as a periodic
> timer.

Why "clobbering"? According to the values above T0 is neither enabled
nor set to periodic.

> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -585,16 +585,27 @@ void __init hpet_broadcast_init(void)
> pv_rtc_handler = handle_rtc_once;
> }
>
> + printk(XENLOG_INFO "%s cfg %d\n", __func__, cfg);
> hpet_write32(cfg, HPET_CFG);
>
> for ( i = 0; i < n; i++ )
> {
> - if ( i == 0 && (cfg & HPET_CFG_LEGACY) )
> + printk(XENLOG_INFO "hpet cfg %d legacy %d\n", i, cfg & HPET_CFG_LEGACY);
> + if ( i == 1 && (cfg & HPET_CFG_LEGACY) )
> {
> /* set HPET T0 as oneshot */
> - cfg = hpet_read32(HPET_Tn_CFG(0));
> + cfg = hpet_read32(HPET_Tn_CFG(1));
> cfg &= ~(HPET_TN_LEVEL | HPET_TN_PERIODIC);
> cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
> + hpet_write32(cfg, HPET_Tn_CFG(1));
> + }
> +
> + if ( i == 0 && (cfg & HPET_CFG_LEGACY) )
> + {
> + /* set HPET T0 as periodic */
> + cfg = hpet_read32(HPET_Tn_CFG(0));
> + cfg |= (HPET_TN_LEVEL | HPET_TN_PERIODIC);

A change like this of course won't be acceptable outside of
your own repo, but I assume you're clear about this.

Jan

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced [ In reply to ]
On Wed, Mar 18, 2020 at 6:38 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 17.03.2020 16:23, Jason Andryuk wrote:
> > On 17.03.2020 15:08, Jan Beulich wrote:
> >> On 17.03.2020 15:08, Jan Beulich wrote:
> >>> On 17.03.2020 14:48, Jason Andryuk wrote:
> >>>> I got it to boot past "IO-APIC + timer doesn't work". I programmed
> >>>> the HPET to provide a periodic timer in hpet_resume() on T0. When I
> >>>> actually got it programmed properly, it worked to increment
> >>>> pit0_ticks. I also made timer_interrupt() unconditionally
> >>>> pit0_ticks++ though that may not matter.
> >>>
> >>> Hmm, at the first glance I would imply the system gets handed to Xen
> >>> with a HPET state that we don't (and probably also shouldn't) expect.
> >>> Could you provide HPET_CFG as well as all HPET_Tn_CFG and
> >>> HPET_Tn_ROUTE values as hpet_resume() finds them before doing any
> >>> adjustments to them? What are the components / parties involved in
> >>> getting Xen loaded and started?
> >>
> >> Of course much depends on what exactly you mean you've done to
> >> the HPET by saying "I programmed the HPET to provide ...".
> >
> > Below is the diff. It was messier and I tidied it up some.
> >
> > It's mainly the change to hpet_resume() to mimic Linux's legacy HPET
> > setup on T0. It turns on HPET_CFG_LEGACY to ensure the timer interrupt
> > is running. And it also includes the printing of the initial HPET
> > config:
> > HPET_CFG 00000001
> > HPET_T0_CFG 00008030
> > HPET_T0_ROUTE 0000016c
> > HPET_T1_CFG 00008000
> > HPET_T1_ROUTE 00000000
> > HPET_T2_CFG 00008000
> > HPET_T2_ROUTE 00000000
> > HPET_T3_CFG 00008000
> > HPET_T3_ROUTE 00000000
> > HPET_T4_CFG 0000c000
> > HPET_T4_ROUTE 00000000
> > HPET_T5_CFG 0000c000
> > HPET_T5_ROUTE 00000000
> > HPET_T6_CFG 0000c000
> > HPET_T6_ROUTE 00000000
> > HPET_T7_CFG 0000c000
> > HPET_T7_ROUTE 00000000
> >
> > Other changes are to try to prevent Xen from clobbering T0 as a periodic
> > timer.
>
> Why "clobbering"? According to the values above T0 is neither enabled
> nor set to periodic.

I was trying to indicate the changes in hpet_broadcast_init() to
preserve T0 as a periodic timer after it was set up in hpet_resume().

> > --- a/xen/arch/x86/hpet.c
> > +++ b/xen/arch/x86/hpet.c
> > @@ -585,16 +585,27 @@ void __init hpet_broadcast_init(void)
> > pv_rtc_handler = handle_rtc_once;
> > }
> >
> > + printk(XENLOG_INFO "%s cfg %d\n", __func__, cfg);
> > hpet_write32(cfg, HPET_CFG);
> >
> > for ( i = 0; i < n; i++ )
> > {
> > - if ( i == 0 && (cfg & HPET_CFG_LEGACY) )
> > + printk(XENLOG_INFO "hpet cfg %d legacy %d\n", i, cfg & HPET_CFG_LEGACY);
> > + if ( i == 1 && (cfg & HPET_CFG_LEGACY) )
> > {
> > /* set HPET T0 as oneshot */
> > - cfg = hpet_read32(HPET_Tn_CFG(0));
> > + cfg = hpet_read32(HPET_Tn_CFG(1));
> > cfg &= ~(HPET_TN_LEVEL | HPET_TN_PERIODIC);
> > cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
> > + hpet_write32(cfg, HPET_Tn_CFG(1));
> > + }
> > +
> > + if ( i == 0 && (cfg & HPET_CFG_LEGACY) )
> > + {
> > + /* set HPET T0 as periodic */
> > + cfg = hpet_read32(HPET_Tn_CFG(0));
> > + cfg |= (HPET_TN_LEVEL | HPET_TN_PERIODIC);
>
> A change like this of course won't be acceptable outside of
> your own repo, but I assume you're clear about this.

Of course. I was just providing the example that passes
check_timer(). I'm not familiar with the Xen timer code or HPETs, so
I was hoping this provides useful information to clarify the problem
and find a cleaner solution.

Locally, I minimized the HPET changes to just enable it during
check_timer() and then disable it afterwards. That diff is below.
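
The idea of enabling legacy routing for check_timer() and clearing it afterwards amounts to a pair of read-modify-write operations on HPET_CFG. Here is a minimal standalone sketch, assuming a mock register: fake_hpet_cfg is a hypothetical variable standing in for the memory-mapped register, and the bit values are the ones the HPET specification assigns to ENABLE_CNF (bit 0) and LEG_RT_CNF (bit 1), which HPET_CFG_ENABLE and HPET_CFG_LEGACY appear to correspond to.

```c
#include <stdint.h>

#define CFG_ENABLE 0x001u /* ENABLE_CNF: main counter runs */
#define CFG_LEGACY 0x002u /* LEG_RT_CNF: LegacyReplacement routing */

/* Mock register, initialised to the HPET_CFG value firmware handed over. */
static uint32_t fake_hpet_cfg = CFG_ENABLE;

static uint32_t cfg_read(void)
{
    return fake_hpet_cfg;
}

static void cfg_write(uint32_t value)
{
    fake_hpet_cfg = value;
}

/* Set only the legacy bit, leaving every other bit as found. */
static void legacy_enable(void)
{
    cfg_write(cfg_read() | CFG_LEGACY);
}

/* Clear only the legacy bit, leaving every other bit as found. */
static void legacy_disable(void)
{
    cfg_write(cfg_read() & ~CFG_LEGACY);
}
```

Calling legacy_enable() before the timer check and legacy_disable() after it leaves the register at its starting value, so the main counter stays enabled throughout.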

Xen is still having issues booting dom0 with the HPET changes, so this
change may be incorrect and break something else. Previously, I wrote
about a failed assert in pv_destroy_gdt() during dom0 construction. I
added a printk before the assert, and that issue disappeared and is
also gone after removing it again. Since this is a tablet form
factor, serial output is impossible. I added a delay to printk so I
could more easily capture screen output without it scrolling by. I
have since removed that delay, which may have shifted the problem
as there is now a pagefault in emulate_forced_invalid_op().

r12 is NULL in
testb $0x1,0x4(%r12)
which is:
if ( msrs->misc_features_enables.cpuid_faulting &&

So msrs is NULL? msrs = current->arch.msrs earlier in the function.

The pv_destroy_gdt failed assert was:
ASSERT(v == current || !vcpu_cpu_dirty(v));

I wonder if the timer interrupt could be messing with current somehow?

Thanks,
Jason

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 86929b9ba1..93a34792b2 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -765,6 +765,15 @@ int hpet_legacy_irq_tick(void)

static u32 *hpet_boot_cfg;

+void hpet_disable_legacy(void)
+{
+ u32 cfg = hpet_read32(HPET_CFG);
+ printk(XENLOG_INFO "%s HPET_CFG %08x\n", __func__, cfg);
+ cfg &= ~HPET_CFG_LEGACY;
+ printk(XENLOG_INFO "%s HPET_CFG %08x\n", __func__, cfg);
+ hpet_write32(cfg, HPET_CFG);
+}
+
u64 __init hpet_setup(void)
{
static u64 __initdata hpet_rate;
@@ -804,6 +813,8 @@ u64 __init hpet_setup(void)
return hpet_rate + (last * 2 > hpet_period);
}

+#include <asm/delay.h>
+
void hpet_resume(u32 *boot_cfg)
{
static u32 system_reset_latch;
@@ -842,11 +853,33 @@ void hpet_resume(u32 *boot_cfg)
cfg & HPET_TN_RESERVED, i);
cfg &= ~HPET_TN_RESERVED;
}
+ if (i == 0) {
+ cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
+ HPET_TN_32BIT;
+ }
hpet_write32(cfg, HPET_Tn_CFG(i));
+ if (i == 0) {
+#define NSEC_PER_SEC 1000000000L
+ uint64_t delta;
+ unsigned int now;
+ unsigned int cmp;
+ u64 hpet_rate = hpet_setup();
+ uint32_t mult = div_sc((unsigned long)hpet_rate,
+ 1000000000ul, 32);
+ uint32_t shift = 32;
+ printk(XENLOG_INFO "hpet mult %d shift %d\n", mult, shift);
+ delta = ((uint64_t)(NSEC_PER_SEC / HZ)) * mult;
+ delta >>= shift;
+ now = hpet_read32(HPET_COUNTER);
+ cmp = now + (unsigned int)delta;
+ hpet_write32(cmp, HPET_Tn_CMP(i));
+ udelay(1);
+ hpet_write32(delta, HPET_Tn_CMP(i));
+ }
}

cfg = hpet_read32(HPET_CFG);
- cfg |= HPET_CFG_ENABLE;
+ cfg |= HPET_CFG_ENABLE | HPET_CFG_LEGACY;
hpet_write32(cfg, HPET_CFG);
}

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e98e08e9c8..b62dea190a 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -34,6 +34,7 @@
#include <asm/desc.h>
#include <asm/msi.h>
#include <asm/setup.h>
+#include <asm/hpet.h>
#include <mach_apic.h>
#include <io_ports.h>
#include <irq_vectors.h>
@@ -2047,6 +2048,7 @@ void __init setup_IO_APIC(void)
setup_IO_APIC_irqs();
init_IO_APIC_traps();
check_timer();
+ hpet_disable_legacy();
print_IO_APIC();
ioapic_pm_state_alloc();

diff --git a/xen/include/asm-x86/hpet.h b/xen/include/asm-x86/hpet.h
index fb6bf05065..531e94e904 100644
--- a/xen/include/asm-x86/hpet.h
+++ b/xen/include/asm-x86/hpet.h
@@ -82,6 +82,7 @@ void hpet_broadcast_enter(void);
void hpet_broadcast_exit(void);
int hpet_broadcast_is_available(void);
void hpet_disable_legacy_broadcast(void);
+void hpet_disable_legacy(void);

extern void (*pv_rtc_handler)(uint8_t reg, uint8_t value);

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced [ In reply to ]
On Wed, Mar 18, 2020 at 10:04 AM Jason Andryuk <jandryuk@gmail.com> wrote:
> On Wed, Mar 18, 2020 at 6:38 AM Jan Beulich <jbeulich@suse.com> wrote:
> > On 17.03.2020 16:23, Jason Andryuk wrote:
> > > On 17.03.2020 15:08, Jan Beulich wrote:
> > >> On 17.03.2020 15:08, Jan Beulich wrote:
> > >>> On 17.03.2020 14:48, Jason Andryuk wrote:
> > >>>> I got it to boot past "IO-APIC + timer doesn't work". I programmed
> > >>>> the HPET to provide a periodic timer in hpet_resume() on T0. When I
> > >>>> actually got it programmed properly, it worked to increment
> > >>>> pit0_ticks. I also made timer_interrupt() unconditionally
> > >>>> pit0_ticks++ though that may not matter.
> > >>>
> > >>> Hmm, at the first glance I would imply the system gets handed to Xen
> > >>> with a HPET state that we don't (and probably also shouldn't) expect.
> > >>> Could you provide HPET_CFG as well as all HPET_Tn_CFG and
> > >>> HPET_Tn_ROUTE values as hpet_resume() finds them before doing any
> > >>> adjustments to them? What are the components / parties involved in
> > >>> getting Xen loaded and started?
> > >>
> > >> Of course much depends on what exactly you mean you've done to
> > >> the HPET by saying "I programmed the HPET to provide ...".
> > >
> > > Below is the diff. It was messier and I tidied it up some.
> > >
> > > It's mainly the change to hpet_resume() to mimic Linux's legacy HPET
> > > setup on T0. It turns on HPET_CFG_LEGACY to ensure the timer interrupt
> > > is running. And it also includes the printing of the initial HPET
> > > config:
> > > HPET_CFG 00000001
> > > HPET_T0_CFG 00008030
> > > HPET_T0_ROUTE 0000016c
> > > HPET_T1_CFG 00008000
> > > HPET_T1_ROUTE 00000000
> > > HPET_T2_CFG 00008000
> > > HPET_T2_ROUTE 00000000
> > > HPET_T3_CFG 00008000
> > > HPET_T3_ROUTE 00000000
> > > HPET_T4_CFG 0000c000
> > > HPET_T4_ROUTE 00000000
> > > HPET_T5_CFG 0000c000
> > > HPET_T5_ROUTE 00000000
> > > HPET_T6_CFG 0000c000
> > > HPET_T6_ROUTE 00000000
> > > HPET_T7_CFG 0000c000
> > > HPET_T7_ROUTE 00000000
> > >
> > > Other changes are to try to prevent Xen from clobbering T0 as a periodic
> > > timer.
> >
> > Why "clobbering"? According to the values above T0 is neither enabled
> > nor set to periodic.
>
> I was trying to indicate the changes in hpet_broadcast_init() to
> preserve T0 as a periodic timer after it was set up in hpet_resume().
>
> > > --- a/xen/arch/x86/hpet.c
> > > +++ b/xen/arch/x86/hpet.c
> > > @@ -585,16 +585,27 @@ void __init hpet_broadcast_init(void)
> > > pv_rtc_handler = handle_rtc_once;
> > > }
> > >
> > > + printk(XENLOG_INFO "%s cfg %d\n", __func__, cfg);
> > > hpet_write32(cfg, HPET_CFG);
> > >
> > > for ( i = 0; i < n; i++ )
> > > {
> > > - if ( i == 0 && (cfg & HPET_CFG_LEGACY) )
> > > + printk(XENLOG_INFO "hpet cfg %d legacy %d\n", i, cfg & HPET_CFG_LEGACY);
> > > + if ( i == 1 && (cfg & HPET_CFG_LEGACY) )
> > > {
> > > /* set HPET T0 as oneshot */
> > > - cfg = hpet_read32(HPET_Tn_CFG(0));
> > > + cfg = hpet_read32(HPET_Tn_CFG(1));
> > > cfg &= ~(HPET_TN_LEVEL | HPET_TN_PERIODIC);
> > > cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
> > > + hpet_write32(cfg, HPET_Tn_CFG(1));
> > > + }
> > > +
> > > + if ( i == 0 && (cfg & HPET_CFG_LEGACY) )
> > > + {
> > > + /* set HPET T0 as periodic */
> > > + cfg = hpet_read32(HPET_Tn_CFG(0));
> > > + cfg |= (HPET_TN_LEVEL | HPET_TN_PERIODIC);
> >
> > A change like this of course won't be acceptable outside of
> > your own repo, but I assume you're clear about this.
>
> Of course. I was just providing the example that passes
> check_timer(). I'm not familiar with the Xen timer code or HPETs, so
> I was hoping this provides useful information to clarify the problem
> and find a cleaner solution.
>
> Locally, I minimized the HPET changes to just enable it during
> check_timer() and then disable it afterwards. That diff is below.
>
> Xen is still having issues booting dom0 with the HPET changes, so this
> change may be incorrect and break something else. Previously, I wrote
> about a failed assert in pv_destroy_gdt() during dom0 construction. I
> added a printk before the assert, and that issue disappeared and is
> also gone after removing it again. Since this is a tablet form
> factor, serial output is impossible. I added a delay to printk so I
> could more easily capture screen output without it scrolling by. I
> have since removed that delay, which may have shifted the problem
> as there is now a pagefault in emulate_forced_invalid_op().
>
> r12 is NULL in
> testb $0x1,0x4(%r12)
> which is:
> if ( msrs->misc_features_enables.cpuid_faulting &&
>
> So msrs is NULL? msrs = current->arch.msrs earlier in the function.
>
> The pv_destroy_gdt failed assert was:
> ASSERT(v == current || !vcpu_cpu_dirty(v));
>
> I wonder if the timer interrupt could be messing with current somehow?
>

Something was stale in my build tree. After cleaning and rebuilding
Xen, it boots into dom0 with the patch below.

Regards,
Jason

> diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
> index 86929b9ba1..93a34792b2 100644
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -765,6 +765,15 @@ int hpet_legacy_irq_tick(void)
>
> static u32 *hpet_boot_cfg;
>
> +void hpet_disable_legacy(void)
> +{
> + u32 cfg = hpet_read32(HPET_CFG);
> + printk(XENLOG_INFO "%s HPET_CFG %08x\n", __func__, cfg);
> + cfg &= ~HPET_CFG_LEGACY;
> + printk(XENLOG_INFO "%s HPET_CFG %08x\n", __func__, cfg);
> + hpet_write32(cfg, HPET_CFG);
> +}
> +
> u64 __init hpet_setup(void)
> {
> static u64 __initdata hpet_rate;
> @@ -804,6 +813,8 @@ u64 __init hpet_setup(void)
> return hpet_rate + (last * 2 > hpet_period);
> }
>
> +#include <asm/delay.h>
> +
> void hpet_resume(u32 *boot_cfg)
> {
> static u32 system_reset_latch;
> @@ -842,11 +853,33 @@ void hpet_resume(u32 *boot_cfg)
> cfg & HPET_TN_RESERVED, i);
> cfg &= ~HPET_TN_RESERVED;
> }
> + if (i == 0) {
> + cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
> + HPET_TN_32BIT;
> + }
> hpet_write32(cfg, HPET_Tn_CFG(i));
> + if (i == 0) {
> +#define NSEC_PER_SEC 1000000000L
> + uint64_t delta;
> + unsigned int now;
> + unsigned int cmp;
> + u64 hpet_rate = hpet_setup();
> + uint32_t mult = div_sc((unsigned long)hpet_rate,
> + 1000000000ul, 32);
> + uint32_t shift = 32;
> + printk(XENLOG_INFO "hpet mult %d shift %d\n", mult, shift);
> + delta = ((uint64_t)(NSEC_PER_SEC / HZ)) * mult;
> + delta >>= shift;
> + now = hpet_read32(HPET_COUNTER);
> + cmp = now + (unsigned int)delta;
> + hpet_write32(cmp, HPET_Tn_CMP(i));
> + udelay(1);
> + hpet_write32(delta, HPET_Tn_CMP(i));
> + }
> }
>
> cfg = hpet_read32(HPET_CFG);
> - cfg |= HPET_CFG_ENABLE;
> + cfg |= HPET_CFG_ENABLE | HPET_CFG_LEGACY;
> hpet_write32(cfg, HPET_CFG);
> }
>
> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
> index e98e08e9c8..b62dea190a 100644
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -34,6 +34,7 @@
> #include <asm/desc.h>
> #include <asm/msi.h>
> #include <asm/setup.h>
> +#include <asm/hpet.h>
> #include <mach_apic.h>
> #include <io_ports.h>
> #include <irq_vectors.h>
> @@ -2047,6 +2048,7 @@ void __init setup_IO_APIC(void)
> setup_IO_APIC_irqs();
> init_IO_APIC_traps();
> check_timer();
> + hpet_disable_legacy();
> print_IO_APIC();
> ioapic_pm_state_alloc();
>
> diff --git a/xen/include/asm-x86/hpet.h b/xen/include/asm-x86/hpet.h
> index fb6bf05065..531e94e904 100644
> --- a/xen/include/asm-x86/hpet.h
> +++ b/xen/include/asm-x86/hpet.h
> @@ -82,6 +82,7 @@ void hpet_broadcast_enter(void);
> void hpet_broadcast_exit(void);
> int hpet_broadcast_is_available(void);
> void hpet_disable_legacy_broadcast(void);
> +void hpet_disable_legacy(void);
>
> extern void (*pv_rtc_handler)(uint8_t reg, uint8_t value);

Re: [BUG] panic: "IO-APIC + timer doesn't work" - several people have reproduced [ In reply to ]
On 17.03.2020 16:23, Jason Andryuk wrote:
> Below is the diff. It was messier and I tidied it up some.

I've looked into this some more. I can see how what we currently
do is not in line with firmware handing off with LegacyReplacement
mode enabled. However, this case doesn't look to apply here:

> It's mainly the change to hpet_resume() to mimic Linux's legacy HPET
> setup on T0. It turns on HPET_CFG_LEGACY to ensure the timer interrupt
> is running. And it also includes the printing of the initial HPET
> config:
> HPET_CFG 00000001

While HPET_CFG_ENABLE is set, HPET_CFG_LEGACY is clear.

> HPET_T0_CFG 00008030
> HPET_T0_ROUTE 0000016c

And while firmware must have set up FSB routing for T0, its enable
bits (both HPET_TN_ENABLE and HPET_TN_FSB) are also clear.
Therefore we have, afaics, no indication whatsoever that we ought
to enable LegacyReplacement mode. Of course the spec also says
"Assuming platform does not have 8254/RTC hardware or does not
want to support this legacy timer hardware, for this case, System
BIOS should set the LegacyReplacement Route bit and report IRQ0 &
IRQ8 as being consumed by the HPET block in system name space:"
(followed by a table). What is referred to as "system name space"
is, I assume, ACPI DSDT/SSDTs, which we have no access to this
early (and Linux doesn't either, aiui), so also can't be used as
indicator.
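
The reading of the register dump above can be reproduced mechanically. The following is a minimal standalone sketch, assuming the bit layout of the Timer N Configuration and Capability Register from the HPET specification; the macro names, struct tn_state and decode_tn_cfg() are hypothetical and are not Xen identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Timer N Configuration and Capability Register bits (HPET specification). */
#define TN_INT_ENB         0x0004u /* interrupt enable */
#define TN_TYPE_PERIODIC   0x0008u /* periodic mode selected */
#define TN_PER_INT_CAP     0x0010u /* periodic mode supported */
#define TN_SIZE_CAP        0x0020u /* 64-bit comparator supported */
#define TN_FSB_EN          0x4000u /* FSB delivery enabled */
#define TN_FSB_INT_DEL_CAP 0x8000u /* FSB delivery supported */

struct tn_state {
    bool enabled;
    bool periodic;
    bool fsb_enabled;
    bool periodic_capable;
    bool size64_capable;
    bool fsb_capable;
};

/* Split a raw Tn_CFG value into its configuration and capability flags. */
static struct tn_state decode_tn_cfg(uint32_t cfg)
{
    struct tn_state s = {
        .enabled          = (cfg & TN_INT_ENB) != 0,
        .periodic         = (cfg & TN_TYPE_PERIODIC) != 0,
        .fsb_enabled      = (cfg & TN_FSB_EN) != 0,
        .periodic_capable = (cfg & TN_PER_INT_CAP) != 0,
        .size64_capable   = (cfg & TN_SIZE_CAP) != 0,
        .fsb_capable      = (cfg & TN_FSB_INT_DEL_CAP) != 0,
    };
    return s;
}
```

Applied to the logged values, 0x00008030 decodes as capable of periodic, 64-bit and FSB operation with nothing enabled, while the 0x0000c000 reported for T4 to T7 has the FSB enable bit set.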

Otoh I also don't think it is correct to blindly enable
LegacyReplacement mode, like - afaics - Linux does, all the more so with
our split brain model as far as affected devices go (Xen "owns"
the PIT [and of course also the HPET], while Linux "owns" the
RTC). This is because of the effect of this setting on what
actually drives IRQ8. In theory we might be able to do so when
ACPI_FADT_NO_CMOS_RTC is set, but Linux may use the CMOS RTC
even when that flag is set.

So right now the only possible approach I see to address your
problem is to add yet another fallback mode to check_timer(),
forcing LegacyReplacement mode to be enabled. But between /
after which step(s) to put this there isn't at all obvious to me.
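
One way to picture such an extra fallback is as one more entry at the end of an ordered list of probes. The sketch below is a standalone mock, not Xen's check_timer(): every function name in it is hypothetical, the platform is modelled as one that delivers ticks only once LegacyReplacement is on, and placing the new step last is purely an assumption, since where it belongs is exactly the open question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true if timer ticks were observed. */
typedef bool (*timer_probe_fn)(void);

/* Mock platform state: ticks only arrive with LegacyReplacement enabled. */
static bool legacy_replacement_on;

static bool ticks_arrive(void)
{
    return legacy_replacement_on;
}

static bool probe_ioapic(void)       { return ticks_arrive(); }
static bool probe_virtual_wire(void) { return false; }
static bool probe_extint(void)       { return false; }

/* Assumed new last resort: force LegacyReplacement, then look again. */
static bool probe_force_legacy(void)
{
    legacy_replacement_on = true;
    return ticks_arrive();
}

/* Try each probe in order; return the index of the first that works, or -1. */
static int first_working_probe(const timer_probe_fn *probes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (probes[i] && probes[i]())
            return (int)i;
    return -1;
}

/* Run the whole chain from a clean state. */
static int run_check(void)
{
    const timer_probe_fn probes[] = {
        probe_ioapic, probe_virtual_wire, probe_extint, probe_force_legacy,
    };

    legacy_replacement_on = false;
    return first_working_probe(probes, sizeof(probes) / sizeof(probes[0]));
}
```

Because the forcing step runs only after every existing mode has failed, platforms that already pass one of the earlier probes never have their HPET routing touched in this model.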

Jan
