Mailing List Archive

PROBLEM: Intel I210AT NIC resets while using PCI passthrough on ESXi (regression)
[1.] One line summary of the problem:

Intel I210AT NIC resets while using PCI passthrough on ESXi (regression)


[2.] Full description of the problem/report:

Intel I210AT NIC running on the latest (4.7-rc7) kernel revision suffers a "transmit queue 0 timed out" issue, which makes the NIC unusable. Each time this happens, the adapter is reset. To trigger the bug, bring up the adapter and attempt to transmit something. In my case it was enough to configure IPv4 on it and ping something. The more data is attempted to be transmitted, the more often the abovementioned error occurs.

The system runs on a VMware ESXi 6.0 hypervisor and the NIC is attached to the Linux VM via PCI passthrough. BIOS and ESXi are all latest stable versions as of June 23rd, 2016.

A bisect was performed, which pointed to commit 52f518a3a7c2f80551a38d38be28bc9f335e713c (x86/MSI: Use hierarchical irqdomains to manage MSI interrupts) being the first bad one. Full bisect log is below in part [9.].

It was noted that sometimes the bug does not reproduce the first time any given kernel version is booted, however subsequent boots reproduce the issue.

The following appears in the kernel log when the bug occurs (also appended relevant information, MAC addresses edited out).

[ 2.099565] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.3.0-k
[ 2.100488] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 2.106832] igb 0000:1b:00.0: enabling device (0000 -> 0002)
[ 2.146246] igb 0000:1b:00.0: added PHC on eth1
[ 2.147211] igb 0000:1b:00.0: Intel(R) Gigabit Ethernet Network Connection
[ 2.148468] igb 0000:1b:00.0: eth1: (PCIe:5.0Gb/s:unknown) xx:xx:xx:xx:xx:xx
[ 2.149846] igb 0000:1b:00.0: eth1: PBA No: 001300-000
[ 2.150900] igb 0000:1b:00.0: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
[ 2.502960] igb 0000:1b:00.0 eth-wan2: renamed from eth1
[ 15.836869] igb 0000:1b:00.0 eth-wan2: igb: eth-wan2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 20.848027] ------------[ cut here ]------------
[ 20.848033] WARNING: CPU: 0 PID: 0 at /home/kernel/COD/linux/net/sched/sch_generic.c:272 dev_watchdog+0x22d/0x230
[ 20.848034] NETDEV WATCHDOG: eth-wan2 (igb): transmit queue 0 timed out
[ 20.848035] Modules linked in: vmw_vsock_vmci_transport vsock fuse intel_powerclamp coretemp intel_rapl_perf ppdev vmw_balloon joydev nls_utf8 nls_cp437 vfat fat efi_pstore efivars serio_raw sg battery parport_pc parport shpchp i2c_piix4 vmw_vmci acpi_cpufreq ac tpm_tis tpm evdev sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi efivarfs autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic usbhid hid sr_mod cdrom crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng sd_mod hmac drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse
[ 20.848061] igb dca ptp pps_core i2c_algo_bit ata_generic vmxnet3 uhci_hcd ehci_pci ehci_hcd vmwgfx vmw_pvscsi ata_piix ahci libahci usbcore usb_common ttm libata drm_kms_helper scsi_mod drm fjes button
[ 20.848069] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-040700rc7-generic #201607110032
[ 20.848070] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.0.B64.1506250318 06/25/2015
[ 20.848071] 0000000000000286 223eaef37d48856a ffffffff81321265 ffff88003c603e10
[ 20.848072] 0000000000000000 ffffffff81078a3e 0000000000000000 ffff88003c603e68
[ 20.848073] ffff88003434c000 0000000000000000 ffff880034cf1940 0000000000000008
[ 20.848075] Call Trace:
[ 20.848076] <IRQ> [<ffffffff81321265>] ? dump_stack+0x5c/0x77
[ 20.848082] [<ffffffff81078a3e>] ? __warn+0xbe/0xe0
[ 20.848084] [<ffffffff81078abf>] ? warn_slowpath_fmt+0x5f/0x80
[ 20.848086] [<ffffffff8150cc9d>] ? dev_watchdog+0x22d/0x230
[ 20.848087] [<ffffffff8150ca70>] ? qdisc_rcu_free+0x40/0x40
[ 20.848089] [<ffffffff810e0680>] ? call_timer_fn+0x30/0x120
[ 20.848090] [<ffffffff8150ca70>] ? qdisc_rcu_free+0x40/0x40
[ 20.848091] [<ffffffff810e0ffc>] ? run_timer_softirq+0x23c/0x2f0
[ 20.848094] [<ffffffff815ee5e8>] ? __do_softirq+0xf8/0x28e
[ 20.848095] [<ffffffff8107e32b>] ? irq_exit+0x9b/0xa0
[ 20.848096] [<ffffffff815ee3fe>] ? smp_apic_timer_interrupt+0x3e/0x50
[ 20.848097] [<ffffffff815ec722>] ? apic_timer_interrupt+0x82/0x90
[ 20.848098] <EOI> [<ffffffff8105bd72>] ? native_safe_halt+0x2/0x10
[ 20.848102] [<ffffffff810354d8>] ? default_idle+0x18/0xc0
[ 20.848103] [<ffffffff810b9429>] ? cpu_startup_entry+0x2d9/0x330
[ 20.848106] [<ffffffff81b2df73>] ? start_kernel+0x44f/0x46f
[ 20.848108] [<ffffffff81b2d120>] ? early_idt_handler_array+0x120/0x120
[ 20.848109] [<ffffffff81b2d41c>] ? x86_64_start_kernel+0x14f/0x173
[ 20.848110] ---[ end trace 41fb9507befdccac ]---
[ 20.848125] igb 0000:1b:00.0 eth-wan2: Reset adapter
[ 24.836669] igb 0000:1b:00.0 eth-wan2: igb: eth-wan2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX


[4.] Kernel information
[4.1.] Kernel version (from /proc/version):

Linux version 4.7.0-040700rc7-generic (kernel@gloin) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1) ) #201607110032 SMP Mon Jul 11 04:34:25 UTC 2016


[5.] Most recent kernel that did not have the bug:

Version 4.1 in mainline was the most recent kernel version without the bug.


[8.] Environment
[8.5.] PCI information ('lspci -vvv' as root)

1b:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
DeviceName: pciPassthru1
Subsystem: ASUSTeK Computer Inc. I210 Gigabit Network Connection
Physical Slot: 256
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 17
Region 0: Memory at fd100000 (32-bit, non-prefetchable) [size=512K]
Region 2: I/O ports at 3000 [disabled] [size=32]
Region 3: Memory at fd180000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x32, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x32, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Device Serial Number 60-a4-4c-ff-ff-e9-0b-a9
Capabilities: [1a0 v1] Transaction Processing Hints
Device specific mode supported
Steering table in TPH capability structure
Kernel driver in use: igb
Kernel modules: igb


[8.7.] /proc/interrupts

CPU0
0: 19 IO-APIC 2-edge timer
1: 9 IO-APIC 1-edge i8042
4: 47798 IO-APIC 4-edge serial
8: 1 IO-APIC 8-edge rtc0
9: 0 IO-APIC 9-fasteoi acpi
12: 15 IO-APIC 12-edge i8042
14: 0 IO-APIC 14-edge ata_piix
15: 0 IO-APIC 15-edge ata_piix
16: 65 IO-APIC 16-fasteoi vmwgfx, uhci_hcd:usb2
17: 0 IO-APIC 17-fasteoi ehci_hcd:usb1
24: 0 PCI-MSI 344064-edge PCIe PME, pciehp
25: 0 PCI-MSI 346112-edge PCIe PME, pciehp
26: 0 PCI-MSI 348160-edge PCIe PME, pciehp
27: 0 PCI-MSI 350208-edge PCIe PME, pciehp
28: 0 PCI-MSI 352256-edge PCIe PME, pciehp
29: 0 PCI-MSI 354304-edge PCIe PME, pciehp
30: 0 PCI-MSI 356352-edge PCIe PME, pciehp
31: 0 PCI-MSI 358400-edge PCIe PME, pciehp
32: 0 PCI-MSI 360448-edge PCIe PME, pciehp
33: 0 PCI-MSI 362496-edge PCIe PME, pciehp
34: 0 PCI-MSI 364544-edge PCIe PME, pciehp
35: 0 PCI-MSI 366592-edge PCIe PME, pciehp
36: 0 PCI-MSI 368640-edge PCIe PME, pciehp
37: 0 PCI-MSI 370688-edge PCIe PME, pciehp
38: 0 PCI-MSI 372736-edge PCIe PME, pciehp
39: 0 PCI-MSI 374784-edge PCIe PME, pciehp
40: 0 PCI-MSI 376832-edge PCIe PME, pciehp
41: 0 PCI-MSI 378880-edge PCIe PME, pciehp
42: 0 PCI-MSI 380928-edge PCIe PME, pciehp
43: 0 PCI-MSI 382976-edge PCIe PME, pciehp
44: 0 PCI-MSI 385024-edge PCIe PME, pciehp
45: 0 PCI-MSI 387072-edge PCIe PME, pciehp
46: 0 PCI-MSI 389120-edge PCIe PME, pciehp
47: 0 PCI-MSI 391168-edge PCIe PME, pciehp
48: 0 PCI-MSI 393216-edge PCIe PME, pciehp
49: 0 PCI-MSI 395264-edge PCIe PME, pciehp
50: 0 PCI-MSI 397312-edge PCIe PME, pciehp
51: 0 PCI-MSI 399360-edge PCIe PME, pciehp
52: 0 PCI-MSI 401408-edge PCIe PME, pciehp
53: 0 PCI-MSI 403456-edge PCIe PME, pciehp
54: 0 PCI-MSI 405504-edge PCIe PME, pciehp
55: 0 PCI-MSI 407552-edge PCIe PME, pciehp
56: 584 PCI-MSI 1064960-edge ahci[0000:02:01.0]
57: 4293 PCI-MSI 1572864-edge vmw_pvscsi
60: 0 PCI-MSI 14155776-edge eth-wan2
61: 0 PCI-MSI 14155777-edge eth-wan2-rx-0
62: 0 PCI-MSI 14155778-edge eth-wan2-tx-0
63: 154 PCI-MSI 129024-edge vmw_vmci
64: 0 PCI-MSI 129025-edge vmw_vmci
NMI: 0 Non-maskable interrupts
LOC: 52617 Local timer interrupts
SPU: 0 Spurious interrupts
PMI: 0 Performance monitoring interrupts
IWI: 0 IRQ work interrupts
RTR: 0 APIC ICR read retries
RES: 0 Rescheduling interrupts
CAL: 0 Function call interrupts
TLB: 0 TLB shootdowns
TRM: 0 Thermal event interrupts
THR: 0 Threshold APIC interrupts
DFR: 0 Deferred Error APIC interrupts
MCE: 0 Machine check exceptions
MCP: 4 Machine check polls
ERR: 0
MIS: 0
PIN: 0 Posted-interrupt notification event
PIW: 0 Posted-interrupt wakeup event



[9.] Full git bisect log

git bisect start
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
git bisect good b953c0d234bc72e8489d3bf51a276c5c4ec85345
# bad: [d770e558e21961ad6cfdf0ff7df0eb5d7d4f0754] Linux 4.2-rc1
git bisect bad d770e558e21961ad6cfdf0ff7df0eb5d7d4f0754
# bad: [4570a37169d4b44d316f40b2ccc681dc93fedc7b] Merge tag 'sound-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad 4570a37169d4b44d316f40b2ccc681dc93fedc7b
# bad: [4e241557fc1cb560bd9e77ca1b4a9352732a5427] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect bad 4e241557fc1cb560bd9e77ca1b4a9352732a5427
# bad: [44d21c3f3a2ef2f58b18bda64c52c99e723f3f4a] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect bad 44d21c3f3a2ef2f58b18bda64c52c99e723f3f4a
# good: [e75c73ad64478c12b3a44b86a3e7f62a4f65b93e] Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good e75c73ad64478c12b3a44b86a3e7f62a4f65b93e
# bad: [43224b96af3154cedd7220f7b90094905f07ac78] Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 43224b96af3154cedd7220f7b90094905f07ac78
# bad: [9dda1658a9bd450d65da5153a2427955785d17c2] Merge branch 'x86/asm' into x86/core, to prepare for new patch
git bisect bad 9dda1658a9bd450d65da5153a2427955785d17c2
# bad: [191a66353b22fad8ac89404ab4c929cbe7b0afb2] Merge branch 'x86/asm' into x86/apic, to resolve a conflict
git bisect bad 191a66353b22fad8ac89404ab4c929cbe7b0afb2
# bad: [099c5c03487f6bca30c628e14e666788dd61fb33] irq_remapping/vt-d: Move struct irq_2_iommu into intel_irq_remapping.c
git bisect bad 099c5c03487f6bca30c628e14e666788dd61fb33
# bad: [90d84fe95dd6b418383aa0e0e5cace8f1b1e7e30] x86/MSI: Replace msi_update_msg() with irq_chip_compose_msi_msg()
git bisect bad 90d84fe95dd6b418383aa0e0e5cace8f1b1e7e30
# good: [3cb96f0c97330834929abe9bd2ca3c252a83def0] x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains
git bisect good 3cb96f0c97330834929abe9bd2ca3c252a83def0
# bad: [b1855c752e67d1125d41fadb499014b49a245db8] x86/MSI: Clean up unused MSI related code and interfaces
git bisect bad b1855c752e67d1125d41fadb499014b49a245db8
# bad: [2b43817e481da9f5118adb56aef46b3f0298c685] irq_remapping/vt-d: Clean up unused MSI related code
git bisect bad 2b43817e481da9f5118adb56aef46b3f0298c685
# bad: [80aa283364a17998dceb577bd185e3380b927544] x86/irq: Directly call native_compose_msi_msg() for DMAR IRQ
git bisect bad 80aa283364a17998dceb577bd185e3380b927544
# bad: [52f518a3a7c2f80551a38d38be28bc9f335e713c] x86/MSI: Use hierarchical irqdomains to manage MSI interrupts
git bisect bad 52f518a3a7c2f80551a38d38be28bc9f335e713c
# first bad commit: [52f518a3a7c2f80551a38d38be28bc9f335e713c] x86/MSI: Use hierarchical irqdomains to manage MSI interrupts
Re: PROBLEM: Intel I210AT NIC resets while using PCI passthrough on ESXi (regression) [ In reply to ]
On Sat, 23 Jul 2016, Foster Snowhill wrote:
> [1.] One line summary of the problem:
>
> Intel I210AT NIC resets while using PCI passthrough on ESXi (regression)

That has been reported before and so far this has been believed to be a vmware
issue. Native hardware and KVM do not expose that. See:

http://marc.info/?l=linux-kernel&m=145280623530135&w=2
http://marc.info/?l=linux-kernel&m=145879968005421&w=2

Could you please give the patch below a try? It might be related, but I'm not
sure whether it will cure that particular vmware oddity.

Thanks,

tglx

8<-----------------------

From: Marc Zyngier <marc.zyngier@arm.com>
genirq/msi: Make sure PCI MSIs are activated early

Bharat Kumar Gogada reported issues with the generic MSI code,
where the end-point ended up with garbage in its MSI configuration
(both for the vector and the message).

It turns out that the two MSI paths in the kernel are doing slightly
different things:

generic MSI: disable MSI -> allocate MSI -> enable MSI -> setup EP
PCI MSI: disable MSI -> allocate MSI -> setup EP -> enable MSI

and it turns out that end-points are allowed to latch the content
of the MSI configuration registers as soon as MSIs are enabled.
In Bharat's case, the end-point ends up using whatever was there
already, which is not what you want.

In order to make things converge, we introduce a new MSI domain
flag (MSI_FLAG_ACTIVATE_EARLY) that is unconditionally set for
PCI/MSI. When set, this flag forces the programming of the end-point
as soon as the MSIs are allocated.

A consequence of this is that we have an extra activate in
irq_startup, but that should be without much consequence.

Reported-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
Tested-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
drivers/pci/msi.c | 2 ++
include/linux/msi.h | 2 ++
kernel/irq/msi.c | 7 +++++++
3 files changed, 11 insertions(+)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a080f44..565e2a4 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1277,6 +1277,8 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
pci_msi_domain_update_chip_ops(info);

+ info->flags |= MSI_FLAG_ACTIVATE_EARLY;
+
domain = msi_create_irq_domain(fwnode, info, parent);
if (!domain)
return NULL;
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 8b425c6..513b7c7 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -270,6 +270,8 @@ enum {
MSI_FLAG_MULTI_PCI_MSI = (1 << 3),
/* Support PCI MSIX interrupts */
MSI_FLAG_PCI_MSIX = (1 << 4),
+ /* Needs early activate, required for PCI */
+ MSI_FLAG_ACTIVATE_EARLY = (1 << 5),
};

int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 38e89ce..4ed2cca 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -361,6 +361,13 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
else
dev_dbg(dev, "irq [%d-%d] for MSI\n",
virq, virq + desc->nvec_used - 1);
+
+ if (info->flags & MSI_FLAG_ACTIVATE_EARLY) {
+ struct irq_data *irq_data;
+
+ irq_data = irq_domain_get_irq_data(domain, desc->irq);
+ irq_domain_activate_irq(irq_data);
+ }
}

return 0;
--
2.1.4
Re: PROBLEM: Intel I210AT NIC resets while using PCI passthrough on ESXi (regression) [ In reply to ]
On 25.07.16 13:56, Thomas Gleixner wrote:
> On Sat, 23 Jul 2016, Foster Snowhill wrote:
>> [1.] One line summary of the problem:
>>
>> Intel I210AT NIC resets while using PCI passthrough on ESXi (regression)
>
> That has been reported before and so far this has been believed to be a vmware
> issue. Native hardware and KVM do not expose that. See:
>
> http://marc.info/?l=linux-kernel&m=145280623530135&w=2
> http://marc.info/?l=linux-kernel&m=145879968005421&w=2

This is indeed the same symptoms I encountered in my case. More than that, I have an LSI 2008 adapter which is also being passed through in the same manner as the NIC, absolutely the same problem there too. Before doing the bisect I considered it a separate issue and planned to work on it later.

> Could you please give the patch below a try? It might be related, but I'm not
> sure whether it will cure that particular vmware oddity.

Patch fixed the issue for me. Did 4-5 reboots just to be sure, all working properly, no adapter resets. Just for the sake of experiment built a clean 4.7 kernel without the patch, issue still present there as expected.
Re: PROBLEM: Intel I210AT NIC resets while using PCI passthrough on ESXi (regression) [ In reply to ]
On Mon, 25 Jul 2016, Foster Snowhill wrote:
> On 25.07.16 13:56, Thomas Gleixner wrote:
> > not sure whether it will cure that particular vmware oddity.
>
> Patch fixed the issue for me. Did 4-5 reboots just to be sure, all working
> properly, no adapter resets. Just for the sake of experiment built a clean
> 4.7 kernel without the patch, issue still present there as expected.

Can you please try whether the replacement patch below fixes your issue as
well?

Thanks,

tglx

8<------------
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a080f4496fe2..142341f8331b 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -645,6 +645,15 @@ static int msi_capability_init(struct pci_dev *dev, int nvec)
return ret;
}

+ /*
+ * The mask can be ignored and PCI 2.3 does not specify mask bits for
+ * each MSI interrupt. So in case of hierarchical irqdomains we need
+ * to make sure that if masking is not available that the msi message
+ * is written prior to setting the MSI enable bit in the device.
+ */
+ if (pci_msi_ignore_mask || !entry->msi_attrib.maskbit)
+ irq_domain_activate_irq(irq_get_irq_data(entry->irq));
+
/* Set MSI enabled bits */
pci_intx_for_msi(dev, 0);
pci_msi_set_enable(dev, 1);
Re: PROBLEM: Intel I210AT NIC resets while using PCI passthrough on ESXi (regression) [ In reply to ]
On 26.07.16 14:46, Thomas Gleixner wrote:
> Can you please try whether the replacement patch below fixes your issue as
> well?

This one doesn't fix the issue, getting resets again. Patch applied to HEAD (commit e65805251f2db69c9f67ed8062ab82526be5a374).

[ 4.377316] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.3.0-k
[ 4.377319] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 4.377387] igb 0000:13:00.0: enabling device (0000 -> 0002)
[ 4.413040] igb 0000:13:00.0: added PHC on eth0
[ 4.413041] igb 0000:13:00.0: Intel(R) Gigabit Ethernet Network Connection
[ 4.413043] igb 0000:13:00.0: eth0: (PCIe:5.0Gb/s:unknown) xx:xx:xx:xx:xx:xx
[ 4.413125] igb 0000:13:00.0: eth0: PBA No: 001300-000
[ 4.413127] igb 0000:13:00.0: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
[ 4.981302] igb 0000:13:00.0 ens224: renamed from eth0
[ 5.394728] igb 0000:13:00.0: changing MTU from 1500 to 9000
[ 7.904222] igb 0000:13:00.0 ens224: igb: ens224 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 7.904479] IPv6: ADDRCONF(NETDEV_CHANGE): ens224: link becomes ready
[ 13.023592] ------------[ cut here ]------------
[ 13.023621] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:272 dev_watchdog+0x22c/0x230
[ 13.023624] NETDEV WATCHDOG: ens224 (igb): transmit queue 0 timed out
[ 13.023625] Modules linked in: vmw_vsock_vmci_transport vsock joydev ppdev vmw_balloon intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel input_leds psmouse serio_raw igb ptp pps_core dca i2c_algo_bit nls_iso8859_1 ahci libahci fjes parport_pc parport vmw_vmci i2c_piix4 pata_acpi mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd vmw_pvscsi vmwgfx ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm shpchp
[ 13.023650] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-msi2 #1
[ 13.023652] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.0.B64.1506250318 06/25/2015
[ 13.023653] 0000000000000286 dd7086eb3068bd0c ffff8880fc603d70 ffffffff833f8cd3
[ 13.023654] ffff8880fc603dc0 0000000000000000 ffff8880fc603db0 ffffffff830820bb
[ 13.023655] 00000110fc603da8 0000000000000000 0000000000000008 0000000000000000
[ 13.023656] Call Trace:
[ 13.023657] <IRQ> [<ffffffff833f8cd3>] dump_stack+0x63/0x90
[ 13.023664] [<ffffffff830820bb>] __warn+0xcb/0xf0
[ 13.023665] [<ffffffff8308213f>] warn_slowpath_fmt+0x5f/0x80
[ 13.023667] [<ffffffff8376b55c>] dev_watchdog+0x22c/0x230
[ 13.023668] [<ffffffff8376b330>] ? qdisc_rcu_free+0x40/0x40
[ 13.023671] [<ffffffff830f1595>] call_timer_fn+0x35/0x120
[ 13.023672] [<ffffffff830f1ef2>] run_timer_softirq+0x222/0x4d0
[ 13.023674] [<ffffffff830fa46c>] ? ktime_get+0x3c/0xb0
[ 13.023676] [<ffffffff83052dca>] ? lapic_next_deadline+0x2a/0x30
[ 13.023678] [<ffffffff83100e12>] ? clockevents_program_event+0x82/0x120
[ 13.023681] [<ffffffff83853b76>] __do_softirq+0xf6/0x280
[ 13.023683] [<ffffffff83087d63>] irq_exit+0xa3/0xb0
[ 13.023684] [<ffffffff83853992>] smp_apic_timer_interrupt+0x42/0x50
[ 13.023686] [<ffffffff83851c62>] apic_timer_interrupt+0x82/0x90
[ 13.023686] <EOI> [<ffffffff83064486>] ? native_safe_halt+0x6/0x10
[ 13.023691] [<ffffffff83037fde>] default_idle+0x1e/0xd0
[ 13.023692] [<ffffffff8303875f>] arch_cpu_idle+0xf/0x20
[ 13.023693] [<ffffffff830c613a>] default_idle_call+0x2a/0x40
[ 13.023694] [<ffffffff830c643a>] cpu_startup_entry+0x2ea/0x350
[ 13.023696] [<ffffffff838437d7>] rest_init+0x77/0x80
[ 13.023699] [<ffffffff83f61fe9>] start_kernel+0x466/0x487
[ 13.023700] [<ffffffff83f61120>] ? early_idt_handler_array+0x120/0x120
[ 13.023701] [<ffffffff83f612da>] x86_64_start_reservations+0x29/0x2b
[ 13.023702] [<ffffffff83f6141f>] x86_64_start_kernel+0x143/0x166
[ 13.023704] ---[ end trace 46f69508befb7107 ]---
[ 13.023724] igb 0000:13:00.0 ens224: Reset adapter
[ 17.824272] igb 0000:13:00.0 ens224: igb: ens224 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Re: PROBLEM: Intel I210AT NIC resets while using PCI passthrough on ESXi (regression) [ In reply to ]
On Tue, 26 Jul 2016, Foster Snowhill wrote:
> On 26.07.16 14:46, Thomas Gleixner wrote:
> > Can you please try whether the replacement patch below fixes your issue as
> > well?

> This one doesn't fix the issue, getting resets again. Patch applied to HEAD
> (commit e65805251f2db69c9f67ed8062ab82526be5a374).
>
> [ 4.413127] igb 0000:13:00.0: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)

Gah! That's MSI-X not MSI. So yes, the patch can't work then. Though it
confirms that it is an actual VMWARE emulation bug ...

Thanks,

tglx
Re: PROBLEM: Intel I210AT NIC resets while using PCI passthrough on ESXi (regression) [ In reply to ]
On Mon, 25 Jul 2016, Foster Snowhill wrote:
> On 25.07.16 13:56, Thomas Gleixner wrote:
> > On Sat, 23 Jul 2016, Foster Snowhill wrote:
> >> [1.] One line summary of the problem:
> >>
> >> Intel I210AT NIC resets while using PCI passthrough on ESXi (regression)
> >
> > That has been reported before and so far this has been believed to be a vmware
> > issue. Native hardware and KVM do not expose that. See:
> >
> > http://marc.info/?l=linux-kernel&m=145280623530135&w=2
> > http://marc.info/?l=linux-kernel&m=145879968005421&w=2
>
> This is indeed the same symptoms I encountered in my case. More than that, I have an LSI 2008 adapter which is also being passed through in the same manner as the NIC, absolutely the same problem there too. Before doing the bisect I considered it a separate issue and planned to work on it later.
>
> > Could you please give the patch below a try? It might be related, but I'm not
> > sure whether it will cure that particular vmware oddity.
>
> Patch fixed the issue for me. Did 4-5 reboots just to be sure, all working properly, no adapter resets. Just for the sake of experiment built a clean 4.7 kernel without the patch, issue still present there as expected.
>

Can you please test something else with that patch?

After starting the NIC, can you please shutdown the interface, so the
interrupt gets shutdown as well. (It should vanish from /proc/interrupts).

Does that work correctly?

Thanks,

tglx
Re: PROBLEM: Intel I210AT NIC resets while using PCI passthrough on ESXi (regression) [ In reply to ]
On 26.07.16 16:49, Thomas Gleixner wrote:
> On Mon, 25 Jul 2016, Foster Snowhill wrote:
>> On 25.07.16 13:56, Thomas Gleixner wrote:
>>> Could you please give the patch below a try? It might be related, but I'm not
>>> sure whether it will cure that particular vmware oddity.
>>
>> Patch fixed the issue for me. Did 4-5 reboots just to be sure, all working properly, no adapter resets. Just for the sake of experiment built a clean 4.7 kernel without the patch, issue still present there as expected.
>>
>
> Can you please test something else with that patch?
>
> After starting the NIC, can you please shutdown the interface, so the
> interrupt gets shutdown as well. (It should vanish from /proc/interrupts).

root@ubuntutest ~ # cat /proc/interrupts
CPU0
0: 22 IO-APIC 2-edge timer
1: 9 IO-APIC 1-edge i8042
4: 3641 IO-APIC 4-edge serial
8: 1 IO-APIC 8-edge rtc0
9: 0 IO-APIC 9-fasteoi acpi
12: 15 IO-APIC 12-edge i8042
14: 0 IO-APIC 14-edge ata_piix
15: 0 IO-APIC 15-edge ata_piix
16: 1 IO-APIC 16-fasteoi vmwgfx
24: 0 PCI-MSI 344064-edge PCIe PME, pciehp
25: 0 PCI-MSI 346112-edge PCIe PME, pciehp
26: 0 PCI-MSI 348160-edge PCIe PME, pciehp
27: 0 PCI-MSI 350208-edge PCIe PME, pciehp
28: 0 PCI-MSI 352256-edge PCIe PME, pciehp
29: 0 PCI-MSI 354304-edge PCIe PME, pciehp
30: 0 PCI-MSI 356352-edge PCIe PME, pciehp
31: 0 PCI-MSI 358400-edge PCIe PME, pciehp
32: 0 PCI-MSI 360448-edge PCIe PME, pciehp
33: 0 PCI-MSI 362496-edge PCIe PME, pciehp
34: 0 PCI-MSI 364544-edge PCIe PME, pciehp
35: 0 PCI-MSI 366592-edge PCIe PME, pciehp
36: 0 PCI-MSI 368640-edge PCIe PME, pciehp
37: 0 PCI-MSI 370688-edge PCIe PME, pciehp
38: 0 PCI-MSI 372736-edge PCIe PME, pciehp
39: 0 PCI-MSI 374784-edge PCIe PME, pciehp
40: 0 PCI-MSI 376832-edge PCIe PME, pciehp
41: 0 PCI-MSI 378880-edge PCIe PME, pciehp
42: 0 PCI-MSI 380928-edge PCIe PME, pciehp
43: 0 PCI-MSI 382976-edge PCIe PME, pciehp
44: 0 PCI-MSI 385024-edge PCIe PME, pciehp
45: 0 PCI-MSI 387072-edge PCIe PME, pciehp
46: 0 PCI-MSI 389120-edge PCIe PME, pciehp
47: 0 PCI-MSI 391168-edge PCIe PME, pciehp
48: 0 PCI-MSI 393216-edge PCIe PME, pciehp
49: 0 PCI-MSI 395264-edge PCIe PME, pciehp
50: 0 PCI-MSI 397312-edge PCIe PME, pciehp
51: 0 PCI-MSI 399360-edge PCIe PME, pciehp
52: 0 PCI-MSI 401408-edge PCIe PME, pciehp
53: 0 PCI-MSI 403456-edge PCIe PME, pciehp
54: 0 PCI-MSI 405504-edge PCIe PME, pciehp
55: 0 PCI-MSI 407552-edge PCIe PME, pciehp
56: 4319 PCI-MSI 1572864-edge vmw_pvscsi
57: 32 PCI-MSI 129024-edge vmw_vmci
58: 0 PCI-MSI 129025-edge vmw_vmci
59: 134 PCI-MSI 1064960-edge ahci[0000:02:01.0]
60: 1 PCI-MSI 9961472-edge ens224
61: 64 PCI-MSI 9961473-edge ens224-rx-0
62: 43 PCI-MSI 9961474-edge ens224-tx-0
NMI: 0 Non-maskable interrupts
LOC: 5384 Local timer interrupts
SPU: 0 Spurious interrupts
PMI: 0 Performance monitoring interrupts
IWI: 1 IRQ work interrupts
RTR: 0 APIC ICR read retries
RES: 0 Rescheduling interrupts
CAL: 0 Function call interrupts
TLB: 0 TLB shootdowns
TRM: 0 Thermal event interrupts
THR: 0 Threshold APIC interrupts
DFR: 0 Deferred Error APIC interrupts
MCE: 0 Machine check exceptions
MCP: 1 Machine check polls
ERR: 0
MIS: 0
PIN: 0 Posted-interrupt notification event
PIW: 0 Posted-interrupt wakeup event
root@ubuntutest ~ # ifdown ens224
RTNETLINK answers: No such process
RTNETLINK answers: Cannot assign requested address
root@ubuntutest ~ # cat /proc/interrupts
CPU0
0: 22 IO-APIC 2-edge timer
1: 9 IO-APIC 1-edge i8042
4: 4459 IO-APIC 4-edge serial
8: 1 IO-APIC 8-edge rtc0
9: 0 IO-APIC 9-fasteoi acpi
12: 15 IO-APIC 12-edge i8042
14: 0 IO-APIC 14-edge ata_piix
15: 0 IO-APIC 15-edge ata_piix
16: 1 IO-APIC 16-fasteoi vmwgfx
24: 0 PCI-MSI 344064-edge PCIe PME, pciehp
25: 0 PCI-MSI 346112-edge PCIe PME, pciehp
26: 0 PCI-MSI 348160-edge PCIe PME, pciehp
27: 0 PCI-MSI 350208-edge PCIe PME, pciehp
28: 0 PCI-MSI 352256-edge PCIe PME, pciehp
29: 0 PCI-MSI 354304-edge PCIe PME, pciehp
30: 0 PCI-MSI 356352-edge PCIe PME, pciehp
31: 0 PCI-MSI 358400-edge PCIe PME, pciehp
32: 0 PCI-MSI 360448-edge PCIe PME, pciehp
33: 0 PCI-MSI 362496-edge PCIe PME, pciehp
34: 0 PCI-MSI 364544-edge PCIe PME, pciehp
35: 0 PCI-MSI 366592-edge PCIe PME, pciehp
36: 0 PCI-MSI 368640-edge PCIe PME, pciehp
37: 0 PCI-MSI 370688-edge PCIe PME, pciehp
38: 0 PCI-MSI 372736-edge PCIe PME, pciehp
39: 0 PCI-MSI 374784-edge PCIe PME, pciehp
40: 0 PCI-MSI 376832-edge PCIe PME, pciehp
41: 0 PCI-MSI 378880-edge PCIe PME, pciehp
42: 0 PCI-MSI 380928-edge PCIe PME, pciehp
43: 0 PCI-MSI 382976-edge PCIe PME, pciehp
44: 0 PCI-MSI 385024-edge PCIe PME, pciehp
45: 0 PCI-MSI 387072-edge PCIe PME, pciehp
46: 0 PCI-MSI 389120-edge PCIe PME, pciehp
47: 0 PCI-MSI 391168-edge PCIe PME, pciehp
48: 0 PCI-MSI 393216-edge PCIe PME, pciehp
49: 0 PCI-MSI 395264-edge PCIe PME, pciehp
50: 0 PCI-MSI 397312-edge PCIe PME, pciehp
51: 0 PCI-MSI 399360-edge PCIe PME, pciehp
52: 0 PCI-MSI 401408-edge PCIe PME, pciehp
53: 0 PCI-MSI 403456-edge PCIe PME, pciehp
54: 0 PCI-MSI 405504-edge PCIe PME, pciehp
55: 0 PCI-MSI 407552-edge PCIe PME, pciehp
56: 4338 PCI-MSI 1572864-edge vmw_pvscsi
57: 39 PCI-MSI 129024-edge vmw_vmci
58: 0 PCI-MSI 129025-edge vmw_vmci
59: 156 PCI-MSI 1064960-edge ahci[0000:02:01.0]
60: 1 PCI-MSI 9961472-edge
61: 72 PCI-MSI 9961473-edge
62: 49 PCI-MSI 9961474-edge
NMI: 0 Non-maskable interrupts
LOC: 7417 Local timer interrupts
SPU: 0 Spurious interrupts
PMI: 0 Performance monitoring interrupts
IWI: 1 IRQ work interrupts
RTR: 0 APIC ICR read retries
RES: 0 Rescheduling interrupts
CAL: 0 Function call interrupts
TLB: 0 TLB shootdowns
TRM: 0 Thermal event interrupts
THR: 0 Threshold APIC interrupts
DFR: 0 Deferred Error APIC interrupts
MCE: 0 Machine check exceptions
MCP: 1 Machine check polls
ERR: 0
MIS: 0
PIN: 0 Posted-interrupt notification event
PIW: 0 Posted-interrupt wakeup event
Re: PROBLEM: Intel I210AT NIC resets while using PCI passthrough on ESXi (regression) [ In reply to ]
On Tue, 26 Jul 2016, Foster Snowhill wrote:
> On 26.07.16 16:49, Thomas Gleixner wrote:
> > On Mon, 25 Jul 2016, Foster Snowhill wrote:
> >> On 25.07.16 13:56, Thomas Gleixner wrote:
> >>> Could you please give the patch below a try? It might be related, but I'm not
> >>> sure whether it will cure that particular vmware oddity.
> >>
> >> Patch fixed the issue for me. Did 4-5 reboots just to be sure, all working properly, no adapter resets. Just for the sake of experiment built a clean 4.7 kernel without the patch, issue still present there as expected.
> >>
> >
> > Can you please test something else with that patch?
> >
> > After starting the NIC, can you please shutdown the interface, so the
> > interrupt gets shutdown as well. (It should vanish from /proc/interrupts).

Does reenabling work or does that result in a stall again?

Thanks,

tglx
Re: PROBLEM: Intel I210AT NIC resets while using PCI passthrough on ESXi (regression) [ In reply to ]
On 26.07.16 17:06, Thomas Gleixner wrote:
> On Tue, 26 Jul 2016, Foster Snowhill wrote:
>> On 26.07.16 16:49, Thomas Gleixner wrote:
>>> On Mon, 25 Jul 2016, Foster Snowhill wrote:
>>>> On 25.07.16 13:56, Thomas Gleixner wrote:
>>>>> Could you please give the patch below a try? It might be related, but I'm not
>>>>> sure whether it will cure that particular vmware oddity.
>>>>
>>>> Patch fixed the issue for me. Did 4-5 reboots just to be sure, all working properly, no adapter resets. Just for the sake of experiment built a clean 4.7 kernel without the patch, issue still present there as expected.
>>>>
>>>
>>> Can you please test something else with that patch?
>>>
>>> After starting the NIC, can you please shutdown the interface, so the
>>> interrupt gets shutdown as well. (It should vanish from /proc/interrupts).
>
> Does reenabling work or does that result in a stall again?

Works.

root@ubuntutest ~ # ifup ens224
Waiting for DAD... Done
root@ubuntutest ~ # ping -s 1000 -n ya.ru
PING YA.ru (213.180.204.3) 1000(1028) bytes of data.
1008 bytes from 213.180.204.3: icmp_seq=1 ttl=57 time=2.97 ms
1008 bytes from 213.180.204.3: icmp_seq=2 ttl=57 time=2.85 ms
1008 bytes from 213.180.204.3: icmp_seq=3 ttl=57 time=2.87 ms
1008 bytes from 213.180.204.3: icmp_seq=4 ttl=57 time=2.92 ms
^C
--- YA.ru ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 2.851/2.904/2.976/0.072 ms
root@ubuntutest ~ # grep ens224 /proc/interrupts
60: 3 PCI-MSI 9961472-edge ens224
61: 1686 PCI-MSI 9961473-edge ens224-rx-0
62: 890 PCI-MSI 9961474-edge ens224-tx-0