11.1 running on HyperV hn interface hangs

12 messages

11.1 running on HyperV hn interface hangs

Paul Koch-3

Not sure if -stable is the right mailing list for this one.

We recently moved our software from 11.0-p9 to 11.1-p1, but it looks like
there is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012 R2)
where the virtual hn0 interface hangs with the following kernel messages:

 hn0: <Hyper-V Network Interface> on vmbus0
 hn0: Ethernet address: 00:15:5d:31:21:0f
 hn0: link state changed to UP
 ...
 hn0: RXBUF ack retry
 hn0: RXBUF ack failed
 last message repeated 571 times

It requires a restart of the HyperV VM.

This is a customer production server (remote customer ~4000km away) running
fairly critical monitoring software, so we needed to roll it back to 11.0-p9.
We only have two customers running our software in HyperV, vs lots in VMware
and a handful on physical hardware.

11.0-p9 has been very stable.  Has anyone seen this problem before with 11.1 ?

11.1 is listed here
 https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/supported-freebsd-virtual-machines-on-hyper-v

        Paul.
--
Paul Koch | Founder | CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[hidden email]"

Re: 11.1 running on HyperV hn interface hangs

Pete French-3
> We recently moved our software from 11.0-p9 to 11.1-p1, but looks like there
> is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012 R2) where
> the virtual hn0 interface hangs with the following kernel messages:
>
>   hn0: <Hyper-V Network Interface> on vmbus0
>   hn0: Ethernet address: 00:15:5d:31:21:0f
>   hn0: link state changed to UP
>   ...
>   hn0: RXBUF ack retry
>   hn0: RXBUF ack failed
>   last message repeated 571 times
>
> It requires a restart of the HyperV VM.
>
> This is a customer production server (remote customer ~4000km away) running
> fairly critical monitoring software, so we needed to roll it back to 11.0-p9.
> We only have two customers running our software in HyperV, vs lots in VMware
> and a handful on physical hardware.
>
> 11.0-p9 has been very stable.  Has anyone seen this problem before with 11.1 ?


I don't run anything on local hyper-v anymore, but I do run a lot of
stuff in Azure, and we haven't seen anything like this. I track STABLE
for things though, updating after reading the commits and testing
locally for a week or so, so the version I am running currently is
r320175, which was part of 11.1-BETA2. I am going to upgrade to a more
recent STABLE sometime this week or next though, and will do that on a
test machine and let you know how it goes.

I seem to recall that there were some large changes to the hn code in
August to add virtual function support. When does 11.1-p1 date from ?

-pete.

Re: 11.1 running on HyperV hn interface hangs

Paul Koch-2
On Wed, 6 Sep 2017 12:02:43 +0100
Pete French <[hidden email]> wrote:

> > We recently moved our software from 11.0-p9 to 11.1-p1, but looks like
> > there is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012
> > R2) where the virtual hn0 interface hangs with the following kernel
> > messages:
> >
> >   hn0: <Hyper-V Network Interface> on vmbus0
> >   hn0: Ethernet address: 00:15:5d:31:21:0f
> >   hn0: link state changed to UP
> >   ...
> >   hn0: RXBUF ack retry
> >   hn0: RXBUF ack failed
> >   last message repeated 571 times
> >
> > It requires a restart of the HyperV VM.
> >
> > This is a customer production server (remote customer ~4000km away)
> > running fairly critical monitoring software, so we needed to roll it back
> > to 11.0-p9. We only have two customers running our software in HyperV, vs
> > lots in VMware and a handful on physical hardware.
> >
> > 11.0-p9 has been very stable.  Has anyone seen this problem before with
> > 11.1 ?  
>
>
> I don't run anything on local hyper-v anymore, but I do run a ot of
> stuff in Azure, and we havent seen anything like this. I track STABLE
> for things though, updating after reading the commits and testing
> locally for a week or so, so the version I am running currently is
> r320175, which was part of 11.1-BETA2. I am going to upgrade to a more
> recent STABLE sometime this weke or next though, will do that on a test
> amchine and let you now how it goes.
>
> I seem to recall that there were some large changes to the hn code in
> August to add virtual function support. When does 11.1-p1 date from ?

Looks like 2017-08-10

        Paul.

Re: 11.1 running on HyperV hn interface hangs

Julian Elischer-5
In reply to this post by Pete French-3
On 6/9/17 7:02 pm, Pete French wrote:

>> We recently moved our software from 11.0-p9 to 11.1-p1, but looks
>> like there
>> is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012
>> R2) where
>> the virtual hn0 interface hangs with the following kernel messages:
>>
>>   hn0: <Hyper-V Network Interface> on vmbus0
>>   hn0: Ethernet address: 00:15:5d:31:21:0f
>>   hn0: link state changed to UP
>>   ...
>>   hn0: RXBUF ack retry
>>   hn0: RXBUF ack failed
>>   last message repeated 571 times
>>
>> It requires a restart of the HyperV VM.
>>
>> This is a customer production server (remote customer ~4000km away)
>> running
>> fairly critical monitoring software, so we needed to roll it back
>> to 11.0-p9.
>> We only have two customers running our software in HyperV, vs lots
>> in VMware
>> and a handful on physical hardware.
>>
>> 11.0-p9 has been very stable.  Has anyone seen this problem before
>> with 11.1 ?
>
>
> I don't run anything on local hyper-v anymore, but I do run a ot of
> stuff in Azure, and we havent seen anything like this. I track
> STABLE for things though, updating after reading the commits and
> testing locally for a week or so, so the version I am running
> currently is r320175, which was part of 11.1-BETA2. I am going to
> upgrade to a more recent STABLE sometime this weke or next though,
> will do that on a test amchine and let you now how it goes.
>
> I seem to recall that there were some large changes to the hn code
> in August to add virtual function support. When does 11.1-p1 date
> from ?
Make sure you contact the FreeBSD/Microsoft guys; they are very responsive.
I don't know if they watch -stable, so I'll cc a couple.

>
> -pete.


Re: 11.1 running on HyperV hn interface hangs

Mark Millard-2
In reply to this post by Paul Koch-3
Paul Koch paul.koch at akips.com wrote on
Wed Sep 6 09:33:26 UTC 2017 :

> We recently moved our software from 11.0-p9 to 11.1-p1, but looks like there
> is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012 R2) where
> the virtual hn0 interface hangs with the following kernel messages:
>
>  hn0: <Hyper-V Network Interface> on vmbus0
>  hn0: Ethernet address: 00:15:5d:31:21:0f
>  hn0: link state changed to UP
>  ...
>  hn0: RXBUF ack retry
>  hn0: RXBUF ack failed
>  last message repeated 571 times
>
> . . .
>
> Has anyone seen this problem before with 11.1 ?

While it is/was personal use/experimentation, I have used
all of the following under Windows 10 Pro's Hyper-V, with
networking via hn0 Ethernet as seen from the FreeBSD guest:

releng/11.1 (no longer around to remind me of the
             most recent -r?????? but various updates )
stable/11   (various updates, -r320807 currently)
head        (various updates, -r323147 currently)

I had no problems with my use. (By no means a traffic match
to your context, but the interface was definitely used.)

In all cases the Virtual Switch Manager was tied to the
(builtin) "External network" that is listed as:

Intel(R) I211 Gigabit Network Connection

in the Virtual Switch Properties pop-up for
External network. The machine is not a server.

So it is not totally broken, as far as I can tell. Something
more specific to your context would also seem to be involved.

Hyper-V has worked nicely for assigning 14 of the machine's
16 hardware threads to FreeBSD and doing buildworld buildkernel
and poudriere based port builds. (Windows 10 Pro not being
otherwise busy.)

===
Mark Millard
markmi at dsl-only.net


Re: 11.1 running on HyperV hn interface hangs

Sepherosa Ziehau-2
In reply to this post by Julian Elischer-5
Is it possible to tell me about your workload?  E.g. TX heavy or RX
heavy, TSO enabled or not.  Details like how the send syscalls are
issued will be interesting, as will your Windows version, including
the patch level, etc.

Please try the following patch:
https://people.freebsd.org/~sephe/hn_dec_txdesc.diff

Thanks,
sephe


On Wed, Sep 6, 2017 at 11:23 PM, Julian Elischer <[hidden email]> wrote:

> On 6/9/17 7:02 pm, Pete French wrote:
>>>
>>> We recently moved our software from 11.0-p9 to 11.1-p1, but looks like
>>> there
>>> is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012 R2)
>>> where
>>> the virtual hn0 interface hangs with the following kernel messages:
>>>
>>>   hn0: <Hyper-V Network Interface> on vmbus0
>>>   hn0: Ethernet address: 00:15:5d:31:21:0f
>>>   hn0: link state changed to UP
>>>   ...
>>>   hn0: RXBUF ack retry
>>>   hn0: RXBUF ack failed
>>>   last message repeated 571 times
>>>
>>> It requires a restart of the HyperV VM.
>>>
>>> This is a customer production server (remote customer ~4000km away)
>>> running
>>> fairly critical monitoring software, so we needed to roll it back to
>>> 11.0-p9.
>>> We only have two customers running our software in HyperV, vs lots in
>>> VMware
>>> and a handful on physical hardware.
>>>
>>> 11.0-p9 has been very stable.  Has anyone seen this problem before with
>>> 11.1 ?
>>
>>
>>
>> I don't run anything on local hyper-v anymore, but I do run a ot of stuff
>> in Azure, and we havent seen anything like this. I track STABLE for things
>> though, updating after reading the commits and testing locally for a week or
>> so, so the version I am running currently is r320175, which was part of
>> 11.1-BETA2. I am going to upgrade to a more recent STABLE sometime this weke
>> or next though, will do that on a test amchine and let you now how it goes.
>>
>> I seem to recall that there were some large changes to the hn code in
>> August to add virtual function support. When does 11.1-p1 date from ?
>
> make sure you contact the FreeBSD/Microsoft guys.  Very responsive.. don't
> know if they watch -stable..
> I'll cc a couple..
>
>>
>> -pete.
>



--
Tomorrow Will Never Die

Re: 11.1 running on HyperV hn interface hangs

Sepherosa Ziehau-2
Ignore the hn_dec_txdesc.diff; please try this one instead, which should be more effective:
https://people.freebsd.org/~sephe/hn_inc_txbr.diff

On Thu, Sep 7, 2017 at 10:22 AM, Sepherosa Ziehau <[hidden email]> wrote:

> Is it possible to tell me your workload?  e.g. TX heavy or RX heavy.
> Enabled TSO or not.  Details like how the send syscalls are issue will
> be interesting.  And your Windows version, include the patch level,
> etc.
>
> Please try the following patch:
> https://people.freebsd.org/~sephe/hn_dec_txdesc.diff
>
> Thanks,
> sephe
>
>
> On Wed, Sep 6, 2017 at 11:23 PM, Julian Elischer <[hidden email]> wrote:
>> On 6/9/17 7:02 pm, Pete French wrote:
>>>>
>>>> We recently moved our software from 11.0-p9 to 11.1-p1, but looks like
>>>> there
>>>> is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012 R2)
>>>> where
>>>> the virtual hn0 interface hangs with the following kernel messages:
>>>>
>>>>   hn0: <Hyper-V Network Interface> on vmbus0
>>>>   hn0: Ethernet address: 00:15:5d:31:21:0f
>>>>   hn0: link state changed to UP
>>>>   ...
>>>>   hn0: RXBUF ack retry
>>>>   hn0: RXBUF ack failed
>>>>   last message repeated 571 times
>>>>
>>>> It requires a restart of the HyperV VM.
>>>>
>>>> This is a customer production server (remote customer ~4000km away)
>>>> running
>>>> fairly critical monitoring software, so we needed to roll it back to
>>>> 11.0-p9.
>>>> We only have two customers running our software in HyperV, vs lots in
>>>> VMware
>>>> and a handful on physical hardware.
>>>>
>>>> 11.0-p9 has been very stable.  Has anyone seen this problem before with
>>>> 11.1 ?
>>>
>>>
>>>
>>> I don't run anything on local hyper-v anymore, but I do run a ot of stuff
>>> in Azure, and we havent seen anything like this. I track STABLE for things
>>> though, updating after reading the commits and testing locally for a week or
>>> so, so the version I am running currently is r320175, which was part of
>>> 11.1-BETA2. I am going to upgrade to a more recent STABLE sometime this weke
>>> or next though, will do that on a test amchine and let you now how it goes.
>>>
>>> I seem to recall that there were some large changes to the hn code in
>>> August to add virtual function support. When does 11.1-p1 date from ?
>>
>> make sure you contact the FreeBSD/Microsoft guys.  Very responsive.. don't
>> know if they watch -stable..
>> I'll cc a couple..
>>
>>>
>>> -pete.
>
>
>
> --
> Tomorrow Will Never Die



--
Tomorrow Will Never Die

Re: 11.1 running on HyperV hn interface hangs

Paul Koch-3
In reply to this post by Sepherosa Ziehau-2
On Thu, 7 Sep 2017 10:22:40 +0800
Sepherosa Ziehau <[hidden email]> wrote:

> Is it possible to tell me your workload?  e.g. TX heavy or RX heavy.
> Enabled TSO or not.  Details like how the send syscalls are issue will
> be interesting.  And your Windows version, include the patch level,
> etc.
>
> Please try the following patch:
> https://people.freebsd.org/~sephe/hn_dec_txdesc.diff
>
> Thanks,
> sephe

Hi Sephe,

Here's a bit of an explanation of the environment...

AKIPS Network Monitor workload:
- 22000 devices (routers/switches/APs/etc)
- 123000 interfaces (60 snmp polling)
- 131 netflow exporters
- ~1500 pings per second
- ~1000 snmp requests/responses per second (~1.9 million MIB object/min)
- ~250 netflow packets/sec (~4500 flows/sec incoming)
- ~130 syslog messages/sec (incoming)
- ~200 snmp traps/sec (incoming)

The ping/snmp poller is a single monolithic process (no threads).
Separate processes for each of the syslog/trap/netflow collection.

SNMP requests are sent using the sendto() system call over a non-blocking UDP
socket, for both IPv4 and IPv6.  We set the UDP socket receive buffer size to
4 Mbytes.  Nothing really complex about it.
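That socket setup can be sketched roughly as follows (Python for brevity; the production poller presumably does the equivalent in C, and the function name here is hypothetical):

```python
import socket

def make_poller_socket(rcvbuf_bytes=4 * 1024 * 1024):
    # Non-blocking UDP socket with an enlarged receive buffer, as described
    # above.  The kernel may clamp the requested size (FreeBSD:
    # kern.ipc.maxsockbuf, Linux: net.core.rmem_max), so read back the
    # granted value rather than assuming the full 4 Mbytes took effect.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    s.setblocking(False)
    granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return s, granted

sock, granted = make_poller_socket()
# sendto() on a non-blocking UDP socket queues the datagram and returns
# immediately; 161/udp is the standard SNMP agent port.
sent = sock.sendto(b"\x30\x00", ("127.0.0.1", 161))
```

An IPv6 poller would do the same with AF_INET6.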

Pings are interlaced with snmp requests to limit the bursty nature of
small back-to-back packets (this eliminates issues with switch interfaces
dropping bursts of packets).  Ping requests are sent using a raw icmp socket.
We don't read the responses from the icmp socket; instead we put the interface
into promiscuous mode and use the BPF info to measure the tx/rx RTT values.
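For the ping side, a sender on a raw ICMP socket has to build the echo request itself. A minimal sketch of that construction, assuming the standard ICMP type 8 header and RFC 1071 checksum (helper names are hypothetical; actually sending the packet requires a root-only socket(AF_INET, SOCK_RAW, IPPROTO_ICMP), so only the packet build is shown):

```python
import struct

def icmp_checksum(data: bytes) -> int:
    # RFC 1071 Internet checksum: one's-complement sum of 16-bit big-endian
    # words, padded with a zero byte if the length is odd.
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    total = (total >> 16) + (total & 0xFFFF)
    total += total >> 16
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes = b"probe") -> bytes:
    # ICMP echo request: type 8, code 0, checksum computed over the whole
    # message with the checksum field initially zero.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload
```

Verifying a captured reply is the same computation: the checksum of a correctly checksummed ICMP message folds to zero.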

Syslog daemon just listens on a UDP socket with a 4 Mbyte receive buffer.
Same with the snmp trap daemon.


Here's some links to performance graphs of the VM:
 https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last2h.pdf
 https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last24h.pdf
 https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last7d.pdf

The OS was upgraded to 11.1p1 at 5pm on the 5th Sep.  The hn0 interface hung
at 7:36pm.  The interface hung three times before we reverted to 11.0p9.  It
takes a few hours after rebooting the VM before the interface hangs.


Microsoft Host is running Windows 2012 R2.  Waiting for patch level info from
the customer.

I'll have to get the customer to spin up a new VM before trying your patch.


Here's some info (after a reboot of the VM)

Guest VM dmesg:

FreeBSD 11.1-RELEASE-p1 #0 r322350: Thu Aug 10 22:16:21 UTC 2017
    [hidden email]:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM
4.0.0)
VT(vga): text 80x25
Hyper-V Version: 6.3.9600 [SP18]
  Features=0xe7f<VPRUNTIME,TMREFCNT,SYNIC,SYNTM,APIC,HYPERCALL,VPINDEX,REFTSC,IDLE,TMFREQ>
  PM Features=0x0 [C2]
  Features3=0x7b2<DEBUG,XMMHC,IDLE,NUMA,TMFREQ,SYNCMC,CRASH>
Timecounter "Hyper-V" frequency 10000000 Hz quality 2000
CPU: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz (2300.00-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306f2  Family=0x6  Model=0x3f  Stepping=2
  Features=0x1f83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS,HTT>
  Features2=0x80002001<SSE3,CX16,HV>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
Hypervisor: Origin = "Microsoft Hv"
real memory  = 34359738368 (32768 MB)
avail memory = 33325903872 (31782 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <VRTUAL MICROSFT>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
random: unblocking device.
ioapic0: Changing APIC ID to 0
ioapic0 <Version 1.1> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
Timecounter "Hyper-V-TSC" frequency 10000000 Hz quality 3000
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff80f5b220, 0) error 19
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <VRTUAL MICROSFT> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
vmbus0: <Hyper-V Vmbus> on pcib0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 UDMA33 controller> port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pci0: <bridge> at device 7.3 (no driver attached)
vgapci0: <VGA-compatible display> mem 0xf8000000-0xfbffffff irq 11 at device
8.0 on pci0
vgapci0: Boot video device
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on
acpi0
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
vmbus_res0: <Hyper-V Vmbus Resource> irq 5,7 on acpi0
orm0: <ISA Option ROM> at iomem 0xc0000-0xcbfff on isa0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
usb_needs_explore_all: no devclass
nvme cam probe device init
vmbus0: version 3.0
hvet0: <Hyper-V event timer> on vmbus0
Event timer "Hyper-V" frequency 10000000 Hz quality 1000
storvsc0: <Hyper-V IDE> on vmbus0
hvkbd0: <Hyper-V KBD> on vmbus0
hvheartbeat0: <Hyper-V Heartbeat> on vmbus0
hvkvp0: <Hyper-V KVP> on vmbus0
hvshutdown0: <Hyper-V Shutdown> on vmbus0
hvvss0: <Hyper-V VSS> on vmbus0
storvsc1: <Hyper-V SCSI> on vmbus0
hn0: <Hyper-V Network Interface> on vmbus0
da0 at blkvsc0 bus 0 scbus2 target 1 lun 0
da0: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 307200MB (629145600 512 byte sectors)
cd0 at ata1 bus 0 scbus1 target 0 lun 0
cd0: <Msft Virtual CD/ROM 1.0> Removable CD-ROM SPC-3 SCSI device
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
hn0: Ethernet address: 00:15:5d:31:21:0f
hn0: link state changed to UP
Trying to mount root from ufs:/dev/gpt/akips-root1 [rw]...
Setting hostuuid: 2b0f0733-401e-7646-9724-a89072a2f005.
Setting hostid: 0xd87511b7.
Starting file system checks:
/dev/gpt/akips-root1: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/gpt/akips-root1: clean, 961795 free (227 frags, 120196 blocks, 0.0%
fragmentation)
/dev/gpt/akips-root0: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/gpt/akips-root0: clean, 977258 free (146 frags, 122139 blocks, 0.0%
fragmentation)
Mounting local filesystems:.
ELF ldconfig
path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/perl5/5.24/mach/CORE
32-bit compatibility ldconfig path: /usr/lib32
Setting hostname: xxxxxx
Setting up harvesting:
[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
Feeding entropy: .
Limiting icmp unreach response from 358 to 200 packets/sec
Limiting icmp unreach response from 464 to 200 packets/sec
Starting Network: lo0 hn0.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
hn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=51b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,LRO>
        ether 00:15:5d:31:21:0f
        hwaddr 00:15:5d:31:21:0f
        inet 172.31.5.13 netmask 0xffffff00 broadcast 172.31.5.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active
Starting devd.
add host 127.0.0.1: gateway lo0 fib 0: route already in table
add net default: gateway 172.31.5.254
add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net ::ffff:0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Creating and/or trimming log files.
Starting syslogd.
Setting date via ntp.
Limiting icmp unreach response from 398 to 200 packets/sec
Limiting icmp unreach response from 350 to 200 packets/sec
Limiting icmp unreach response from 333 to 200 packets/sec
Limiting icmp unreach response from 501 to 200 packets/sec
Limiting icmp unreach response from 336 to 200 packets/sec
Limiting icmp unreach response from 351 to 200 packets/sec
No core dumps found.
Clearing /tmp (X related).
Updating motd:.
Mounting late filesystems:.
Starting ntpd.
Limiting icmp unreach response from 424 to 200 packets/sec
postfix/postfix-script: starting the Postfix mail system
vfs.zfs.zfetch.max_distance: 8388608 -> 67108864
vfs.usermount: 0 -> 1
net.inet.tcp.sendbuf_inc: 8192 -> 16384
net.inet.tcp.recvbuf_inc: 16384 -> 524288
net.inet.tcp.keepidle: 7200000 -> 60000
net.inet.tcp.keepintvl: 75000 -> 30000
net.inet.tcp.keepinit: 75000 -> 30000
net.inet.udp.blackhole: 0 -> 1
net.inet.tcp.blackhole: 0 -> 1
net.inet.sctp.blackhole: 0 -> 1
net.link.ether.inet.log_arp_movements: 1 -> 0
kern.init_shutdown_timeout: 120 -> 300
root: Clearing temporary files
root: Clearing core files
root: Clearing state machines
root: Prefetching database files
root: Starting time daemon
2017-09-06 09:03:46 nm-timed common.c:6032 DEBUG: Create fmap tmap 1032 bytes
root: Starting joat daemon
root: Starting watchdog
2017-09-06 09:03:46 nm-joatd common.c:5802 INFO: Start
Sep  6 09:03:46 bull nm-joatd: Start
Configuring vt: blanktime.
Performing sanity check on sshd configuration.
Starting sshd.
Starting cron.
--------------------------------------------------------------------


ifconfig:

lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
hn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=51b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,LRO>
        ether 00:15:5d:31:21:0f
        hwaddr 00:15:5d:31:21:0f
        inet 172.31.5.13 netmask 0xffffff00 broadcast 172.31.5.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active
------------------------------------------------------------------------


Some sysctl info for the hn device

hw.hn.tx_agg_pkts: -1
hw.hn.tx_agg_size: -1
hw.hn.lro_mbufq_depth: 512
hw.hn.tx_swq_depth: 0
hw.hn.tx_ring_cnt: 0
hw.hn.chan_cnt: 0
hw.hn.use_if_start: 0
hw.hn.use_txdesc_bufring: 1
hw.hn.tx_taskq_mode: 0
hw.hn.tx_taskq_cnt: 1
hw.hn.lro_entry_count: 128
hw.hn.direct_tx_size: 128
hw.hn.tx_chimney_size: 0
hw.hn.tso_maxlen: 65535
hw.hn.trust_hostip: 1
hw.hn.trust_hostudp: 1
hw.hn.trust_hosttcp: 1
dev.hn.0.vf:
dev.hn.0.polling: 0
dev.hn.0.agg_pkts: -1
dev.hn.0.agg_size: -1
dev.hn.0.rndis_agg_align: 8
dev.hn.0.rndis_agg_pkts: 8
dev.hn.0.rndis_agg_size: 4026531839
dev.hn.0.rss_ind_size: 128
dev.hn.0.rss_hash: 1701<TOEPLITZ,IP4,TCP4,IP6,TCP6>
dev.hn.0.rxfilter: d<DIRECT,ALLMULTI,BROADCAST>
dev.hn.0.hwassist: 17<CSUM_IP,CSUM_IP_UDP,CSUM_IP_TCP,CSUM_IP_TSO>
dev.hn.0.caps:
3ff<VLAN,MTU,IPCS,TCP4CS,TCP6CS,UDP4CS,UDP6CS,TSO4,TSO6,HASHVAL>
dev.hn.0.ndis_version: 6.30
dev.hn.0.nvs_version: 327680
dev.hn.0.channel.13.sub.3.br.tx.state: rindex:122136 windex:122136 imask:0 ravail:0 wavail:520192
dev.hn.0.channel.13.sub.3.br.rx.state: rindex:157368 windex:157368 imask:0 ravail:0 wavail:520192
dev.hn.0.channel.13.sub.3.mnf: 1
dev.hn.0.channel.13.sub.3.cpu: 3
dev.hn.0.channel.13.sub.3.chanid: 16
dev.hn.0.channel.13.sub.2.br.tx.state: rindex:116032 windex:116032 imask:0 ravail:0 wavail:520192
dev.hn.0.channel.13.sub.2.br.rx.state: rindex:150096 windex:150096 imask:0 ravail:0 wavail:520192
dev.hn.0.channel.13.sub.2.mnf: 1
dev.hn.0.channel.13.sub.2.cpu: 2
dev.hn.0.channel.13.sub.2.chanid: 15
dev.hn.0.channel.13.sub.1.br.tx.state: rindex:236264 windex:236264 imask:0 ravail:0 wavail:520192
dev.hn.0.channel.13.sub.1.br.rx.state: rindex:304448 windex:304448 imask:0 ravail:0 wavail:520192
dev.hn.0.channel.13.sub.1.mnf: 1
dev.hn.0.channel.13.sub.1.cpu: 1
dev.hn.0.channel.13.sub.1.chanid: 14
dev.hn.0.channel.13.br.tx.state: rindex:147520 windex:147520 imask:0 ravail:0 wavail:520192
dev.hn.0.channel.13.br.rx.state: rindex:167776 windex:167776 imask:0 ravail:0 wavail:520192
dev.hn.0.channel.13.mnf: 1
dev.hn.0.channel.13.cpu: 0
dev.hn.0.rx_ring_inuse: 4
dev.hn.0.rx_ring_cnt: 4
dev.hn.0.rx_ack_failed: 0
dev.hn.0.small_pkts: 538
dev.hn.0.csum_trusted: 0
dev.hn.0.csum_udp: 8734
dev.hn.0.csum_tcp: 18
dev.hn.0.csum_ip: 9758
dev.hn.0.trust_hostip: 1
dev.hn.0.trust_hostudp: 1
dev.hn.0.trust_hosttcp: 1
dev.hn.0.lro_ackcnt_lim: 2
dev.hn.0.lro_length_lim: 18000
dev.hn.0.lro_tried: 18
dev.hn.0.lro_flushed: 18
dev.hn.0.lro_queued: 18
dev.hn.0.rx.3.pktbuf_len: 16384
dev.hn.0.rx.3.rss_pkts: 2223
dev.hn.0.rx.3.packets: 2223
dev.hn.0.rx.2.pktbuf_len: 16384
dev.hn.0.rx.2.rss_pkts: 2186
dev.hn.0.rx.2.packets: 2186
dev.hn.0.rx.1.pktbuf_len: 16384
dev.hn.0.rx.1.rss_pkts: 4323
dev.hn.0.rx.1.packets: 4323
dev.hn.0.rx.0.pktbuf_len: 16384
dev.hn.0.rx.0.rss_pkts: 1026
dev.hn.0.rx.0.packets: 1275
dev.hn.0.agg_align: 8
dev.hn.0.agg_pktmax: 8
dev.hn.0.agg_szmax: 6144
dev.hn.0.tx_ring_inuse: 4
dev.hn.0.tx_ring_cnt: 4
dev.hn.0.sched_tx: 1
dev.hn.0.direct_tx_size: 128
dev.hn.0.tx_chimney_size: 6144
dev.hn.0.tx_chimney_max: 6144
dev.hn.0.txdesc_cnt: 512
dev.hn.0.tx_chimney_tried: 1345
dev.hn.0.tx_chimney: 1345
dev.hn.0.tx_collapsed: 0
dev.hn.0.agg_flush_failed: 0
dev.hn.0.txdma_failed: 0
dev.hn.0.send_failed: 0
dev.hn.0.no_txdescs: 0
dev.hn.0.tx.3.sends: 0
dev.hn.0.tx.3.packets: 0
dev.hn.0.tx.3.oactive: 0
dev.hn.0.tx.2.sends: 0
dev.hn.0.tx.2.packets: 0
dev.hn.0.tx.2.oactive: 0
dev.hn.0.tx.1.sends: 19
dev.hn.0.tx.1.packets: 19
dev.hn.0.tx.1.oactive: 0
dev.hn.0.tx.0.sends: 1326
dev.hn.0.tx.0.packets: 1425
dev.hn.0.tx.0.oactive: 0
dev.hn.0.%parent: vmbus0
dev.hn.0.%pnpinfo: classid=f8615163-df3e-46c5-913f-f2d2f965ed0e deviceid=503f1ad8-7dad-4d5f-9f16-6c4383e28d12
dev.hn.0.%location:
dev.hn.0.%driver: hn
dev.hn.0.%desc: Hyper-V Network Interface
dev.hn.%parent:
------------------------------------------------------------


Re: 11.1 running on HyperV hn interface hangs

Sepherosa Ziehau-2
Weird; your traffic pattern is not heavy at all.  Sending is mainly UDP,
which should never be able to saturate the TX buffer ring and cause the
RXBUF ack send failure.  This is strange.
Anyhow, make sure to test this patch:
https://people.freebsd.org/~sephe/hn_inc_txbr.diff

On Thu, Sep 7, 2017 at 1:07 PM, Paul Koch <[hidden email]> wrote:

> On Thu, 7 Sep 2017 10:22:40 +0800
> Sepherosa Ziehau <[hidden email]> wrote:
>
>> Is it possible to tell me your workload?  e.g. TX heavy or RX heavy.
>> Enabled TSO or not.  Details like how the send syscalls are issue will
>> be interesting.  And your Windows version, include the patch level,
>> etc.
>>
>> Please try the following patch:
>> https://people.freebsd.org/~sephe/hn_dec_txdesc.diff
>>
>> Thanks,
>> sephe
>
> Hi Sephe,
>
> Here's a bit of an explanation of the environment...
>
> AKIPS Network Monitor workload:
> - 22000 devices (routers/switches/APs/etc)
> - 123000 interfaces (60 snmp polling)
> - 131 netflow exporters
> - ~1500 pings per second
> - ~1000 snmp requests/responses per second (~1.9 million MIB object/min)
> - ~250 netflow packets/sec (~4500 flows/sec incoming)
> - ~130 syslog messages/sec (incoming)
> - ~200 snmp traps/sec (incoming)
>
> The ping/snmp poller is a single monolithic process (no threads).
> Separate processes handle the syslog, trap and netflow collection.
>
> SNMP requests are sent with the sendto() system call over a non-blocking UDP
> socket, for both IPv4 and IPv6.  We set the UDP socket receive buffer size to
> 4 Mbytes.  Nothing really complex about it.
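For reference, a socket set up along the lines described above might look like the following. This is a minimal sketch, not the actual AKIPS code; the 4 MB figure comes from the text, and the kernel may clamp the requested buffer size (kern.ipc.maxsockbuf on FreeBSD).

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Create a non-blocking IPv4 UDP socket with an enlarged receive buffer.
 * Returns the descriptor, or -1 on error.  setsockopt() failure is
 * tolerated here: the kernel silently clamps oversized requests anyway. */
int make_udp_socket(int rcvbuf_bytes)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    (void)setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                     &rcvbuf_bytes, sizeof(rcvbuf_bytes));
    if (fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```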
>
> Pings are interlaced with snmp requests to limit the bursty nature of
> small back-to-back packets (this eliminates issues with switch interfaces
> dropping bursts of packets).  Ping requests are sent using a raw icmp
> socket.  We don't read the responses from the icmp socket; instead we put
> the interface into promiscuous mode and use the BPF info to measure the
> tx/rx RTT values.
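Building echo requests on a raw ICMP socket means filling in the ICMP checksum yourself, since the kernel does not compute it for you on a SOCK_RAW/IPPROTO_ICMP socket. The standard RFC 1071 one's-complement sum, shown here as a generic sketch rather than the poller's actual code:

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over an ICMP message.  Compute it with the
 * checksum field zeroed, then store the result into that field. */
uint16_t icmp_cksum(const void *buf, size_t len)
{
    const uint16_t *p = buf;
    uint32_t sum = 0;

    while (len > 1) {
        sum += *p++;
        len -= 2;
    }
    if (len == 1)                      /* trailing odd byte */
        sum += *(const uint8_t *)p;
    while (sum >> 16)                  /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Summing a message over which the checksum has already been stored yields zero, which is a handy self-check.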
>
> Syslog daemon just listens on a UDP socket with a 4 Mbyte receive buffer.
> Same with the snmp trap daemon.
>
>
> Here's some links to performance graphs of the VM:
>  https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last2h.pdf
>  https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last24h.pdf
>  https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last7d.pdf
>
> The OS was upgraded to 11.1p1 at 5pm on the 5th Sep.  The hn0 interface hung
> at 7:36pm.  The interface hung three times before we reverted to 11.0p9.  It
> takes a few hours after rebooting the VM before the interface hangs.
>
>
> Microsoft Host is running Windows 2012 R2.  Waiting for patch level info from
> the customer.
>
> I'll have to get the customer to spin up a new VM before trying your patch.
>
>
> Here's some info (after a reboot of the VM)
>
> Guest VM dmesg:
>
> FreeBSD 11.1-RELEASE-p1 #0 r322350: Thu Aug 10 22:16:21 UTC 2017
>     [hidden email]:/usr/obj/usr/src/sys/GENERIC amd64
> FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM
> 4.0.0)
> VT(vga): text 80x25
> Hyper-V Version: 6.3.9600 [SP18]
>   Features=0xe7f<VPRUNTIME,TMREFCNT,SYNIC,SYNTM,APIC,HYPERCALL,VPINDEX,REFTSC,IDLE,TMFREQ>
>   PM Features=0x0 [C2]
>   Features3=0x7b2<DEBUG,XMMHC,IDLE,NUMA,TMFREQ,SYNCMC,CRASH>
> Timecounter "Hyper-V" frequency 10000000 Hz quality 2000
> CPU: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz (2300.00-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x306f2  Family=0x6  Model=0x3f  Stepping=2
>   Features=0x1f83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS,HTT>
>   Features2=0x80002001<SSE3,CX16,HV>
>   AMD Features=0x20100800<SYSCALL,NX,LM>
>   AMD Features2=0x1<LAHF>
> Hypervisor: Origin = "Microsoft Hv"
> real memory  = 34359738368 (32768 MB)
> avail memory = 33325903872 (31782 MB)
> Event timer "LAPIC" quality 100
> ACPI APIC Table: <VRTUAL MICROSFT>
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> FreeBSD/SMP: 1 package(s) x 4 core(s)
> random: unblocking device.
> ioapic0: Changing APIC ID to 0
> ioapic0 <Version 1.1> irqs 0-23 on motherboard
> SMP: AP CPU #1 Launched!
> SMP: AP CPU #3 Launched!
> SMP: AP CPU #2 Launched!
> Timecounter "Hyper-V-TSC" frequency 10000000 Hz quality 3000
> random: entropy device external interface
> kbd1 at kbdmux0
> netmap: loaded module
> module_register_init: MOD_LOAD (vesa, 0xffffffff80f5b220, 0) error 19
> nexus0
> vtvga0: <VT VGA driver> on motherboard
> cryptosoft0: <software crypto> on motherboard
> acpi0: <VRTUAL MICROSFT> on motherboard
> acpi0: Power Button (fixed)
> cpu0: <ACPI CPU> on acpi0
> cpu1: <ACPI CPU> on acpi0
> cpu2: <ACPI CPU> on acpi0
> cpu3: <ACPI CPU> on acpi0
> attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Event timer "i8254" frequency 1193182 Hz quality 100
> atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
> Event timer "RTC" frequency 32768 Hz quality 0
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
> acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> vmbus0: <Hyper-V Vmbus> on pcib0
> pci0: <ACPI PCI bus> on pcib0
> isab0: <PCI-ISA bridge> at device 7.0 on pci0
> isa0: <ISA bus> on isab0
> atapci0: <Intel PIIX4 UDMA33 controller> port
> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
> ata0: <ATA channel> at channel 0 on atapci0
> ata1: <ATA channel> at channel 1 on atapci0
> pci0: <bridge> at device 7.3 (no driver attached)
> vgapci0: <VGA-compatible display> mem 0xf8000000-0xfbffffff irq 11 at device
> 8.0 on pci0
> vgapci0: Boot video device
> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> psm0: <PS/2 Mouse> irq 12 on atkbdc0
> psm0: [GIANT-LOCKED]
> psm0: model IntelliMouse Explorer, device ID 4
> uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
> fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on
> acpi0
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> vmbus_res0: <Hyper-V Vmbus Resource> irq 5,7 on acpi0
> orm0: <ISA Option ROM> at iomem 0xc0000-0xcbfff on isa0
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> ppc0: cannot reserve I/O port range
> ZFS filesystem version: 5
> ZFS storage pool version: features support (5000)
> Timecounters tick every 1.000 msec
> usb_needs_explore_all: no devclass
> nvme cam probe device init
> vmbus0: version 3.0
> hvet0: <Hyper-V event timer> on vmbus0
> Event timer "Hyper-V" frequency 10000000 Hz quality 1000
> storvsc0: <Hyper-V IDE> on vmbus0
> hvkbd0: <Hyper-V KBD> on vmbus0
> hvheartbeat0: <Hyper-V Heartbeat> on vmbus0
> hvkvp0: <Hyper-V KVP> on vmbus0
> hvshutdown0: <Hyper-V Shutdown> on vmbus0
> hvvss0: <Hyper-V VSS> on vmbus0
> storvsc1: <Hyper-V SCSI> on vmbus0
> hn0: <Hyper-V Network Interface> on vmbus0
> da0 at blkvsc0 bus 0 scbus2 target 1 lun 0
> da0: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
> da0: 300.000MB/s transfers
> da0: Command Queueing enabled
> da0: 307200MB (629145600 512 byte sectors)
> cd0 at ata1 bus 0 scbus1 target 0 lun 0
> cd0: <Msft Virtual CD/ROM 1.0> Removable CD-ROM SPC-3 SCSI device
> cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> hn0: Ethernet address: 00:15:5d:31:21:0f
> hn0: link state changed to UP
> Trying to mount root from ufs:/dev/gpt/akips-root1 [rw]...
> Setting hostuuid: 2b0f0733-401e-7646-9724-a89072a2f005.
> Setting hostid: 0xd87511b7.
> Starting file system checks:
> /dev/gpt/akips-root1: FILE SYSTEM CLEAN; SKIPPING CHECKS
> /dev/gpt/akips-root1: clean, 961795 free (227 frags, 120196 blocks, 0.0%
> fragmentation)
> /dev/gpt/akips-root0: FILE SYSTEM CLEAN; SKIPPING CHECKS
> /dev/gpt/akips-root0: clean, 977258 free (146 frags, 122139 blocks, 0.0%
> fragmentation)
> Mounting local filesystems:.
> ELF ldconfig
> path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/perl5/5.24/mach/CORE
> 32-bit compatibility ldconfig path: /usr/lib32
> Setting hostname: xxxxxx
> Setting up harvesting:
> [UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
> Feeding entropy: .
> Limiting icmp unreach response from 358 to 200 packets/sec
> Limiting icmp unreach response from 464 to 200 packets/sec
> Starting Network: lo0 hn0.
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>         options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
>         inet6 ::1 prefixlen 128
>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>         inet 127.0.0.1 netmask 0xff000000
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         groups: lo
> hn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=51b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,LRO>
>         ether 00:15:5d:31:21:0f
>         hwaddr 00:15:5d:31:21:0f
>         inet 172.31.5.13 netmask 0xffffff00 broadcast 172.31.5.255
>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (10Gbase-T <full-duplex>)
>         status: active
> Starting devd.
> add host 127.0.0.1: gateway lo0 fib 0: route already in table
> add net default: gateway 172.31.5.254
> add host ::1: gateway lo0 fib 0: route already in table
> add net fe80::: gateway ::1
> add net ff02::: gateway ::1
> add net ::ffff:0.0.0.0: gateway ::1
> add net ::0.0.0.0: gateway ::1
> Creating and/or trimming log files.
> Starting syslogd.
> Setting date via ntp.
> Limiting icmp unreach response from 398 to 200 packets/sec
> Limiting icmp unreach response from 350 to 200 packets/sec
> Limiting icmp unreach response from 333 to 200 packets/sec
> Limiting icmp unreach response from 501 to 200 packets/sec
> Limiting icmp unreach response from 336 to 200 packets/sec
> Limiting icmp unreach response from 351 to 200 packets/sec
> No core dumps found.
> Clearing /tmp (X related).
> Updating motd:.
> Mounting late filesystems:.
> Starting ntpd.
> Limiting icmp unreach response from 424 to 200 packets/sec
> postfix/postfix-script: starting the Postfix mail system
> vfs.zfs.zfetch.max_distance: 8388608 -> 67108864
> vfs.usermount: 0 -> 1
> net.inet.tcp.sendbuf_inc: 8192 -> 16384
> net.inet.tcp.recvbuf_inc: 16384 -> 524288
> net.inet.tcp.keepidle: 7200000 -> 60000
> net.inet.tcp.keepintvl: 75000 -> 30000
> net.inet.tcp.keepinit: 75000 -> 30000
> net.inet.udp.blackhole: 0 -> 1
> net.inet.tcp.blackhole: 0 -> 1
> net.inet.sctp.blackhole: 0 -> 1
> net.link.ether.inet.log_arp_movements: 1 -> 0
> kern.init_shutdown_timeout: 120 -> 300
> root: Clearing temporary files
> root: Clearing core files
> root: Clearing state machines
> root: Prefetching database files
> root: Starting time daemon
> 2017-09-06 09:03:46 nm-timed common.c:6032 DEBUG: Create fmap tmap 1032 bytes
> root: Starting joat daemon
> root: Starting watchdog
> 2017-09-06 09:03:46 nm-joatd common.c:5802 INFO: Start
> Sep  6 09:03:46 bull nm-joatd: Start
> Configuring vt: blanktime.
> Performing sanity check on sshd configuration.
> Starting sshd.
> Starting cron.
> --------------------------------------------------------------------
>
>
> ifconfig:
>
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>         options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
>         inet6 ::1 prefixlen 128
>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>         inet 127.0.0.1 netmask 0xff000000
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         groups: lo
> hn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=51b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,LRO>
>         ether 00:15:5d:31:21:0f
>         hwaddr 00:15:5d:31:21:0f
>         inet 172.31.5.13 netmask 0xffffff00 broadcast 172.31.5.255
>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (10Gbase-T <full-duplex>)
>         status: active
> ------------------------------------------------------------------------
>
>
> Some sysctl info for the hn device
>
> hw.hn.tx_agg_pkts: -1
> hw.hn.tx_agg_size: -1
> hw.hn.lro_mbufq_depth: 512
> hw.hn.tx_swq_depth: 0
> hw.hn.tx_ring_cnt: 0
> hw.hn.chan_cnt: 0
> hw.hn.use_if_start: 0
> hw.hn.use_txdesc_bufring: 1
> hw.hn.tx_taskq_mode: 0
> hw.hn.tx_taskq_cnt: 1
> hw.hn.lro_entry_count: 128
> hw.hn.direct_tx_size: 128
> hw.hn.tx_chimney_size: 0
> hw.hn.tso_maxlen: 65535
> hw.hn.trust_hostip: 1
> hw.hn.trust_hostudp: 1
> hw.hn.trust_hosttcp: 1
> dev.hn.0.vf:
> dev.hn.0.polling: 0
> dev.hn.0.agg_pkts: -1
> dev.hn.0.agg_size: -1
> dev.hn.0.rndis_agg_align: 8
> dev.hn.0.rndis_agg_pkts: 8
> dev.hn.0.rndis_agg_size: 4026531839
> dev.hn.0.rss_ind_size: 128
> dev.hn.0.rss_hash: 1701<TOEPLITZ,IP4,TCP4,IP6,TCP6>
> dev.hn.0.rxfilter: d<DIRECT,ALLMULTI,BROADCAST>
> dev.hn.0.hwassist: 17<CSUM_IP,CSUM_IP_UDP,CSUM_IP_TCP,CSUM_IP_TSO>
> dev.hn.0.caps: 3ff<VLAN,MTU,IPCS,TCP4CS,TCP6CS,UDP4CS,UDP6CS,TSO4,TSO6,HASHVAL>
> dev.hn.0.ndis_version: 6.30
> dev.hn.0.nvs_version: 327680
> dev.hn.0.channel.13.sub.3.br.tx.state: rindex:122136 windex:122136 imask:0 ravail:0 wavail:520192
> dev.hn.0.channel.13.sub.3.br.rx.state: rindex:157368 windex:157368 imask:0 ravail:0 wavail:520192
> dev.hn.0.channel.13.sub.3.mnf: 1
> dev.hn.0.channel.13.sub.3.cpu: 3
> dev.hn.0.channel.13.sub.3.chanid: 16
> dev.hn.0.channel.13.sub.2.br.tx.state: rindex:116032 windex:116032 imask:0 ravail:0 wavail:520192
> dev.hn.0.channel.13.sub.2.br.rx.state: rindex:150096 windex:150096 imask:0 ravail:0 wavail:520192
> dev.hn.0.channel.13.sub.2.mnf: 1
> dev.hn.0.channel.13.sub.2.cpu: 2
> dev.hn.0.channel.13.sub.2.chanid: 15
> dev.hn.0.channel.13.sub.1.br.tx.state: rindex:236264 windex:236264 imask:0 ravail:0 wavail:520192
> dev.hn.0.channel.13.sub.1.br.rx.state: rindex:304448 windex:304448 imask:0 ravail:0 wavail:520192
> dev.hn.0.channel.13.sub.1.mnf: 1
> dev.hn.0.channel.13.sub.1.cpu: 1
> dev.hn.0.channel.13.sub.1.chanid: 14
> dev.hn.0.channel.13.br.tx.state: rindex:147520 windex:147520 imask:0 ravail:0 wavail:520192
> dev.hn.0.channel.13.br.rx.state: rindex:167776 windex:167776 imask:0 ravail:0 wavail:520192
> dev.hn.0.channel.13.mnf: 1
> dev.hn.0.channel.13.cpu: 0
> dev.hn.0.rx_ring_inuse: 4
> dev.hn.0.rx_ring_cnt: 4
> dev.hn.0.rx_ack_failed: 0
> dev.hn.0.small_pkts: 538
> dev.hn.0.csum_trusted: 0
> dev.hn.0.csum_udp: 8734
> dev.hn.0.csum_tcp: 18
> dev.hn.0.csum_ip: 9758
> dev.hn.0.trust_hostip: 1
> dev.hn.0.trust_hostudp: 1
> dev.hn.0.trust_hosttcp: 1
> dev.hn.0.lro_ackcnt_lim: 2
> dev.hn.0.lro_length_lim: 18000
> dev.hn.0.lro_tried: 18
> dev.hn.0.lro_flushed: 18
> dev.hn.0.lro_queued: 18
> dev.hn.0.rx.3.pktbuf_len: 16384
> dev.hn.0.rx.3.rss_pkts: 2223
> dev.hn.0.rx.3.packets: 2223
> dev.hn.0.rx.2.pktbuf_len: 16384
> dev.hn.0.rx.2.rss_pkts: 2186
> dev.hn.0.rx.2.packets: 2186
> dev.hn.0.rx.1.pktbuf_len: 16384
> dev.hn.0.rx.1.rss_pkts: 4323
> dev.hn.0.rx.1.packets: 4323
> dev.hn.0.rx.0.pktbuf_len: 16384
> dev.hn.0.rx.0.rss_pkts: 1026
> dev.hn.0.rx.0.packets: 1275
> dev.hn.0.agg_align: 8
> dev.hn.0.agg_pktmax: 8
> dev.hn.0.agg_szmax: 6144
> dev.hn.0.tx_ring_inuse: 4
> dev.hn.0.tx_ring_cnt: 4
> dev.hn.0.sched_tx: 1
> dev.hn.0.direct_tx_size: 128
> dev.hn.0.tx_chimney_size: 6144
> dev.hn.0.tx_chimney_max: 6144
> dev.hn.0.txdesc_cnt: 512
> dev.hn.0.tx_chimney_tried: 1345
> dev.hn.0.tx_chimney: 1345
> dev.hn.0.tx_collapsed: 0
> dev.hn.0.agg_flush_failed: 0
> dev.hn.0.txdma_failed: 0
> dev.hn.0.send_failed: 0
> dev.hn.0.no_txdescs: 0
> dev.hn.0.tx.3.sends: 0
> dev.hn.0.tx.3.packets: 0
> dev.hn.0.tx.3.oactive: 0
> dev.hn.0.tx.2.sends: 0
> dev.hn.0.tx.2.packets: 0
> dev.hn.0.tx.2.oactive: 0
> dev.hn.0.tx.1.sends: 19
> dev.hn.0.tx.1.packets: 19
> dev.hn.0.tx.1.oactive: 0
> dev.hn.0.tx.0.sends: 1326
> dev.hn.0.tx.0.packets: 1425
> dev.hn.0.tx.0.oactive: 0
> dev.hn.0.%parent: vmbus0
> dev.hn.0.%pnpinfo: classid=f8615163-df3e-46c5-913f-f2d2f965ed0e deviceid=503f1ad8-7dad-4d5f-9f16-6c4383e28d12
> dev.hn.0.%location:
> dev.hn.0.%driver: hn
> dev.hn.0.%desc: Hyper-V Network Interface
> dev.hn.%parent:
> ------------------------------------------------------------
>



--
Tomorrow Will Never Die

Re: 11.1 running on HyperV hn interface hangs

Paul Koch-3
On Thu, 7 Sep 2017 13:51:11 +0800
Sepherosa Ziehau <[hidden email]> wrote:

> Weird; your traffic pattern is not anything heavy.  Sending is mainly
> UDP, which should never be able to saturate the TX buffer ring and
> trigger the RXBUF ACK sending failure.

It's a bit tricky.  The poller is very fast.  We ping every device every 15
seconds and collect every MIB object every 60 seconds.  The poller rate
limits itself by dividing each minute into 100ms time slots and only sending
a specific number of ping/snmp packets in each slot.  The problem is, it
blasts the request packets out really fast at the start of each time slot,
then sits in a receive loop until the next slot comes around.  The requests
are not paced over the 100ms, so it will blast out a lot of packets in a few
milliseconds.
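The slotting scheme described above can be sketched roughly like this (the names and the 600-slot layout are illustrative, not the actual poller code):

```c
/* A minute divided into 100 ms slots; the per-minute request budget is
 * spread evenly across the slots, with any remainder going to the
 * earliest slots.  Illustrative sketch only. */
#define SLOT_MS       100
#define SLOTS_PER_MIN (60000 / SLOT_MS)        /* 600 slots */

/* Which slot a given millisecond offset into the minute falls in. */
int slot_of_ms(long ms_into_minute)
{
    return (int)((ms_into_minute / SLOT_MS) % SLOTS_PER_MIN);
}

/* How many requests may be sent in a given slot. */
int slot_quota(int per_minute_budget, int slot)
{
    int base  = per_minute_budget / SLOTS_PER_MIN;
    int extra = per_minute_budget % SLOTS_PER_MIN;
    return base + (slot < extra ? 1 : 0);
}
```

As the thread notes, requests within a slot are not paced any further, so each slot boundary still produces a millisecond-scale burst on the wire.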

We used to use a 1 second rate limiting time slot and didn't interlace
ping/snmp requests, but we found certain interface types on Cisco 6509
switches couldn't keep up with back-to-back pings and would lose them.


> Anyhow, make sure to test this patch:
> 876  2017-Sep-07 02:19  hn_inc_txbr.diff

Yep.  Might take a bit of time to test though because we'll need to get the
customer to spin up a test VM on the same platform, and they are fairly
remote (Perth, Australia).  We don't run any Microsoft servers/HyperV setups
in our lab.

        Paul.
--
Paul Koch | Founder | CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia

Re: 11.1 running on HyperV hn interface hangs

Sepherosa Ziehau-2
If you have any updates on this, please let me know.  There is still
time for 10.4.

On Thu, Sep 7, 2017 at 11:04 PM, Paul Koch <[hidden email]> wrote:

> On Thu, 7 Sep 2017 13:51:11 +0800
> Sepherosa Ziehau <[hidden email]> wrote:
>
>> Weird; your traffic pattern is not anything heavy.  Sending is mainly
>> UDP, which should never be able to saturate the TX buffer ring and
>> trigger the RXBUF ACK sending failure.
>
> It's a bit tricky.  The poller is very fast.  We ping every device every 15
> seconds and collect every MIB object every 60 seconds.  The poller rate
> limits itself by dividing each minute into 100ms time slots and only sending
> a specific number of ping/snmp packets in each slot.  The problem is, it
> blasts the request packets out really fast at the start of each time slot,
> then sits in a receive loop until the next slot comes around.  The requests
> are not paced over the 100ms, so it will blast out a lot of packets in a few
> milliseconds.
>
> We used to use a 1 second rate limiting time slot and didn't interlace
> ping/snmp requests, but we found certain interface types on Cisco 6509
> switches couldn't keep up with back-to-back pings and would lose them.
>
>
>> Anyhow, make sure to test this patch:
>> 876  2017-Sep-07 02:19  hn_inc_txbr.diff
>
> Yep.  Might take a bit of time to test though because we'll need to get the
> customer to spin up a test VM on the same platform, and they are fairly
> remote (Perth, Australia).  We don't run any Microsoft servers/HyperV setups
> in our lab.
>
>         Paul.
> --
> Paul Koch | Founder | CEO
> AKIPS Network Monitor | akips.com
> Brisbane, Australia



--
Tomorrow Will Never Die

Re: 11.1 running on HyperV hn interface hangs

Paul Koch-3
On Thu, 14 Sep 2017 09:54:56 +0800
Sepherosa Ziehau <[hidden email]> wrote:

> If you have any updates on this, please let me know.  There is still
> time for 10.4.

Still working on it.  We are trying to replicate the customer's FreeBSD 11.1
on Hyper-V setup in our test lab.  We have ping/snmp/netflow network
simulators that can generate large amounts of real network traffic, to see
if that reliably triggers the problem.

        Paul.
--
Paul Koch | Founder | CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia