Modbus TCP bug with Fx30 LPWA

Hello,

We are testing modbus service with two TCP servers and it seems after few hours of running time, the ethernet interface freezes and modbus service cannot connect to it anymore. The only solution for it is to reboot the unit which is not acceptable at all.

Sep 21 22:27:26 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:28 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:31 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:34 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:37 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:39 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:42 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:45 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:48 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:50 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:53 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)

We are running firmware 3.5.0 for our tests.

Hi,

I’m looking into it and will keep you posted.

more info:

Well we left unit over night and after few hours it happened based on kernel logs seems it is due to a bug on USB ethernet chip driver and even though it gets recovered service does not know this and does not re-initilizes:

[ 3744.010938] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -71
[ 3744.010962] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[ 3744.010980] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[ 3744.010998] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR
[ 3744.708127] smsc95xx 1-1.1:1.0 eth0: unregister ‘smsc95xx’ usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet
[ 3744.708334] smsc95xx 1-1.1:1.0 eth0: usbnet_stop
[ 3744.741494] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[ 3744.760418] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0

[ 3745.200348] smsc95xx 1-1.1:1.0 eth0: register ‘smsc95xx’ at usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet, 22:2c:a6:be:49:ac
[ 3745.364265] smsc95xx 1-1.1:1.0 eth0: usbnet_open
[ 3745.364486] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 3746.878999] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 3746.901524] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1

So I manually gave eth0 the same ip which I set in modbus service after recovering [time stamp: 3745.200348] and service continued to work:

Ifconfig eth0 192.168.10.10

So your service needs to monitor the eth0 status and if down, wait for it to gets back up and re-assign the previous ip to it.

Hi,

I tried to reproduce with both USB connected and disconnected, and could not reproduce the issue : ethernet is stable and modbus messages are still received after 3hours.

Could you give us any information which could help ?

  • power supply
  • physical connections
  • steps (like plugging before power up, make some ssh commands, then unplug…)
  • anything you can think about :slight_smile:

Thank you,

Hello,

We tested with both lab bench PSU and industrial DIN rail mountable switching power supply so the issue is not related.

We have two modbus servers having 8 register banks each which we poll every 30 seconds. these two are connected to FX30 via an unmanaged switch (also we tested it with only one server directly without switch inbetween → same results).

client settings: ip: 192.168.10.10, netmask: /23
servers settings: ip: 192.168.10.11 and 192.168.10.12, server ID: 1

There is no USB cable attached during our tests.

BTW this issue happens quite randomly once after 20 min and another time after 4-5 hours.

It’s interesting to have an identical setup, so what do you use a slave simulator?

Yes basically I use a software slave which responds to any register request.

diagslave 3.4 - FieldTalk(tm) Modbus(R) Diagnostic Slave Simulator

As far as I can tell it has something to do with smsc95xx driver which suddently start to fail and it happened more than dozen times during our tests:

[ 129.531393] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -71
[ 129.531437] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[ 129.531454] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[ 129.531473] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR
[ 130.240964] usb 1-1.1: USB disconnect, device number 4
[ 130.244101] smsc95xx 1-1.1:1.0 eth0: unregister ‘smsc95xx’ usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet
[ 130.244308] smsc95xx 1-1.1:1.0 eth0: usbnet_stop
[ 130.271450] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[ 130.290435] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[ 130.426127] USB QCMAP NL IOCTL Snd GETNEIGH Succ
[ 130.431027] QTI:LINK_DOWN message posted
[ 130.530236] usb 1-1.1: new high-speed USB device number 5 using msm_hsic_host
[ 130.641610] usb 1-1.1: New USB device found, idVendor=0424, idProduct=9e00
[ 130.641638] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 130.648366] smsc95xx v1.0.6
[ 130.740319] smsc95xx 1-1.1:1.0 eth0: register ‘smsc95xx’ at usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet, 06:cb:88:5a:ea:e8
[ 130.883970] smsc95xx 1-1.1:1.0 eth0: usbnet_open
[ 130.884202] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 130.902895] USB QCMAP NL IOCTL Snd GETNEIGH Succ
[ 132.366330] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 132.380338] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x4DE1
[ 132.457301] QTI:ETH mode
[ 132.464548] QTI:LINK_UP message posted
[ 138.897587] QTI:LINK_UP Processed
[ 138.907197] QTI:LINK_DOWN Processed
[ 138.914584] QTI:LINK_UP Processed
[10464.810023] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -71
[10464.810047] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[10464.810065] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[10464.810948] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR
[10465.217648] usb 1-1.1: USB disconnect, device number 5
[10465.220965] smsc95xx 1-1.1:1.0 eth0: unregister ‘smsc95xx’ usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet
[10465.221166] smsc95xx 1-1.1:1.0 eth0: usbnet_stop
[10465.247580] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[10465.260409] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[10465.392240] USB QCMAP NL IOCTL Snd GETNEIGH Succ
[10465.396885] QTI:LINK_DOWN message posted
[10465.410844] QTI:LINK_DOWN Processed
[10465.500429] usb 1-1.1: new high-speed USB device number 6 using msm_hsic_host
[10465.611496] usb 1-1.1: New USB device found, idVendor=0424, idProduct=9e00
[10465.611525] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[10465.618254] smsc95xx v1.0.6
[10465.687349] smsc95xx 1-1.1:1.0 eth0: register ‘smsc95xx’ at usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet, 86:5d:a1:67:61:91
[10465.864121] smsc95xx 1-1.1:1.0 eth0: usbnet_open
[10465.864347] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[10465.899169] USB QCMAP NL IOCTL Snd GETNEIGH Succ
[10467.344280] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[10467.352645] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x4DE1
[10467.438651] QTI:ETH mode
[10467.444254] QTI:LINK_UP Processed
[10467.458560] QTI:LINK_UP message posted

The ethernet controller smsc95xx is a USB device, and I wonder if it could be related to some USB low power mode switching as I could see on other traces you sent by email.

How could I disable permanantly the USB low power switching on device to test?

Not sure it is possible through a shell command… but if you have the opportunity to run a long test (overnight ?) with USB cable plugged, it will force to keep high-power mode on USB controller. It could help to isolate the issue.

Actually we did this morning such a test by keeping a ssh connection session with a PC alive for few hours but again the issue appeared (logs above).

BTW we have forced device to operate on LTE mode only with: cm radio rat LTE

Ok that’s why I don’t see “low power mode” in the previous traces. And moreover I see the issue after only 2 minutes !

Even if I don’t see why rat LTE would have any impact, I’ll try with this setting.

dmesg_22020927_1421.txt (98.8 KB)

attached full log as well.

1 Like

After a new test of more than 16h without being able to reproduce, we internally suspect an hardware issue (bad connections of the PHY ? thermal issue ?). Please raise RMA with your supplier asking for a swap.