Modbus TCP bug with Fx30 LPWA

alireza.ramezani · September 26, 2022, 9:23am

Hello,

We are testing modbus service with two TCP servers and it seems after few hours of running time, the ethernet interface freezes and modbus service cannot connect to it anymore. The only solution for it is to reboot the unit which is not acceptable at all.

Sep 21 22:27:26 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:28 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:31 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:34 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:37 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:39 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:42 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:45 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:48 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:50 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:53 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)

We are running firmware 3.5.0 for our tests.

gbetous · September 26, 2022, 12:59pm

Hi,

I’m looking into it and will keep you posted.

alireza.ramezani · September 26, 2022, 1:55pm

more info:

Well we left unit over night and after few hours it happened based on kernel logs seems it is due to a bug on USB ethernet chip driver and even though it gets recovered service does not know this and does not re-initilizes:

[ 3744.010938] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -71
[ 3744.010962] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[ 3744.010980] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[ 3744.010998] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR
[ 3744.708127] smsc95xx 1-1.1:1.0 eth0: unregister ‘smsc95xx’ usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet
[ 3744.708334] smsc95xx 1-1.1:1.0 eth0: usbnet_stop
[ 3744.741494] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[ 3744.760418] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0

[ 3745.200348] smsc95xx 1-1.1:1.0 eth0: register ‘smsc95xx’ at usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet, 22:2c:a6:be:49:ac
[ 3745.364265] smsc95xx 1-1.1:1.0 eth0: usbnet_open
[ 3745.364486] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 3746.878999] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 3746.901524] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1

So I manually gave eth0 the same ip which I set in modbus service after recovering [time stamp: 3745.200348] and service continued to work:

Ifconfig eth0 192.168.10.10

So your service needs to monitor the eth0 status and if down, wait for it to gets back up and re-assign the previous ip to it.

gbetous · September 27, 2022, 12:38pm

Hi,

I tried to reproduce with both USB connected and disconnected, and could not reproduce the issue : ethernet is stable and modbus messages are still received after 3hours.

Could you give us any information which could help ?

power supply
physical connections
steps (like plugging before power up, make some ssh commands, then unplug…)
anything you can think about

Thank you,

alireza.ramezani · September 27, 2022, 1:11pm

Hello,

We tested with both lab bench PSU and industrial DIN rail mountable switching power supply so the issue is not related.

We have two modbus servers having 8 register banks each which we poll every 30 seconds. these two are connected to FX30 via an unmanaged switch (also we tested it with only one server directly without switch inbetween → same results).

client settings: ip: 192.168.10.10, netmask: /23
servers settings: ip: 192.168.10.11 and 192.168.10.12, server ID: 1

There is no USB cable attached during our tests.

BTW this issue happens quite randomly once after 20 min and another time after 4-5 hours.

It’s interesting to have an identical setup, so what do you use a slave simulator?

gbetous · September 27, 2022, 1:16pm

Yes basically I use a software slave which responds to any register request.

diagslave 3.4 - FieldTalk(tm) Modbus(R) Diagnostic Slave Simulator

alireza.ramezani · September 27, 2022, 1:25pm

As far as I can tell it has something to do with smsc95xx driver which suddently start to fail and it happened more than dozen times during our tests:

[ 129.531393] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -71
[ 129.531437] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[ 129.531454] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[ 129.531473] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR
[ 130.240964] usb 1-1.1: USB disconnect, device number 4
[ 130.244101] smsc95xx 1-1.1:1.0 eth0: unregister ‘smsc95xx’ usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet
[ 130.244308] smsc95xx 1-1.1:1.0 eth0: usbnet_stop
[ 130.271450] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[ 130.290435] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[ 130.426127] USB QCMAP NL IOCTL Snd GETNEIGH Succ
[ 130.431027] QTI:LINK_DOWN message posted
[ 130.530236] usb 1-1.1: new high-speed USB device number 5 using msm_hsic_host
[ 130.641610] usb 1-1.1: New USB device found, idVendor=0424, idProduct=9e00
[ 130.641638] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 130.648366] smsc95xx v1.0.6
[ 130.740319] smsc95xx 1-1.1:1.0 eth0: register ‘smsc95xx’ at usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet, 06:cb:88:5a:ea:e8
[ 130.883970] smsc95xx 1-1.1:1.0 eth0: usbnet_open
[ 130.884202] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 130.902895] USB QCMAP NL IOCTL Snd GETNEIGH Succ
[ 132.366330] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 132.380338] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x4DE1
[ 132.457301] QTI:ETH mode
[ 132.464548] QTI:LINK_UP message posted
[ 138.897587] QTI:LINK_UP Processed
[ 138.907197] QTI:LINK_DOWN Processed
[ 138.914584] QTI:LINK_UP Processed
[10464.810023] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -71
[10464.810047] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[10464.810065] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[10464.810948] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR
[10465.217648] usb 1-1.1: USB disconnect, device number 5
[10465.220965] smsc95xx 1-1.1:1.0 eth0: unregister ‘smsc95xx’ usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet
[10465.221166] smsc95xx 1-1.1:1.0 eth0: usbnet_stop
[10465.247580] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[10465.260409] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[10465.392240] USB QCMAP NL IOCTL Snd GETNEIGH Succ
[10465.396885] QTI:LINK_DOWN message posted
[10465.410844] QTI:LINK_DOWN Processed
[10465.500429] usb 1-1.1: new high-speed USB device number 6 using msm_hsic_host
[10465.611496] usb 1-1.1: New USB device found, idVendor=0424, idProduct=9e00
[10465.611525] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[10465.618254] smsc95xx v1.0.6
[10465.687349] smsc95xx 1-1.1:1.0 eth0: register ‘smsc95xx’ at usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet, 86:5d:a1:67:61:91
[10465.864121] smsc95xx 1-1.1:1.0 eth0: usbnet_open
[10465.864347] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[10465.899169] USB QCMAP NL IOCTL Snd GETNEIGH Succ
[10467.344280] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[10467.352645] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x4DE1
[10467.438651] QTI:ETH mode
[10467.444254] QTI:LINK_UP Processed
[10467.458560] QTI:LINK_UP message posted

gbetous · September 27, 2022, 1:29pm

The ethernet controller smsc95xx is a USB device, and I wonder if it could be related to some USB low power mode switching as I could see on other traces you sent by email.

alireza.ramezani · September 27, 2022, 1:31pm

How could I disable permanantly the USB low power switching on device to test?

gbetous · September 27, 2022, 1:42pm

Not sure it is possible through a shell command… but if you have the opportunity to run a long test (overnight ?) with USB cable plugged, it will force to keep high-power mode on USB controller. It could help to isolate the issue.

alireza.ramezani · September 27, 2022, 1:44pm

Actually we did this morning such a test by keeping a ssh connection session with a PC alive for few hours but again the issue appeared (logs above).

BTW we have forced device to operate on LTE mode only with: cm radio rat LTE

gbetous · September 27, 2022, 1:48pm

Ok that’s why I don’t see “low power mode” in the previous traces. And moreover I see the issue after only 2 minutes !

Even if I don’t see why rat LTE would have any impact, I’ll try with this setting.

alireza.ramezani · September 27, 2022, 1:51pm

dmesg_22020927_1421.txt (98.8 KB)

attached full log as well.

gbetous · September 28, 2022, 9:29am

After a new test of more than 16h without being able to reproduce, we internally suspect an hardware issue (bad connections of the PHY ? thermal issue ?). Please raise RMA with your supplier asking for a swap.

Topic		Replies	Views
Modbus RTU with Fx30s LPWA Octave modbus	6	707	September 26, 2022
Modbus error/exception codes FX30(s) Octave FX30	4	595	June 20, 2023
MODBUS TCP to RTU Bridge on FX30S FX30	3	1009	May 24, 2021
TCP modbus configuration Octave	9	1092	October 23, 2020
Communication between two fx30 modules via TCP/IP or Modbus IoT Modules	2	703	February 3, 2022

Modbus TCP bug with Fx30 LPWA

Related topics