We are testing the modbus service with two TCP servers, and it seems that after a few hours of running the Ethernet interface freezes and the modbus service can no longer connect to it. The only solution is to reboot the unit, which is not acceptable at all.
Sep 21 22:27:26 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:28 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:31 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:34 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:37 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:39 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:42 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:45 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:48 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:50 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Sep 21 22:27:53 fx30 user.err Legato: =ERR= | modbus[1607]/components T=main | mbrModbus.c mbr_ConnStateAction_CONNECTING() 293 | Failed to connect to eth0 (Operation now in progress)
Well, we left the unit running overnight and after a few hours it happened. Based on the kernel logs, it seems to be due to a bug in the USB Ethernet chip driver, and even though the interface recovers, the service does not know this and does not re-initialize:
[ 3744.010938] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -71
[ 3744.010962] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[ 3744.010980] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[ 3744.010998] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR
[ 3744.708127] smsc95xx 1-1.1:1.0 eth0: unregister 'smsc95xx' usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet
[ 3744.708334] smsc95xx 1-1.1:1.0 eth0: usbnet_stop
[ 3744.741494] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[ 3744.760418] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to unregister eth0
[ 3745.200348] smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-7c00000.hsic_host-1.1, smsc95xx USB 2.0 Ethernet, 22:2c:a6:be:49:ac
[ 3745.364265] smsc95xx 1-1.1:1.0 eth0: usbnet_open
[ 3745.364486] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 3746.878999] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 3746.901524] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
So, after the interface recovered [timestamp: 3745.200348], I manually gave eth0 the same IP that I had set in the modbus service, and the service continued to work:
ifconfig eth0 192.168.10.10
So your service needs to monitor the eth0 status and, if it goes down, wait for it to come back up and re-assign the previous IP to it.
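As a rough illustration of that suggestion, the sketch below polls the interface's `operstate` in sysfs and re-applies the address when the link comes back. It is a standalone Python script, not part of the modbus service; the function names, the 0.5 second poll interval, and the use of `ifconfig` to set the address are assumptions for illustration only. Some drivers report `unknown` instead of `up`, so the comparison may need adjusting on a given kernel.

```python
import subprocess
import time
from pathlib import Path


def read_operstate(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's operstate for iface, or "absent" if it is not registered."""
    try:
        return (Path(sysfs_root) / iface / "operstate").read_text().strip()
    except OSError:
        # The interface directory disappears while the driver is unregistered.
        return "absent"


def reassign_ip(iface, ip, run=subprocess.run):
    """Re-apply the static address, as was done by hand with ifconfig in this thread."""
    run(["ifconfig", iface, ip], check=True)


def watch(iface, ip, poll_s=0.5, max_polls=None,
          state_fn=read_operstate, assign_fn=reassign_ip, sleep_fn=time.sleep):
    """Poll iface and re-assign ip each time it returns to "up" after being away.

    Runs forever when max_polls is None; otherwise returns the number of
    re-assignments performed during max_polls polls.
    """
    was_up = True  # assumption: the interface was configured before we started
    reassigned = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        is_up = state_fn(iface) == "up"
        if is_up and not was_up:
            assign_fn(iface, ip)
            reassigned += 1
        was_up = is_up
        polls += 1
        sleep_fn(poll_s)
    return reassigned
```

Calling `watch("eth0", "192.168.10.10")` as root would run until interrupted. Polling can miss a very short outage; a service that needs to catch every event would more likely subscribe to kernel link notifications instead.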
I tried to reproduce with USB both connected and disconnected, and could not reproduce the issue: Ethernet is stable and modbus messages are still received after 3 hours.
Could you give us any information that could help?
power supply
physical connections
steps (like plugging in before power-up, running some ssh commands, then unplugging...)
We tested with both a lab bench PSU and an industrial DIN-rail-mountable switching power supply, so the issue is not related to the power supply.
We have two modbus servers with 8 register banks each, which we poll every 30 seconds. These two are connected to the FX30 via an unmanaged switch (we also tested with only one server connected directly, without a switch in between, with the same results).
client settings: ip: 192.168.10.10, netmask: /23
servers settings: ip: 192.168.10.11 and 192.168.10.12, server ID: 1
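As a quick sanity check of these settings, the snippet below uses Python's standard `ipaddress` module to confirm that both server addresses fall inside the client's /23 network, so no gateway should be needed between them. The helper name `on_link` is only illustrative.

```python
import ipaddress

# Addresses taken from the settings quoted above.
client = ipaddress.ip_interface("192.168.10.10/23")
servers = [
    ipaddress.ip_address("192.168.10.11"),
    ipaddress.ip_address("192.168.10.12"),
]


def on_link(client_if, addr):
    """True if addr lies in the same subnet as the client interface."""
    return addr in client_if.network


print(client.network)   # 192.168.10.0/23
print(client.netmask)   # 255.255.254.0
for server in servers:
    print(server, on_link(client, server))   # True for both servers
```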
There is no USB cable attached during our tests.
BTW, this issue happens quite randomly: once after 20 minutes and another time after 4-5 hours.
It would be interesting to have an identical setup, so what do you use: a slave simulator?
The Ethernet controller smsc95xx is a USB device, and I wonder if the issue could be related to some USB low-power mode switching, as I could see in other traces you sent by email.
Not sure it is possible through a shell command... but if you have the opportunity to run a long test (overnight?) with the USB cable plugged in, it will force the USB controller to stay in high-power mode. It could help to isolate the issue.
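On Linux, runtime power management for a USB device is usually exposed through sysfs, so one thing that might be tried is setting the device's `power/control` attribute to `on`, which asks the kernel not to autosuspend it. The sketch below assumes the Ethernet chip is USB device `1-1.1` (taken from the `smsc95xx 1-1.1:1.0` prefix in the kernel log above) and that the FX30 kernel exposes this attribute. Whether this is the same low-power mechanism as the one seen in the traces is not confirmed, and the function names are hypothetical.

```python
from pathlib import Path


def usb_power_control_path(device, sysfs_root="/sys/bus/usb/devices"):
    """Path of the runtime power-management control file for a USB device."""
    return Path(sysfs_root) / device / "power" / "control"


def get_usb_power_control(device, sysfs_root="/sys/bus/usb/devices"):
    """Return the current setting, normally "auto" or "on"."""
    return usb_power_control_path(device, sysfs_root).read_text().strip()


def disable_usb_autosuspend(device, sysfs_root="/sys/bus/usb/devices"):
    """Write "on" so the kernel keeps the device powered; needs root on a real system."""
    usb_power_control_path(device, sysfs_root).write_text("on\n")
    return get_usb_power_control(device, sysfs_root)
```

For example, `disable_usb_autosuspend("1-1.1")` run as root before starting a long test would keep that device out of runtime suspend until the setting is changed back to `auto` or the unit reboots.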
After a new test of more than 16 hours without being able to reproduce the issue, we internally suspect a hardware issue (bad connections on the PHY? a thermal issue?). Please raise an RMA with your supplier and ask for a swap.