MC7304 suddenly stops working

Hello,

I am using the MC7304 (with the latest firmware) in a Linux box (PcEngine APU) running kernel 3.18.9. I use the qmi_wwan driver in the kernel and my own user-space tool for configuring and connecting the modem.

Initially everything works fine, but after some time (several hours) the modem becomes unresponsive. I suddenly get no reply to any QMI messages and the serial port is dead. Doing a soft reboot (i.e., running “reboot”) makes the QMI device somewhat responsive again, while the serial devices remain dead. When I run my tool to configure and connect the modem, I notice two things:

  • CTL get version info returns error 0x2. According to the manual, this is QMI_ERROR_NO_MEMORY, which is described as “Device could not allocate memory to formulate a response”.
  • If I ignore this error and continue configuration, the modem never replies to NAS get system info. I am also never attached to the network (the radio is never registered).
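
To make the first point concrete, here is a minimal Python sketch of how the error code in a QMI response can be decoded. It assumes the usual result TLV layout (a 4-byte value holding a little-endian 16-bit result followed by a 16-bit error code, with a result of 0 meaning success); that layout is an assumption, not something shown in this thread. `decode_result_tlv` and `QMI_ERRORS` are illustrative names, and only the error code quoted above is listed.

```python
import struct

# Only the code quoted in this thread is listed; any other code is shown as hex.
QMI_ERRORS = {0x0002: "QMI_ERROR_NO_MEMORY"}


def decode_result_tlv(value: bytes):
    """Decode a result TLV value (assumed layout: uint16 result, uint16 error)."""
    if len(value) != 4:
        raise ValueError("result TLV value must be exactly 4 bytes")
    result, error = struct.unpack("<HH", value)
    name = QMI_ERRORS.get(error, f"0x{error:04x}")
    # Assumption: a result field of 0 means success, anything else is a failure.
    return result == 0, name


# Hypothetical value bytes for a failed request carrying error 0x2.
ok, err = decode_result_tlv(bytes([0x01, 0x00, 0x02, 0x00]))
print(ok, err)
```

Logging the decoded name next to the request that triggered it makes it easier to see which request first starts failing.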

Has anyone experienced anything similar, or does anyone have tips on how to proceed? Is it possible to create QMI messages that make the device leak memory? Is there a command to free all memory?

Searching for the error message yields few results, and cutting the power completely is not really an option, as this box will be placed in a remote location and the only power “control” we have is the reboot command.

Thanks in advance for any help!

Hi,

Can you try with the latest drivers for MC7304?
What exactly are you doing? I mean, is it some specific application you have made? Is the application doing something that might be causing this? If you can share it, we can try it at our end.
You are never registered to the network? Did you check the signal strength at that time?

Also, I guess if the issue is not 100% reproducible, you can contact your distributor/FAE, who will help you open a ticket with Sierra Wireless. But does the issue appear on older FW?

Thanks
Rex

Hi,

Yes, I have the latest version of the Linux driver. My application is not doing anything special and works fine with several other QMI modems, but they are all USB sticks and we would like to use an internal modem. The application initially does the same as your Linux driver: it sends a SYNC to the device, then requests version info and a CID for each service we are going to use. I then configure the different services (set up notifications, disable autoconnect, …) and wait until I am registered in the network. When this happens, a START_NETWORK interface message is sent to connect. If we lose the connection, another START_NETWORK message is sent. So I don’t think I am doing anything special.

While this happens, the modem still replies to some messages, so the NO_MEMORY error does not really make sense. The message that never gets a reply is, as I mentioned, NAS request system info (0x4d). However, I still get signal indications from the modem:

(15:51:05.062) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 88 Complete message
(15:51:05.062) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 95 1 20 0 80 3 1 4 0 0 51 0 14 0 11 8 0 83 5 0 8 96 ff ff ff 14 6 0 ad f7 94 ff 50 0 
(15:51:05.062) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 97 

QMUX:
(15:51:05.062) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 98 	length: 20
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 99 	flags: 80
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 100 	service: 3
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 101 	client id: 1
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 106 

QMI (service):
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 107 	flags: 4
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 108 	transaction id: 0
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 109 	message type: 51
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 110 	length: 14
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 130 

TLV:
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 131 	type: 0x11
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 132 	len: 8
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 141 	value: 83 5 0 8 96 ff ff ff
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 130 

TLV:
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 131 	type: 0x14
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 132 	len: 6
(15:51:05.063) [INFO]: qmi_lib/QmiUtility.cpp parseQmi 141 	value: ad f7 94 ff 50 0
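
For anyone reading the dump above, the raw bytes on the second log line can be split by hand into the same QMUX header, service header and TLVs. A minimal Python sketch follows, assuming little-endian fields, a one-byte 0x01 frame marker, and the 16-bit transaction ID used by non-CTL services. `parse_qmux` is an illustrative name and not a function from qmi_lib.

```python
import struct


def parse_qmux(frame: bytes):
    """Split one QMUX frame carrying a service (non-CTL) message into fields."""
    marker, qmux_len, qmux_flags, service, client = struct.unpack_from("<BHBBB", frame, 0)
    if marker != 0x01:
        raise ValueError("missing 0x01 frame marker")
    if qmux_len != len(frame) - 1:
        raise ValueError("QMUX length does not match the frame size")
    # Service header: flags (1 byte), transaction id, message id, TLV length (2 bytes each).
    svc_flags, txid, msg_id, msg_len = struct.unpack_from("<BHHH", frame, 6)
    pos, end = 13, 13 + msg_len
    if end != len(frame):
        raise ValueError("message length does not match the frame size")
    tlvs = []
    while pos < end:
        tlv_type, tlv_len = struct.unpack_from("<BH", frame, pos)
        pos += 3
        if pos + tlv_len > end:
            raise ValueError("truncated TLV")
        tlvs.append((tlv_type, frame[pos:pos + tlv_len]))
        pos += tlv_len
    return {"qmux_flags": qmux_flags, "service": service, "client": client,
            "svc_flags": svc_flags, "txid": txid, "msg_id": msg_id, "tlvs": tlvs}


# The bytes exactly as printed in the log above (hex, without leading zeros).
LOG_LINE = ("1 20 0 80 3 1 4 0 0 51 0 14 0 11 8 0 83 5 0 8 96 ff ff ff "
            "14 6 0 ad f7 94 ff 50 0")
msg = parse_qmux(bytes(int(tok, 16) for tok in LOG_LINE.split()))
print(hex(msg["msg_id"]), [(hex(t), v.hex()) for t, v in msg["tlvs"]])
```

Note that the lengths in the log are printed in hex: the QMUX length 0x20 is 32 bytes (the frame minus its marker) and the message length 0x14 is 20 bytes, which matches the two TLVs (3 + 8 and 3 + 6 bytes).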

I see that my comment about not being registered in the network was incorrect. We do in fact have LTE. However, the modem is unable to connect (if I choose to ignore the system info request).

We have just started doing some proper tests, but I have seen this error happen three days in a row now. I will update if I find an easy, reproducible way to trigger this bug. The coverage where I am testing is good, so that should not have an effect. Also, as I said, I have to reboot the device to even get this far. If I don’t reboot, I do not get any replies when this error happens.

After going through the documentation one more time, there is one thing I am wondering about. We also use the PDS service with the “default” configuration, i.e., we configure the event to provide the information we want and then call auto-start. When looking through the PDS documentation, I see mention of storing assistance data in persistent memory. Could this be a cause?

Thanks for the help so far.

-Kristian

I did some more testing and figured out how to recover the modem. I have not figured out what triggers the error, but I can reliably trigger it by just letting the modem be connected for some time.

After reboot, if I use the set operating mode command to restart modem, it restarts and everything works fine afterwards. Does anyone know of any way to fake a restart? Restarting the node multiple times per day is not really ideal.

Another update. Manually deauthorizing and authorizing the modem makes it reply to QMI messages (same as after the reboot), and I can then restart it using the set operating mode command. I.e., I can now recover the modem without rebooting the machine.
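
The deauthorize/authorize step can be scripted. Below is a minimal Python sketch, assuming the modem is toggled through the `authorized` attribute that the Linux USB core exposes in sysfs for each device. The function name, the delay and the example device path are illustrative and not taken from my tool.

```python
import time
from pathlib import Path


def reauthorize(device_dir, delay=2.0):
    """Deauthorize and then re-authorize a USB device via its sysfs attribute."""
    attr = Path(device_dir) / "authorized"
    attr.write_text("0")   # the kernel unbinds the device's drivers
    time.sleep(delay)      # illustrative pause before bringing the device back
    attr.write_text("1")   # the device is configured and its drivers bind again
```

It would be called as `reauthorize("/sys/bus/usb/devices/1-1")`, where `1-1` is a hypothetical bus/port name that has to be looked up on the actual box, and it needs root privileges. Once the QMI device answers again, the set operating mode command can be sent as described above.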

It is still quite annoying that the modem fails so frequently (multiple times per day), since it is otherwise the best one we have tested so far (in terms of metadata, overall stability, …). Does anyone have any idea what the underlying error might be and whether it is possible to fix it? I had a theory about it being some kind of power-save, but for the last couple of tests I continuously sent ICMP packets over the interface.

Does the modem freeze? Do you see all the ports enumerated at that time?

Sorry for my late reply.

All devices are enumerated. I have been testing a bit more and found a bug in our application that caused the requests for packet service to be sent with a transaction ID of 0. After fixing this, I see that what I initially thought was the modem freezing was just this bug. Now I get a reply, and the modem reports that it is attached to the network and has packet service.
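
For reference, one way to avoid this class of bug is to hand out transaction IDs from a small allocator that can never return 0. The Python sketch below assumes 16-bit IDs for regular services (8-bit for CTL) and treats 0 as a value to avoid, since the indication in the log earlier in this thread carries transaction ID 0. Whether 0 is formally reserved is an assumption. `TransactionIds` is an illustrative name.

```python
class TransactionIds:
    """Hand out transaction IDs in 1..max, wrapping around without ever using 0."""

    def __init__(self, bits=16):
        self._max = (1 << bits) - 1
        self._last = 0

    def next(self):
        # Maps max back to 1, so the sequence is 1, 2, ..., max, 1, 2, ...
        self._last = self._last % self._max + 1
        return self._last


ids = TransactionIds(bits=16)
first, second = ids.next(), ids.next()
print(first, second)
```

Keeping one allocator per client ID keeps requests on different services independent of each other.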

However, I still lose the IP address frequently. I am currently looking into the L2 state and how the DHCP client behaves on my device. I will post an update once I know more.