MC7455: Reducing LTE power consumption

We have about a dozen instances of an M2M application deployed in testing for some weeks now. All have the MC7455 for the modem.

Most instances work reliably, and in general we are almost satisfied. A single instance, though, has severe problems. The whole USB bus collapses, affecting all devices on it, and this happens multiple times almost every day. It often happens in rapid succession, so that the device cannot function properly for hours, after which it works again for some period, a day if we’re lucky. We could not observe any pattern or regularity in when this occurs; it seems random.

Our best assumption is that the problem is related to power spikes drawn by the modem. This is the only instance where the modem is connected over LTE; all other deployments (which work without any problems) have a UMTS connection, and the PTS tells us that consumption really can be higher when using LTE. All other aspects of the systems are completely identical, except for the network providers in use. In the problematic instance, the provider is AT&T in the Santa Monica region, and the modem is running AT&T-certified firmware (02.24.05.06_00_ATT_002.027_000).

Does our assumption about a power delivery problem sound plausible? Is there a chance the provider network also plays a part in this (even though signal quality and strength in general seem to be good)?

And the real question:
What is the recommended way to reduce power consumption when using LTE? We’ve found !SARBACKOFF in the AT command reference, but there is no explanation of the backoff concept. What is the backoff state, how is it set at runtime, and what determines its value? Is it something we need to adjust for all 8 levels, and what values (magnitude) should we choose? The example values in the AT reference are all positive, but as they should be negative, am I right that this is actually a negative offset?

Thank you for your help,
Karoly

Karoly,

It is possible it is power related, but the description/symptom is quite general, so it is difficult to say.

Does the application have a lot of data to send, i.e. is data queued in the system so that the LTE-connected device ends up sending faster? You cannot use the SAR backoff for this sort of thing; in fact, if the unit is on LTE it has to control the power independently, as it has certain power obligations to meet. The way to determine whether LTE is a factor or not is to turn it off with at!selrat and just have the unit on 3G.

From the description it is unlikely the network has anything to do with it. One comment is quite telling, though: you have good signal strength. The implication is that the transmit path is fairly clear as well, meaning the unit should not be transmitting at full power, which, in addition to high data rates, is where the huge power consumption comes in.

It would be good for you to send a few commands to the unit to get some context/background information.

ati
at!gstatus?
at+cgdcont?
at+cimi
at!priid 

Regards

Matt

Hi Matt,

Thank you for your help. I logged in to the device after one of its reboots (to recover from the above fault), and here is the output of the above commands:

>ati
Manufacturer: Sierra Wireless, Incorporated
Model: MC7455
Revision: SWI9X30C_02.24.05.06 r7040 CARMD-EV-FRMWR2 2017/05/19 06:23:09
MEID: 35907206141801
IMEI: 359072061418014
IMEI SV: 12
FSN: LQ738185880410
+GCAP: +CGSM


OK

>at!gstatus?
!GSTATUS:
Current Time:  658              Temperature: 33
Reset Counter: 1                Mode:        ONLINE
System mode:   LTE              PS state:    Attached
LTE band:      B30              LTE bw:      10 MHz
LTE Rx chan:   9820             LTE Tx chan: 27710
LTE CA state:  INACTIVE                 LTE Scell band:B5
LTE Scell bw:5 MHz              LTE Scell chan:2430
EMM state:     Registered       Normal Service
RRC state:     RRC Connected
IMS reg state: No Srv

PCC RxM RSSI:  -86              RSRP (dBm):  -119
PCC RxD RSSI:  -86              RSRP (dBm):  -119
SCC RxM RSSI:  -81              RSRP (dBm):  -103
SCC RxD RSSI:  -85              RSRP (dBm):  -111
Tx Power:      0                TAC:         874B (34635)
RSRQ (dB):     -17.6            Cell ID:     086D4197 (141377943)
SINR (dB):     -1.8


OK

>at+cgdcont?
+CGDCONT: 1,"IPV4V6","i2Gold","0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0",0,0,0,0

OK

>at+cimi
310410903483373

OK

>at!priid
ERROR

Regarding your question about the amount of data.

No, we don’t have a lot of data. We transmit solely sensor values and diagnostic data, sometimes on demand (when something happens), and otherwise regularly (once every 10 minutes). All data is transmitted as HTTP requests. I think what best describes our communication pattern is “occasional bursts of a few small packets”.

I did a little search in the AT reference manual and saw that priid needs the query question mark too. That worked without error; here is the output:

>at!priid?
PRI Part Number: 9906491
Revision: 001.002
Customer: Generic-M2M

Carrier PRI: 9999999_9904594_SWI9X30C_02.24.05.06_00_ATT_002.027_000

OK

Karoly,

Given that you are not sending much data, no, I do not think that it is necessarily power related. Why are you using the APN ‘i2Gold’? I presume this is a private APN for your company?

There is nothing obvious that would make the unit reset. Is your host system Linux? If so, what is the syslog saying when it dies? Also, are you using the Sierra Wireless drivers?

Regards

Matt

“Why are you using the APN ‘i2Gold’? I presume this is a private APN for your company?”

That is a private APN of our client. It is not under my company’s control. We must use this APN because the SIM cards are supplied by our client and are part of a SIM fleet.

“There is nothing obvious that would make the unit reset”

The unit is not resetting, at least not alone. What we see is that all devices on the same bus crash, including the Wi-Fi adapter and the USB storage. The modem itself is the least of our concerns because it always recovers within 20-30 seconds, but for the storage it is fatal, as it becomes read-only. Due to the embedded nature of the product, only a system restart can be used to safely recover from the situation. The firmware has since been remotely upgraded to recognize such cases and recover automatically, but the fault still causes very significant downtime in total, multiple hours every day.

We still suspect the modem is the root cause, but since this is a physically remote deployment, it is hard to analyze as far as the hardware is concerned. We could not reproduce the issue on any other machine, all other factors being equal (except for the connection technology and provider).
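
As a rough illustration of how the read-only condition mentioned above can be detected on Linux, here is a minimal sketch that inspects the contents of /proc/mounts. It is not the firmware described in this post; the function name and the mount point /mnt/data are made up for the example, and checking the "ro" mount option is only one assumed way of noticing the fault.

```python
def is_read_only(mounts_text, mount_point):
    """Return True if mount_point is currently listed with the 'ro' option.

    mounts_text is the content of /proc/mounts, where each line has the form
    'device mountpoint fstype options dump pass'.
    """
    read_only = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point overrides an earlier one.
            read_only = "ro" in fields[3].split(",")
    return read_only


def storage_needs_recovery(mount_point="/mnt/data"):
    """Read /proc/mounts and report whether the (hypothetical) data mount is read-only."""
    with open("/proc/mounts") as f:
        return is_read_only(f.read(), mount_point)
```

A watchdog could call storage_needs_recovery() periodically and trigger the system restart when it returns True.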

“is your host system Linux?”

Yes, a minimal system built using Buildroot, so it is not a standard distribution like Ubuntu or Debian.

“If so what is the syslog saying when it dies?”

Lots of I/O errors and timeouts from all USB devices on the same bus. These are, however, already the result of the fault, not its cause.

“Also are you using the Sierra Wireless drivers?”

Do you mean your official SDK? We use the recent mainline kernel drivers, plus ModemManager and NetworkManager for connection management. This solution is easy to set up and modify, has tons of documentation, and works perfectly and stably on all other installations.

“The way to eliminate whether LTE is a factor or not is to turn it off with at!selrat and just have the unit on 3G.”

Can you please elaborate on the exact usage (syntax and parameters) of this command? We would like to try this, but it is completely missing from the AT Command Reference for the MC74xx.

Let me add a small explanation of why we think the root cause is the power delivery / consumption of the modem.

Since all devices on the same bus are affected, it is already very probable that the problem lies in the hardware. A faulty modem driver, for example, wouldn’t be able to cause I/O errors in the USB storage devices. This is also the reason we can rule out the Linux drivers as a cause. Because all devices fail, and consistently at the same time, we can also safely assume that what we are seeing has a single cause and that we are not dealing with multiple independent issues in each device. The USB data lines are point-to-point, so the only common points of failure in the hardware are the USB hub and the power lines. Since modems are known to draw high current peaks, and the modem is obviously the highest power consumer in our system, we suspect the current spikes drawn by the modem to be the problem. Furthermore, the only difference between the failing system and the working ones is the access technology (LTE vs UMTS), which is also modem related.

If you send at!selrat=? to the unit it will give you the options/format of the command (you can do this with most of the commands). To set it to 3G only you just need to send the below.

at!selrat=1

Then reset the unit. Note you might need to send at!entercnd="A710" before setting selrat.
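
Put together, the sequence above could be scripted roughly as follows. This is only a sketch: the function names are made up, using at!reset for the "reset the unit" step is an assumption, and the port is any object with a write(bytes) method (for example an already opened serial port). Real code should also read the reply and wait for OK after each command, which is left out here.

```python
import io


def lock_to_3g_commands(unlock=True):
    """Build the AT commands for switching the modem to 3G only, in order."""
    commands = []
    if unlock:
        # May be required before selrat is accepted.
        commands.append('at!entercnd="A710"')
    commands.append("at!selrat=1")  # 3G only, as described in this post
    commands.append("at!reset")     # assumed way of resetting the unit
    return commands


def send_commands(port, commands):
    """Write each command to the port, terminated by a carriage return."""
    for command in commands:
        port.write(command.encode("ascii") + b"\r")


# Dry run against an in-memory buffer instead of a real port.
buffer = io.BytesIO()
send_commands(buffer, lock_to_3g_commands())
print(buffer.getvalue())
```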

Regards

Matt

Hello,

Thank you. We will test this now. First we’ll create a new firmware that switches to 3G and then switches back automatically if no 3G reception is available. We will install a new test station locally and test this firmware there for a few days, and if everything’s fine, we’ll roll it out to the failing instance in the field. Then we’ll again collect data for some days.

I assume all-in-all this might take about 2 weeks, then I’ll get back to you with the result, most importantly if this helped solve our problem or not. But even if it does, especially if it does, there is one more thing we’d like to try.

So we’ll get back to you, in the meantime please don’t close this thread.

Hi, sorry it took much longer than anticipated. at!selrat per se works as expected and does its job when tested in the lab, but on the remote unit where we actually want to deploy the solution, there seems to be a problem relating to selrat. I’ve opened another thread for the selrat problem, as it is fundamentally a different issue from the one in this thread. Once we can get selrat working on the affected unit, I’ll post here whether it solved the issue discussed in this thread.

Your problem may not be power related.
It is true that LTE could draw more power but you are far from the worst case scenario.

Your RSRP is very bad:

PCC RxM RSSI: -86 RSRP (dBm): -119
PCC RxD RSSI: -86 RSRP (dBm): -119
SCC RxM RSSI: -81 RSRP (dBm): -103
SCC RxD RSSI: -85 RSRP (dBm): -111

It could be that the modem is switching back and forth between 3G and LTE.
Did the modem reset itself (crashed)?
We rarely see crashes in the field but when the conditions are marginal things can happen. If you collect a crashdump we’d like to know about it.

at!gcdump
No crash data available

OK

Note: modem powercycle clears the crashdump data.
In your lab, does the system recover when you issue “at!reset” ?
You could write a script and randomly reset the modem under various conditions. Maybe you can replicate the USB collapse.
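
A minimal sketch of such a script is below. It is an illustration only: send_reset is a placeholder for whatever sends “at!reset” to the modem on your system, and the wait bounds are arbitrary values, not recommendations.

```python
import random
import time


def random_reset_loop(send_reset, iterations, min_wait=60.0, max_wait=600.0,
                      sleep=time.sleep, rng=random):
    """Call send_reset() `iterations` times, sleeping a random interval before each call.

    Returns the list of waits (in seconds) so that a USB failure can later be
    correlated with the reset that preceded it.
    """
    waits = []
    for _ in range(iterations):
        wait = rng.uniform(min_wait, max_wait)
        sleep(wait)
        send_reset()  # hypothetical callable that issues at!reset
        waits.append(wait)
    return waits
```

Running it while the host is under different loads (idle, heavy storage I/O, active data transfer) would cover the "various conditions" part.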

“The unit is not resetting, at least not alone. What we see is that all devices on the same bus crash, including the wifi adapter and usb storage too.”
The modem should be the only device connected to USB.

“Lot’s of IO errors and timeouts from all USB devices on the same bus.”
Maybe your USB HC dies. There is a way to reset the HC in the kernel. Don’t ask me how :blush:

It is possible that you can work around the issue by locking the modem to 3G, which has a better signal.
Better signal -> less USB activity.

Thanks,
James

I am back with positive results. Our workaround using at!band has been in the field for almost 3 weeks now, and our problem seems to be solved. What we did is install a script that automatically switches the modem to all bands for 1 hour every day (6am-7am), which makes the modem select 4G, but otherwise runs the modem on 3G bands 23 hours a day (7am to 6am the next day). We already had another script from earlier debugging attempts that records when the USB bus collapses before resetting everything as a recovery measure. Here’s a database excerpt which shows all timestamps at which a bus collapse was detected.

The results speak for themselves. Before deploying the workaround (Sept. 27), we could observe crashes all over the place, multiple times every day. With the workaround, the only times we have crashes are between 6 and 7am, exactly the times we are running in 4G mode. More importantly, there haven’t been any crashes in the last 3 weeks, not a single one (except on 4G).

So the workaround to limit the modem to 3G is working. In consequence we also conclude that our problem is really, as we suspected, that the modem is consuming high peaks of current in 4G mode.
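
For anyone wanting to repeat this kind of check, the schedule and the split of crash timestamps can be sketched as below. This is not our deployed script, just an illustration: the function names and mode labels are made up, and the actual at!band commands are deliberately left out because the band indices depend on the module configuration.

```python
from datetime import datetime

LTE_WINDOW_START_HOUR = 6  # 6am, start of the daily all-bands window
LTE_WINDOW_END_HOUR = 7    # 7am, end of the window (exclusive)


def in_lte_window(ts):
    """True if the timestamp falls inside the daily all-bands (4G) hour."""
    return LTE_WINDOW_START_HOUR <= ts.hour < LTE_WINDOW_END_HOUR


def desired_mode(ts):
    """Band selection the schedule asks for at the given time."""
    return "all-bands" if in_lte_window(ts) else "3g-only"


def split_by_window(timestamps):
    """Split crash timestamps into (inside the 4G window, outside the window)."""
    inside = [t for t in timestamps if in_lte_window(t)]
    outside = [t for t in timestamps if not in_lte_window(t)]
    return inside, outside


print(desired_mode(datetime.now()))
```

If the workaround holds, split_by_window() applied to the recorded collapses after the deployment date should return an empty second list.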

I’m glad you guys could work around the issue with !band.
As I wrote earlier your RSRP was really bad, which possibly forces the modem to transmit at higher powers.
It could be that 3G in that area has a better signal thus consuming less power.

The bad RSRP is indeed most probably the cause. Or, to be more precise, the cause is that the power delivery in the electronics is not adequate for that case. In our defence, though, we couldn’t reproduce the problem otherwise (in the field or artificially in the lab) on other units (probably due to their better RSRPs). One place where we were misled is ModemManager’s signal level, which reports 60-70% on this problematic unit, so we were led to believe the signal is good. From now on we will keep an eye on RSRP on installations.
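
A minimal sketch of how RSRP could be watched automatically, by parsing the at!gstatus? output in the format posted earlier in this thread. The -110 dBm threshold is an assumed example value, not an official limit, and the function names are made up.

```python
import re

# Matches lines such as "PCC RxM RSSI:  -86              RSRP (dBm):  -119"
RSRP_PATTERN = re.compile(
    r"^(PCC|SCC) (RxM|RxD) RSSI:\s*(-?\d+)\s+RSRP \(dBm\):\s*(-?\d+)",
    re.MULTILINE,
)

WEAK_RSRP_DBM = -110  # assumed threshold for "weak", adjust to your own experience


def parse_rsrp(gstatus_text):
    """Return {'PCC RxM': {'rssi': ..., 'rsrp': ...}, ...} from !GSTATUS output."""
    readings = {}
    for carrier, path, rssi, rsrp in RSRP_PATTERN.findall(gstatus_text):
        readings[f"{carrier} {path}"] = {"rssi": int(rssi), "rsrp": int(rsrp)}
    return readings


def weak_paths(readings, threshold=WEAK_RSRP_DBM):
    """Names of the receive paths whose RSRP is below the threshold, sorted."""
    return sorted(name for name, r in readings.items() if r["rsrp"] < threshold)
```

With the values from this unit, three of the four paths (both PCC paths and SCC RxD) fall below the assumed threshold, even though the RSSI figures and ModemManager's percentage look acceptable.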