Intermittent WP7700 module Failure: [Fatal error on the modem]

I am experiencing intermittent modem failures on my mangOH Red. I had not encountered this until I ran “app start redSensorToCloud”, at which point it began happening regularly. On Friday this fault, followed by a reboot of the WP7700, occurred at roughly 2-5 minute intervals.

Fatal error on the modem.
[ 47.491506] modem subsystem failure reason: dlcch.c:1284:Assertion (pdsch_semaphore_value <= 1) failed.
[ 47.500967] M-Notify: General: 8
[ 47.610151] Kernel panic - not syncing: subsys-restart: Resetting the SoC - modem crashed.
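
To check whether every crash reports the same assertion, the reason text can be pulled out of a saved console capture. This is only a minimal sketch: the function name `modem_failure_reasons` and the file name `console.log` are hypothetical, and it assumes the failure line always has the same shape as the one above.

```shell
# Print the reason text from every "modem subsystem failure reason" line on stdin.
# Lines that do not contain that phrase are ignored.
modem_failure_reasons() {
    sed -n 's/^.*modem subsystem failure reason: *//p'
}

# Hypothetical usage against a saved serial console capture:
#   modem_failure_reasons < console.log | sort | uniq -c
```

Piping the result through `sort | uniq -c` gives a count per distinct reason, which makes it easy to see whether the `dlcch.c` assertion is the only one involved.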

root@swi-mdm9x28-wp:~# cm radio
Power: ON
Current Network Operator: AT&T
Current RAT: LTE network (LE_MRC_RAT_LTE)
Status: Registered, home network (LE_MRC_REG_HOME)
Signal: Weak signal strength (2) //varies between 2 and 5
PS: Packet Switched Registered, home network (LE_MRC_REG_HOME)

root@swi-mdm9x28-wp:~# cm info
Device: WP7700
IMEI: 353805090108080
IMEISV: 2
FSN: W8810385110310
Firmware Version: SWI9X06Y_02.16.06.00 7605a6 jenkins 2018/06/20 17:56:12
Bootloader Version: SWI9X06Y_02.16.06.00 7605a6 jenkins 2018/06/20 17:56:12
MCU Version: 002.009
PRI Part Number (PN): 9908049
PRI Revision: 001.002
Carrier PRI Name: ATT
Carrier PRI Revision: 001.026_000
SKU: 1103736
Last Reset Cause: Crash
Resets Count: Expected: 47 Unexpected: 81
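
The "Resets Count" line is a convenient number to track while isolating the problem. A small sketch, assuming the line keeps the format shown above; the function name `unexpected_resets` is made up for illustration:

```shell
# Read `cm info` output on stdin and print only the "Unexpected" reset count.
unexpected_resets() {
    sed -n 's/^Resets Count:.*Unexpected: *\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical usage on the target:
#   cm info | unexpected_resets
```

Recording this value before and after a test run shows how many crash reboots happened during the run without having to watch the console.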

I have freshly built the most recent versions. Here are some general details:

root@swi-mdm9x28-wp:~# cm sim info
Type: EXTERNAL_SLOT_1
ICCID: 89011703278211571180
Home Network Operator: AT&T
EID:
IMSI: 310170821157118
Phone Number:

Unrelated note: I am seeing a stream of NACK messages on the console port:
[ 4235.851390] i2c-msm-v2 78b8000.i2c: NACK: slave not responding, ensure its powered: msgs(n:1 cur:0 tx) bc(rx:0 tx:14) mode:FIFO slv_addr:0x3a MSTR_STS:0x0c1300c8 OPER:0x00000090
[ 4236.850666] i2c-msm-v2 78b8000.i2c: NACK: slave not responding, ensure its powered: msgs(n:1 cur:0 tx) bc(rx:0 tx:14) mode:FIFO slv_addr:0x3a MSTR_STS:0x0c1300c8 OPER:0x00000090

Is there a possible issue with being logged in on the console port as well as the CF3 Module port (ssh@root…)?
Is there an issue with USB connectivity/interaction with the host PC that is resulting in critical modem faults? (I am building/running on an Ubuntu 16.04 VM on a Win10 system; the VM has 4 CPUs and 6GB RAM.)

Is there a simple way to make the redSensorToCloud app start automatically on boot, for further diagnosis/isolation and to eliminate any host-USB interactions?

Appreciate any similar/related experiences anyone has had!!

After 18 minutes of uninterrupted operation (a record), the modem crashed. There were 2 faults and reboots; once it was back up I restarted the redSensorToCloud app and the modem immediately failed two more times.
At this point I was able to get one data transfer to AirVantage before the modem failed and rebooted again! Whether or not the app is started, the modem is now back to rebooting every few minutes. I am taking various isolation steps, but it is highly unusual that it would run (cold) for almost 20 minutes and then start failing/rebooting at 2 minute intervals.

A heat issue? Doubtful, as the modem module is reporting 36 deg C…

Has anyone encountered similar critical Modem Faults and isolated the root cause?

Thanks all,
Mike

Hi Mike,

We’re aware of this modem software stability issue, and have a solution in the upcoming Release 11 firmware. Unfortunately there’s no workaround offered without upgrading firmware. Release 11 will be available soon pending internal validation, but that particular solution has been validated in internally created scenarios that were previously resulting in the same failure.

The failure isn’t directly related to the mangOH apps, but rather to the network activity triggered by those applications. The solution will be contained in Modem firmware.

Unfortunately we can’t provide the solution until fully validated (including other Release 11 improvements), but I hope this explanation helps in the meantime.

Ryan

Ryan,

This is a help - I’ve been running in circles with fresh setups/installs and builds of every permutation.

I’ve just ordered a second platform suspecting faulty Hardware.

So saving me from further anguish makes me very happy!

Thanks,

Mike

Can I assume a NOTICE will be posted when the stability and build issue(s) have been resolved?

I have been periodically re-making with latest legato_framework as well as mangOH - (still to no avail).

A notification with the recipe for a buildable and functional (no instability) WP7700 mangOH Red will be MUCH appreciated!

Thanks,
Mike

Can you tell me where to find the version of the mangOH source(s) and how to correlate to the appropriate (buildable) legato release?

i.e.
cd ~/mangOH && git pull && git submodule update --init

This always gets the latest mangOH source, but there’s no indication of which legato version it will build completely against. I get different make red_wp77xx errors/faults with each variant of legato and with every ‘update’ of the mangOH source files.

I must be missing something right in front of my face.

Can you please clarify?

Hi @mikebp,

The command you listed (cd ~/mangOH && git pull && git submodule update --init) will make sure that you have the latest mangOH code.

mangOH code currently doesn’t have version numbers or indicate which version(s) of Legato are supported. I expect to see this improve over the next year. For now, I can tell you that the mangOH team is using Legato 19.01.0 internally.

@dfrey

Yes, thanks, this is the command I use. I then try building with the latest legato release and work backward, hoping to hit the one with which the entire red_wp77xx target builds to completion. After each “update --init” it builds with some, none, or different legato versions than after the previous “update --init”. Late last week I could not get any make red_wp77xx to build to completion without faults. Since I cannot go back to a previous mangOH (and I don’t know when mangOH changed), I’m effectively stuck in an “update --init” loop, stepping through a slew of legato releases.

Not productive.

I also see some ‘pseudo-solutions’ indicating one should simply comment out parts of mangOH (sensors) so it builds. However, those are all parts that I require.

I have tried 19.01.0 and it was not able to build red_wp77xx, at least with that day’s mangOH “update --init”.

…so you say over the next year. This does not help developers considering using Sierra Wireless and the mangOH platform/architecture for a prototype solution. I would require a means to address this loose structure within a week or two tops or I will be forced to look at alternative platforms.

I appreciate anything you can tell me to get me out of this loop.

Thanks!

There was an accidental breakage on the master branch of the mangOH repository because the mangOH.sdef was modified with mangOH yellow tunnel vision. I would like to set up some automated builds for mangOH to help prevent this, but it hasn’t happened yet.

Appreciate the insight - AND your efforts.
…I’ll stay tuned and rerun it tomorrow.

FYI: Late yesterday I too was able to successfully build red_wp77xx with Legato 19.01.0 and the mangOH release (update) as of 3:35pm CDT. I’m not going to risk any further ‘init updates’ to mangOH until R11 for WP77XX is released and it is known which mangOH and legato versions have been validated.
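
Since mangOH is a git checkout, a combination that builds can be written down and returned to later, even after a further pull. A minimal sketch using only standard git commands; the function names are invented for illustration, and `~/mangOH` is assumed to be the checkout location used earlier in this thread:

```shell
# Print the commit currently checked out in a repository (default: ~/mangOH).
mangoh_current_commit() {
    git -C "${1:-$HOME/mangOH}" rev-parse HEAD
}

# Check out a previously recorded commit in repository $1 and bring the
# submodules back in line with that commit.
mangoh_restore_commit() {
    git -C "$1" checkout --quiet "$2" &&
        git -C "$1" submodule update --init --recursive
}

# Hypothetical usage:
#   good=$(mangoh_current_commit)              # note this after a successful build
#   mangoh_restore_commit "$HOME/mangOH" "$good"   # return to it after a bad update
```

Restoring a commit this way leaves the checkout in a detached HEAD state, which is fine for building; `git checkout master` returns to the branch afterwards.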

Ryan,
Can you give me any rough time-frame as to when this might be resolved & available?
…days, weeks, months or a year? Any insight would be of great help.
Thanks,
Mike

Hi @mikebp, sorry for the slow response. The release is in validation stages at this point and “weeks” would be the target. If you’ve subscribed for notifications about the product on Source, yes you should receive a notification about this release when it’s available. Sorry I don’t have a more specific date to provide yet.

Ryan

Understand.
Please post a notice on here when it has been resolved.
Thanks!

Is there any update on this issue? It seems to be still present in the latest release AT&T modem firmware: 001.026_001.

Hi @elliotmr, @mikebp
The updated AT&T firmware with this fix still needs to be certified. What is needed is for the modem firmware version to be updated to 02.20.01.00 or newer. But unfortunately, the PTCRB, AT&T and T-Mobile versions are still lagging as can be seen here https://source.sierrawireless.com/resources/airprime/software/wp77xx/wp77xx-firmware-latest-release/. Please reach out to your distributor or Sierra Wireless contact to keep track of certification progress. Thanks.
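
Whether a given module already meets the 02.20.01.00 threshold can be checked from the "Firmware Version" line of `cm info`. The following is a sketch only: the function names are made up, and it assumes the version string always follows the `SWI9X06Y_` prefix as in the output earlier in this thread.

```shell
# Read `cm info` output on stdin and print the dotted firmware version,
# e.g. "02.16.06.00" from "Firmware Version: SWI9X06Y_02.16.06.00 ...".
firmware_version() {
    sed -n 's/^Firmware Version: *[^_]*_\([0-9.]*\).*/\1/p'
}

# Succeed (exit status 0) when dotted version $1 is greater than or equal to $2.
# awk converts each field to a decimal number, so leading zeros are harmless.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = x[i] + 0; yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Hypothetical usage on the target:
#   version_ge "$(cm info | firmware_version)" 02.20.01.00 && echo "fix should be present"
```

With the firmware reported at the top of this thread (02.16.06.00) the check fails, which matches the crashes described there.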