mhi-pci-semtech 0000:2c:00.0: firmware crashed

I am using the EM9190 module with the mhi-pci-semtech driver, in an OOT kernel build. The modem works fine, but sometimes crashes with the following error messages:

It recovers after a while, but the crash itself is annoying.
Has anyone seen this before, or does anyone know how to fix it?
Thanks in advance.

Are you using the latest firmware?

Are you using the latest MBPL driver?

Do you see the same problem with the USB interface instead of the PCIe interface?

Did you see anything in AT!GCDUMP after the crash?

We are using mhi-pci-semtech R43 (MBPL_DRIVERS_R43_ENG1-usb-pcie-src.zip). Our modem returns the following on AT!VERINFO:

        SBL: BOOT.SBL.4.1-00247
         TZ: TZ.FU.5.9-00197
        AOP: unknown
       UEFI: SWIX55C_03.10.07.00
       Mpss: SWIX55C_03.10.07.00 e32f05 jenkins 2022/12/14 16:18:06
         OS: Linux version 4.14.206-perf Wed Dec 14 17:10:29 UTC 2022
      Yocto: SWIX55C_03.10.07.00 2022 Wed Dec 14 17:10:29 UTC 2022
     RootFS: SWIX55C_03.10.07.00 2022 Wed Dec 14 17:10:29 UTC 2022
   Security: secure
RF_CAL_TREE: unknown

Running AT!GCDUMP after a crash looks like this:

Src:  Exception
Str:  EX:kernel:0x0:ipa_ctl:0x159:PC=0xc081ec20:LR=0x4d244cf1
9B4DAD14 00000000 00000000 00000000 
Prc:  MPSS
Task: 
Time: 0006CAF9
 R0: 00000000  R1: 00000000  R2: 00000000  R3: 00000000  R4: 00000000
 R5: 00000000  R6: 00000000  R7: 00000000  R8: 00000000  R9: 00000000
R10: 00000000 R11: 00000000 R12: 00000000 R13: 00000000 R14: 00000000
R15: 00000000 R16: 00000000 R17: 00000000 R18: 00000000 R19: 00000000
R20: 00000000 R21: 00000000 R22: 00000000 R23: 00000000 R24: 00000000
R25: 00000000 R26: 00000000 R27: 00000000 R28: 00000000 SP:  A107F048
FP:  00000000 LR:  4D244CF1
PC: C081EC20
CPSR: 00001E01
Mod: Unknown
Ctr: ARM, IRQ dis,FIQ dis

TOS
A107F070 8DAD176D A108E880 5210D6A0 C9DD1DC0 C9DAABD0 C9D89A68
C9D8CC70 00000001 C9D8E578 A107F080 8DAD1651 00000001 0000403F
A107F0A8 95BC8C4D C93D27D0 00000020 D898C014 0000005A C384F738
CB738EFC C3837B90 00000000 A107F150 952976AD 00000003 00675ADC
00000000 00000000 C3837B90 00000000
BOS
App ver: SWIX55C_03.09.03.00

Src:  FatalError
Str:  Internal error:
00000000 00000000 00000000 00000000 
Prc:  APSS
Task: 
Time: 00000000
 R0: 00000000  R1: 00000000  R2: 00000000  R3: 00000000  R4: 00000000
 R5: 00000000  R6: 00000000  R7: 00000000  R8: 00000000  R9: 00000000
R10: 00000000 R11: 00000000 R12: 00000000 R13: 00000000 R14: 00000000
PC: 00000000
CPSR: 00000000
Mod: Unknown
Ctr: ARM, IRQ dis,FIQ dis

TOS
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
BOS
scripts/power_config -> ENABLE-FTRACE START
<6>[   20.774041] coresight-tmc 6048000.tmc: TMC-ETR enabled
<6>[   20.774222] coresight-dynamic-funnel 6b08000.funnel: FUNNEL inport 7 enabled
<6>[   20.774245] coresight-dynamic-funnel 6045000.funnel: FUNNEL inport 0 enabled
<6>[   20.774267] coresight-dynamic-funnel 6041000.funnel: FUNNEL inport 7 enabled
<6>[   20.774347] coresight-stm 6002000.stm: STM tracing enabled
<12>[   20.776724] ++++ /etc/initscripts/power_config -> ENABLE-FTRACE END
<12>[   20.776822] ++++ /etc/initscripts/power_config -> ENABLE-DCC START
<3>[   20.778463] msm-dcc 10a2000.dcc_v2: DCC list passed 2
<6>[   20.778512] msm-dcc 10a2000.dcc_v2: All values written to enable
<12>[   20.778607] ++++ /etc/initscripts/power_config -> ENABLE-DCC END
<12>[   20.778695] power_config: done
<12>[   20.884650] syslog: Starting syslogd/klogd: 
<12>[   21.984086] syslog: done
<4>[   23.046838] swi_netlink_data_ready: receive user pid:895, msg_cached:0
<4>[   26.162917] ipa3_dma_enable: 5 callbacks suppressed
<3>[   26.162925] ipa ipa3_dma_enable:426 Already enabled refcnt=1
<4>[   26.164110] ipa3_dma_disable: 5 callbacks suppressed
<3>[   26.164114] ipa ipa3_dma_disable:485 Multiple enablement done. refcnt=2
<3>[   26.174651] ipa ipa3_dma_disable:497 There is pending work, can't disable.
<3>[   28.234185] [glink_pkt_ioctl]: unrecognized ioctl command 0x8004c200
<3>[   28.278618] ep_pcie_reset_init: After Reset assert pcie_core_reset
<3>[   28.279723] ep_pcie_reset_init: After Reset de-assert pcie_core_reset
<3>[   28.283819] ep_pcie_reset_init: After Reset assert pcie_phy_reset
<3>[   28.293151] ep_pcie_reset_init: After Reset de-assert pcie_phy_reset
<3>[   28.296808] ep_pcie_phy_init: PCIe V1711211: Unexpected phy version 2103 is caught
<6>[   28.305579] ep_pcie_core_enable_endpoint: PCIe V1711211: PCIe  PHY is ready
<6>[   28.330948] ep_pcie_core_enable_endpoint: PCIe V1711211: link initialized for LE PCIe endpoint
<2>[   28.330979] PCIe - link initialized for LE PCIe endpoint
<6>[   64.480947] sierra_startup_monitor
<6>[  311.651810] kworker/dying (392) used greatest stack depth: 4972 bytes left
<3>[  442.439118] Fatal error on modem!
<3>[  442.439319] modem subsystem failure reason: EX:kernel:0x0:ipa_ctl:0x159:PC=0xc081ec20:LR=0x4d244cf1.
<6>[  442.441899] subsys-restart: subsystem_restart_dev(): Restart sequence requested for modem, restart_level = SYSTEM.
<3>[  442.446767] Ramdump(ramdump_microdump_modem): No consumers. Aborting..
<6>[  442.452831] microdump_modem_notifier_nb: do_ramdump() failed
<0>[  442.560913] Kernel panic - not syncing: subsys-restart: Resetting the SoC - modem crashed.
<4>[  442.561059] CPU: 0 PID: 168 Comm: kworker/0:2 Not tainted 4.14.206 #1
<4>[  442.568150] Hardware name: Qualcomm Technologies, Inc. SDXPRAIRIE (Flattened Device Tree)
<4>[  442.574725] Workqueue: events device_restart_work_hdlr
<4>[  442.582871] [<c0110edc>] (unwind_backtrace) from [<c010ce58>] (show_stack+0x1c/0x20)
<4>[  442.587901] [<c010ce58>] (show_stack) from [<c0c0b988>] (dump_stack+0x20/0x24)
<4>[  442.595798] [<c0c0b988>] (dump_stack) from [<c0126318>] (panic+0x18c/0x3cc)
<4>[  442.602827] [<c0126318>] (panic) from [<c04c7a60>] (subsys_remove_restart_order+0x0/0x88)
<4>[  442.609698] [<c04c7a60>] (subsys_remove_restart_order) from [<c014331c>] (process_one_work+0x1a8/0x47c)
<4>[  442.618028] [<c014331c>] (process_one_work) from [<c01439d8>] (worker_thread+0x384/0x4f0)
<4>[  442.627221] [<c01439d8>] (worker_thread) from [<c0148554>] (kthread+0x158/0x160)
<4>[  442.635552] [<c0148554>] (kthread) from [<c0108654>] (ret_from_fork+0x14/0x20)
<3>[  442.652456] ipa ipa3_active_clients_panic_notifier:300 
<3>[  442.652456] ---- Active Clients Table ----
<3>[  442.652456] DMA                                      1   SPECIAL
<3>[  442.652456] FREEZE_VOTE                              1   SPECIAL
<3>[  442.652456] MHI                                      1   SPECIAL
<3>[  442.652456] TAG_PROCESS                              -2  SPECIAL
<3>[  442.652456] 
<3>[  442.652456] Total active clients count: 3
<3>[  442.652456] 
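The Str line of the first record (EX:kernel:0x0:ipa_ctl:0x159:PC=0xc081ec20:LR=0x4d244cf1) is the same string the kernel log reports as the "modem subsystem failure reason", so it can serve as a crash signature. To check whether later crashes (for example after a firmware update) hit the same ipa_ctl exception, a small helper can pull the Src/Str pair out of each record. This is a minimal sketch, assuming the AT!GCDUMP output was copied from the socat session into a text file; the function name is hypothetical, and the parsing relies only on the "Src:" and "Str:" prefixes visible in the dump above.

```shell
# Hypothetical helper: print one "Src: Str" line per record of a saved
# AT!GCDUMP capture, e.g. "Exception: EX:kernel:0x0:ipa_ctl:...".
# Only the "Src:" and "Str:" line prefixes are assumed.
gcdump_signature() {
    awk '
        # Remember the record source (Exception, FatalError, ...)
        /^Src:/ { sub(/^Src:[ \t]+/, ""); src = $0 }
        # Print it together with the exception string of the same record
        /^Str:/ { sub(/^Str:[ \t]+/, ""); print src ": " $0 }
    ' "$1"
}
```

Run against a capture of the dump above, it would print Exception: EX:kernel:0x0:ipa_ctl:0x159:PC=0xc081ec20:LR=0x4d244cf1 followed by FatalError: Internal error:, which makes it easy to diff signatures from several crashes.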

We haven’t tried using the modem in USB mode. What steps should we take to do this?

Your firmware seems to be a bit old.

Do you see the problem with the latest firmware?

Firmware is delivered in a zip file containing CWE and NVU files, which should be used with the ‘Firmware_Download’ application (or the equivalent APIs in your own application) contained in the Linux SDK SampleApps.
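Once the zip is extracted, the long CWE and NVU file names can be picked up from the directory instead of typed by hand. This is a minimal sketch, assuming the extracted directory holds exactly one .cwe and one .nvu file; the helper name is hypothetical and not part of the SDK.

```shell
# Hypothetical helper: print the .cwe and .nvu file names (one per line,
# CWE first) found in an extracted firmware package directory.
# Assumes the directory holds exactly one file of each type.
fw_pair() {
    local dir=$1 ext name
    for ext in cwe nvu; do
        # Expand the glob into the function's positional parameters
        set -- "$dir"/*."$ext"
        # No match leaves the literal pattern, which fails the -f test
        if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
            echo "expected exactly one .$ext file in $dir" >&2
            return 1
        fi
        name=${1##*/}   # strip the directory part
        printf '%s\n' "$name"
    done
}
```

The two names it prints correspond to the values passed to the -w and -n options of fw-download-toolhostx86_64 later in this thread, with the directory itself going to -f.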

I can’t seem to find the Firmware_Download application, could you point me to it please?

See MBPL_SDK_R44_ENG5-fwdwl.bin.tar.

Thanks! I found it, but I’m having issues putting the modem in BOOT&HOLD mode. If I run

sudo socat - /dev/wwan0at0,crnl

And then

AT!BOOTHOLD

My /dev/ folder then looks like this:

HID-SENSOR-2000e1.11.auto  core             hidraw1    i2c-3        loop-control  mhi0_SAHARA  ptp0      tty10  tty26  tty41  tty57   ttyS14  ttyS3        v4l    vcsu2
HID-SENSOR-2000e1.12.auto  cpu              hidraw2    i2c-4        loop0         mqueue       pts       tty11  tty27  tty42  tty58   ttyS15  ttyS30       vcs    vcsu3
HID-SENSOR-2000e1.13.auto  cpu_dma_latency  hidraw3    i2c-5        loop1         mtd          random    tty12  tty28  tty43  tty59   ttyS16  ttyS31       vcs1   vcsu4
HID-SENSOR-2000e1.15.auto  cuse             hidraw4    i2c-6        loop2         mtd0         rfkill    tty13  tty29  tty44  tty6    ttyS17  ttyS4        vcs2   vcsu5
HID-SENSOR-2000e1.3.auto   disk             hidraw5    i2c-7        loop3         mtd0ro       rtc       tty14  tty3   tty45  tty60   ttyS18  ttyS5        vcs3   vcsu6
HID-SENSOR-2000e1.4.auto   dma_heap         hpet       i2c-8        loop4         net          rtc0      tty15  tty30  tty46  tty61   ttyS19  ttyS6        vcs4   vfio
HID-SENSOR-2000e1.6.auto   dri              hugepages  i2c-9        loop5         ng0n1        shm       tty16  tty31  tty47  tty62   ttyS2   ttyS7        vcs5   vga_arbiter
HID-SENSOR-2000e1.7.auto   drm_dp_aux0      hwrng      iio:device0  loop6         null         snapshot  tty17  tty32  tty48  tty63   ttyS20  ttyS8        vcs6   vhci
HID-SENSOR-2000e1.8.auto   drm_dp_aux1      i2c-0      iio:device1  loop7         nvme0        snd       tty18  tty33  tty49  tty7    ttyS21  ttyS9        vcsa   vhost-net
HID-SENSOR-2000e1.9.auto   drm_dp_aux2      i2c-1      iio:device2  loop8         nvme0n1      stderr    tty19  tty34  tty5   tty8    ttyS22  ttyprintk    vcsa1  vhost-vsock
acpi_thermal_rel           ecryptfs         i2c-10     iio:device3  loop9         nvme0n1p1    stdin     tty2   tty35  tty50  tty9    ttyS23  udmabuf      vcsa2  video0
autofs                     fb0              i2c-11     iio:device4  mapper        nvme0n1p2    stdout    tty20  tty36  tty51  ttyS0   ttyS24  uhid         vcsa3  video1
block                      fd               i2c-12     initctl      mcelog        nvram        tpm0      tty21  tty37  tty52  ttyS1   ttyS25  uinput       vcsa4  video2
btrfs-control              full             i2c-13     input        media0        port         tpmrm0    tty22  tty38  tty53  ttyS10  ttyS26  urandom      vcsa5  video3
bus                        fuse             i2c-14     kmsg         media1        ppp          tty       tty23  tty39  tty54  ttyS11  ttyS27  usb          vcsa6  zero
char                       gpiochip0        i2c-15     kvm          mei0          psaux        tty0      tty24  tty4   tty55  ttyS12  ttyS28  userfaultfd  vcsu   zfs
console                    hidraw0          i2c-2      log          mem           ptmx         tty1      tty25  tty40  tty56  ttyS13  ttyS29  userio       vcsu1

I run the fw download SampleApp like this:

./fw-download-toolhostx86_64 -f ~/images -c MBIM -d /dev/mhi0_SAHARA -t 1 -w SWIX55C_03.17.04.00-001a.cwe -n SWIX55C_03.17.04.00-001a_GENERIC_030.112_000.nvu

But it returns an error:

Application version: 1.0.2412.2
GetDeviceMode: ERROR! Unknown modem state
Modem is disconnected or not in correct state. Please ensure QMI device path is correct.

dmesg shows the following:

[  200.370246] mhi mhi0: Resuming from non M3 state (SYS ERROR)
[  200.370263] mhi-pci-semtech 0000:2c:00.0: failed to resume device: -22
[  200.370279] mhi-pci-semtech 0000:2c:00.0: device recovery started
[  200.370563] wwan wwan0: port wwan0qcdm0 disconnected
[  200.371352] wwan wwan0: port wwan0mbim0 disconnected
[  200.371641] wwan wwan0: port wwan0qmi0 disconnected
[  200.372302] wwan wwan0: port wwan0at0 disconnected
[  200.395554] mhi mhi0: Requested to power ON
[  201.176630] mhi mhi0: Power on setup success
[  201.176805] mhi mhi0: Wait for device to enter SBL or Mission mode
[  225.553524] mhi-pci-semtech 0000:2c:00.0: reset
[  228.624538] pcieport 0000:00:1c.0: broken device, retraining non-functional downstream link at 2.5GT/s
[  229.840678] pcieport 0000:00:1c.0: broken device, retraining non-functional downstream link at 2.5GT/s
[  230.928509] mhi-pci-semtech 0000:2c:00.0: not ready 1023ms after bus reset; waiting
[  232.016445] mhi-pci-semtech 0000:2c:00.0: not ready 2047ms after bus reset; waiting
[  234.128351] mhi-pci-semtech 0000:2c:00.0: not ready 4095ms after bus reset; waiting
[  238.352180] mhi-pci-semtech 0000:2c:00.0: not ready 8191ms after bus reset; waiting
[  247.055867] mhi-pci-semtech 0000:2c:00.0: not ready 16383ms after bus reset; waiting
[  263.951193] mhi-pci-semtech 0000:2c:00.0: not ready 32767ms after bus reset; waiting
[  299.792303] mhi-pci-semtech 0000:2c:00.0: not ready 65535ms after bus reset; giving up
[  299.917400] mhi-pci-semtech 0000:2c:00.0: reset failed
[  299.917407] mhi-pci-semtech 0000:2c:00.0: Recovery failed

You should not run the firmware update in boot-and-hold mode.

Oh okay, thanks! What device should I then use for -d ?

The firmware upgrade application asks the module to reset once. After that, the module enters download mode and one new port is enumerated; you can check whether it is /dev/mhi0_SAHARA.
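To see when the download port actually appears after the tool resets the module, a simple poll on the device node can be run in a second terminal. This is a minimal sketch, assuming a 1-second poll interval and a caller-chosen timeout; the function name is hypothetical, and /dev/mhi0_SAHARA is only the node name mentioned above, not a guaranteed one.

```shell
# Hypothetical helper: wait until a device node exists, e.g. the download
# port enumerated after the firmware tool resets the module.
# Usage: wait_for_node PATH [TIMEOUT_SECONDS]   (default timeout: 30 s)
wait_for_node() {
    local node=$1
    local timeout=${2:-30}
    local waited=0
    while [ ! -e "$node" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$node did not appear within ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$node is present"
}
```

For example, wait_for_node /dev/mhi0_SAHARA 60 would report the port as soon as it shows up, and ls /dev | grep -E '^(wwan|mhi)' lists the MHI/WWAN nodes present at any given moment.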

I succeeded in updating the firmware to the latest version. Thanks!
The modem still disconnects after a while, though. At that point I cannot reach it via AT commands, so I can’t run AT!GCDUMP. This is what I get from dmesg:

[ 4243.324523] mhi_wwan_ctrl mhi0_DUN: 33: Failed to receive RESET channel command completion
[ 4243.324539] mhi_wwan_ctrl mhi0_DUN: 33: Failed to reset channel, still resetting
[ 4267.389324] mhi_wwan_ctrl mhi0_DUN: 32: Failed to receive RESET channel command completion
[ 4267.389341] mhi_wwan_ctrl mhi0_DUN: 32: Failed to reset channel, still resetting
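Because the AT port is unreachable once this happens, the kernel log is the main evidence. Filtering it down to the MHI, WWAN and PCIe-port messages makes the sequence leading up to the channel reset failures easier to spot and share. This is a minimal sketch; the function name is hypothetical, and the match pattern is an assumption based only on the driver and port names that appear in the logs in this thread.

```shell
# Hypothetical helper: keep only MHI / WWAN / PCIe-port kernel messages
# from a log read on stdin. The pattern is an assumption based on the
# messages seen in this thread (mhi-pci-semtech, mhi_wwan_ctrl, pcieport).
mhi_events() {
    grep -E 'mhi|wwan|pcieport'
}
```

For example, sudo dmesg -w | mhi_events follows new messages live while the modem is in use, and journalctl -k -b -1 | mhi_events shows the previous boot's kernel messages after a restart, provided the journal is persistent.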

Are you able to recover it by re-downloading the firmware?

Btw, which firmware are you using right now?

I only managed to recover by restarting the device. If I encounter it again I can try to redownload the firmware, but that’s not really a long-term solution for me.
I am currently on firmware version 03.17.04.00_GENERIC

You might need to test in USB mode, by switching pin 20 and pin 22, to see whether a similar problem occurs.

How should I do this? Is this a physical thing I need to change? The modem is inside a GETAC tablet. I can access it, but not that easily.

Yes, it is a physical pin change.

Did you check with Getac?