Stability of FOTA firmware upgrade

We are investigating upgrading the firmware on various Sierra Wireless modules: the HL8548, HL7800 and HL7688. We would like to use FOTA to do so.

The AT command reference for the HL8548 module mentions the following (see p. 644):

Note: Avoid powering the module down during an AVMS FOTA update (or during a local update using +WDSD), especially between +WDSI:14 and the module’s reboot.

For the HL6528x, the maximum time for a local download (between +WDSD and +WDSI:3) is 3 minutes; and the maximum flashing time (upgrade duration between +WDSI:14 and +WDSI:16) is 8 minutes.

I could not find a similar note for the HL7800 and the HL7688. Do these modules use a more robust firmware upgrade method that can be interrupted? We do not control the power to our devices and are at the whim of the end user not powering down the device during an upgrade. I’d like to understand when this will be an issue.

Hi @simon

On these modules, the FOTA firmware upgrade resumes if the module is powered down and back on during the update.

Please tick “Solutions” if your question is answered.

Thank you @Donald for your answer. Does this mean it’s safe to assume that when the modem is power cycled during an upgrade it will resume okay?

Hi @simon

Yes, it does.

Please tick “Solutions” if your question is answered.

@simon

So what I would say is that there are different stages to the FOTA process. The download process is fairly simple, it’s literally just downloading a file into memory, so yes, there is a resume mechanism to start where it left off if the unit is powered down during the download.

Once the file is downloaded, the CRC is checked and the digital signature is verified, the upgrade process can begin (either automatically by default or, if you have set it, by explicit acknowledgement from the application using at+wdsr=3). This is the point of highest risk. With regard to this window (which, as referred to above, is between the +WDSI: 14 and +WDSI: 16 responses), we have built as many safeguards into the firmware as possible, but at the end of the day you are using delta files to manipulate the contents of the flash, and there is always a chance that it will fail. The 8 minute number is extreme; generally speaking the window will be a couple of minutes.
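To make the risk window concrete, here is a minimal sketch of how a host application might watch the AT port and flag the period between +WDSI: 14 and +WDSI: 16, for example to warn the user or hold off a software-controlled shutdown. The class and method names are made up for illustration, and it assumes the indications arrive as plain lines of the form "+WDSI: <n>"; only codes 14 and 16 are interpreted, since those are the ones described above.

```python
import re

# Matches unsolicited indications such as "+WDSI: 14" or "+WDSI:14".
WDSI_RE = re.compile(r"^\+WDSI:\s*(\d+)")

FLASH_START = 14  # per the thread: flashing begins
FLASH_END = 16    # per the thread: flashing has finished


class FotaWindowTracker:
    """Hypothetical helper: tracks whether we are between +WDSI: 14 and +WDSI: 16."""

    def __init__(self):
        self.in_flash_window = False
        self.last_code = None

    def feed(self, line):
        """Feed one line read from the AT port.

        Returns True while the module is believed to be flashing,
        i.e. while power should not be removed.
        """
        match = WDSI_RE.match(line.strip())
        if match:
            self.last_code = int(match.group(1))
            if self.last_code == FLASH_START:
                self.in_flash_window = True
            elif self.last_code == FLASH_END:
                self.in_flash_window = False
        return self.in_flash_window


if __name__ == "__main__":
    tracker = FotaWindowTracker()
    for line in ["+WDSI: 3", "+WDSI: 14", "OK", "+WDSI: 16"]:
        print(line, "-> keep power on:", tracker.feed(line))
```

This only observes the window; it cannot prevent an end user from pulling the power, so it is most useful for logging and for deferring any shutdown the application itself controls.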

Unless you have multiple banks of flash to hold complete images (which the size and cost of the units do not allow for, not to mention the cost of the data) every unit/system is going to have potential issues like this when upgrading.

Regards

Matt

Thanks again for your detailed answer @mlw.

Unless you have multiple banks of flash to hold complete images (which the size and cost of the units do not allow for, not to mention the cost of the data) every unit/system is going to have potential issues like this when upgrading.

So it sounds like there is only one partition/image to write the firmware to? No secondary or factory partitions. Is that correct?

In case of a failure during the upgrade, is there a way to recover the module? Is there documentation on doing this from Linux? I can’t find any documentation on this. Our units are now in the wild and it appears we need to upgrade some of them to the latest versions.

@simon

Correct, there aren’t multiple (or even dual) banks of flash, or partitions across a flash device.

So I would say that the mechanism is pretty robust, we rarely see failures.

On recovering units where this might fail, that’s quite a detailed technical discussion and a lot of work on your side, I would say. The two units you mention, HL7800 and HL7688, are based on Altair and Intel respectively, so they have different download/recovery mechanisms. Normally, if we manage to brick units, we have access to dev kits and hence multiple interfaces; the Intel would need USB, whereas the Altair generally prefers UART. Then you get into the tools, which is more difficult (and we don’t really have external documentation on that).

TBH if I were you I would bench test the unit’s FOTA mechanism until you are happy that it will survive whatever scenario your application throws at it, and then solely use that.
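As a rough idea of what such a bench test could look like, the sketch below cuts power at a random moment during each update and records whether the module comes back on the new firmware. Everything here is an assumption about the test bench: the five callables (start_update, cut_power, restore_power, update_succeeded, sleep) are hypothetical hooks you would implement yourself, for example with a relay on the supply and an AT version query, and the 480 second default simply borrows the 8 minute upper bound quoted earlier as a starting point.

```python
import random


def run_power_cut_trials(start_update, cut_power, restore_power,
                         update_succeeded, sleep,
                         trials=20, max_delay_s=480.0, seed=0):
    """Interrupt a FOTA update at a random moment and record the outcome.

    All five callables are hypothetical hooks supplied by the test bench:
      start_update()      triggers a FOTA update on the unit under test
      cut_power()         removes power (e.g. via a relay)
      restore_power()     reapplies power
      update_succeeded()  waits for the unit, returns True if it runs the new firmware
      sleep(seconds)      blocks for the given time (time.sleep on a real bench)
    Returns a list of (delay_seconds, succeeded) tuples.
    """
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    results = []
    for _ in range(trials):
        delay = rng.uniform(0.0, max_delay_s)
        start_update()
        sleep(delay)           # let the update run for a random time
        cut_power()
        restore_power()
        results.append((delay, update_succeeded()))
    return results


if __name__ == "__main__":
    # Dry run with stand-in hooks; replace these with real bench control.
    outcome = run_power_cut_trials(
        start_update=lambda: None,
        cut_power=lambda: None,
        restore_power=lambda: None,
        update_succeeded=lambda: True,
        sleep=lambda seconds: None,
        trials=3,
        max_delay_s=10.0,
    )
    failures = [delay for delay, ok in outcome if not ok]
    print(len(outcome), "trials,", len(failures), "failures")
```

Logging the delay alongside each result makes it easier to see whether any failures cluster in the flashing window rather than in the download phase.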

Regards

Matt