Diagnosing Intermittent Communication Issues

We use a Sierra AirLink MX400 as a cell modem. It accepts requests from a Telus VPN and relays the commands in those requests over serial to the controller for a DMS sign.

What should I be looking for in the filtered logs that indicates connection issues?
filteredlogs2AfterChange.txt (880 KB)

Hi,

The log shows a lot of restarts and network link-down events.
It also shows very good signal quality.
However, it does not reveal any particular clue as to what exactly is happening.
It would be worth checking whether those restarts are issued by a controller, triggered manually (e.g. for tests), or are actual reboots. It should also be determined whether the VPN connection loss is caused by a loss of the network connection or is only a VPN connection loss. Please also verify that you are running our latest FW (see the download section of our website).
Please contact your distributor to investigate further.

Regards

The firmware version is probably old. I’d rather not re-flash it and lose its settings while it is operational. I think AceManager was version 4.3.0.005. I’ll need permission to schedule some downtime for the modem if I need to upgrade it.

Can someone clarify whether the modem is restarting and re-establishing its connection, or is it just losing its connection to the VPN?

Which line or lines in the log file please?

Upon FW upgrade you will not lose your existing configuration.
The FW will be installed on top of your existing config without resetting it.

The logs do not show whether the issue is on the VPN side or on the radio network side.
This will need more specific investigation, which is why you should turn to your distributor's technical team for the best support.

Thanks for returning and commenting, nmp.

I wasn’t aware settings were maintained after a firmware upgrade. I’ll have to research this further if/when the time comes.

A firmware upgrade is something we can offer the client in our upgrade/maintenance proposal. For now, this is a field-level diagnosis to identify COM issues.

I believe the cell modem is not restarting, but the VPN link is being dropped. From the log:

Oct 31 12:54:29 info ALEOS_WAN_RMON: SWI_NOTIFY_CallStatus 0
Oct 31 12:54:29 info ALEOS_WAN_CMAN: in HandleDisConnect.
Oct 31 12:54:31 info ALEOS_WAN_linkmon: Linkradio Link Status 0

There are repeated instances where the link to the carrier VPN1 is lost; the modem sets CallStatus to 0 and attempts to reconnect.
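For anyone wanting to tally these events rather than eyeball them, here is a rough Python sketch. It assumes every log line follows the same `Mon DD HH:MM:SS level SOURCE: message` layout as the three lines quoted above, and that those three messages are the ones that mark a dropped link; the function names are my own and other ALEOS versions may word the messages differently.

```python
import re
from collections import Counter

# Substrings copied from the log excerpt above. Treating them as
# "link dropped" markers is an assumption, not documented ALEOS behaviour.
DISCONNECT_MARKERS = (
    "SWI_NOTIFY_CallStatus 0",
    "in HandleDisConnect",
    "Link Status 0",
)

# Assumed layout: "Oct 31 12:54:29 info ALEOS_WAN_RMON: message text"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) (?P<source>\w+): (?P<msg>.*)$"
)


def find_disconnects(lines):
    """Return (timestamp, source, message) for each line matching a marker."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(m in match.group("msg") for m in DISCONNECT_MARKERS):
            hits.append(
                (match.group("stamp"), match.group("source"), match.group("msg"))
            )
    return hits


def count_by_hour(hits):
    """Group hits by 'Mon DD HH' so bursts of drops stand out."""
    return Counter(stamp[:-6] for stamp, _, _ in hits)


# The three lines quoted in this post, used as sample input.
SAMPLE = [
    "Oct 31 12:54:29 info ALEOS_WAN_RMON: SWI_NOTIFY_CallStatus 0",
    "Oct 31 12:54:29 info ALEOS_WAN_CMAN: in HandleDisConnect.",
    "Oct 31 12:54:31 info ALEOS_WAN_linkmon: Linkradio Link Status 0",
]

for stamp, source, msg in find_disconnects(SAMPLE):
    print(stamp, source, msg)
```

To run it against the attached log, pass the open file object to `find_disconnects` in place of `SAMPLE`. If the drops cluster at regular intervals, that would point at a timeout on the carrier side rather than at radio conditions.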

Anyone disagree or agree? Or have some insight that it may be something else?

What if I used the Keepalive function under the WAN/Cellular tab in AceManager?

Might this keep the connection to the VPN established? Would I use the VPN server IP?

Just an update:

I’ve contacted TELUS corporate Tech Support. They were able to see the cell modem connected to the VPN (for now). I explained that the VPN link was being dropped and that our cell modem has been continuously re-establishing its connection. They mentioned that the management software for their VPN may be prioritizing its list. A lack of traffic on our VPN tunnel may have caused its priority to drop to the bottom, closing the tunnel as a result. (A possibility?)

So they refreshed the switch on their end so that it would recycle the profile for the cell modem and its programming, which may move it to the top of the list. They couldn’t confirm that the cell modem had reconnected (it still hasn’t reconnected). I’m hoping it will within the next two hours, or I may have to schedule someone to be on site to reboot the cell modem, as he suggested.

I asked if it would make sense to enable the keepalive. He said, and I quote: “That might be a good idea, I never thought of that”.

I asked what address I should use to ping and he suggested a public one: 8.8.8.8, which he said was associated with Google?

Anyone care to comment?

The problem with keepalives (no-data ones) and GPRS is that some operators inspect the traffic, see that it contains no actual data, and queue it up instead of forwarding it properly, which can cause a disconnect even when using keepalives.

Thanks for the info, tobias. I’ll keep that in mind. Currently the idea of using keepalive is on the back burner.

I have an updated log for Dec 12-13, showing what’s going on after the TELUS ‘refresh’.

Any ideas?

Quick edit: We’ve noticed the timestamps are off by about 7 hours. The log was taken on the 12th at about 9pm, yet it has timestamps from the early morning of the next day.
Dec12Log.txt (75.5 KB)
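In case it helps anyone lining this log up with local events, here is a small Python sketch that shifts a log timestamp back by the roughly 7 hours mentioned above. The exact offset is an assumption (we only know it is "about 7 hours"), the log lines carry no year so the caller has to supply one, and `to_local` is just a name I picked.

```python
from datetime import datetime, timedelta

# Assumption: the log clock runs about 7 hours ahead of local time,
# per the observation above. Adjust if the real offset differs.
LOG_AHEAD_BY = timedelta(hours=7)


def to_local(stamp, year):
    """Convert a 'Mon DD HH:MM:SS' log timestamp to approximate local time.

    The log format has no year, so it must be passed in explicitly.
    """
    parsed = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return parsed - LOG_AHEAD_BY
```

With this, a stamp from the early morning of the 13th maps back to the evening of the 12th, which matches when the log was actually pulled.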

Update: We’ve acquired new information from our client and TELUS.

The HSPA IP range allowed for use is 10.148.44.160 – 10.148.44.191
Looking at the IPs assigned to the cell modems at all the sites, NONE of them are within this range. We’ve sent logs to TELUS from the cell modems and client software.
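A quick way to check a list of modem IPs against that range is Python's standard `ipaddress` module. This is only a sketch: the range endpoints come from the lines above, while the function and variable names are mine.

```python
import ipaddress

# Allowed HSPA range as quoted above.
RANGE_START = ipaddress.ip_address("10.148.44.160")
RANGE_END = ipaddress.ip_address("10.148.44.191")


def in_allowed_range(addr):
    """True if addr falls inside the allowed range (inclusive)."""
    return RANGE_START <= ipaddress.ip_address(addr) <= RANGE_END


# Express the same range as CIDR blocks; this particular range
# collapses to a single /27.
ALLOWED_NETWORKS = list(
    ipaddress.summarize_address_range(RANGE_START, RANGE_END)
)

print([str(net) for net in ALLOWED_NETWORKS])
```

Feeding each site's assigned address through `in_allowed_range` gives a yes/no per modem, and the CIDR form may be handy when discussing the range with TELUS.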

Could this be a simple fix by ensuring the cell devices are assigned IPs within this range? Stay tuned!