SL6087 resets randomly

Hi

I am working with the Sierra Wireless AirPrime SL6087 module, using firmware 7.46.0, OS-6.36.0. I have written an AT application that creates 4 tasks:

GPRS_Task (Priority - 4): Creates a message queue to transfer data between GPRS_Task and the other tasks. It sets up the GPRS connection and opens one TCP client socket to the remote server; this socket remains open for the entire session and is used to receive data from the remote server. It also creates a second TCP client socket whenever Uart_Task has data to transfer to the server; when all the data has been transferred, this connection is closed. Uart_Task invokes GPRS_Task through the message queue when it has data to send. GPRS_Task also creates a periodic 1-minute timer that checks whether any data received from the remote server is waiting to be processed and queries the signal strength using ‘at+csq’.

Uart_Task (Priority - 3): Opens comm port 2 in data mode to communicate with the external world. When it has data from the external world, it sends a message to GPRS_Task to forward it to the remote server; otherwise it is blocked on a semaphore and released only when the UART receive event handler has received data to process. It also releases the semaphore for SMS_Task if the data needs to be sent as an SMS.

NMS_Task (Priority - 2): Blocked on a semaphore, which is released when GPRS_Task has created the second TCP client socket at Uart_Task’s request and the socket handler has received the WIP_WRITE event. Once released, it reads the buffer address from the FIFO, sends the data to the remote server using wip_write, and asks GPRS_Task through the message queue to close the channel. It then blocks on the semaphore again.

SMS_Task (Priority - 1): Blocked on a semaphore, which is released when Uart_Task has data to be sent as an SMS.

The address of the buffer that holds the data received by Uart_Task is passed between tasks using adl_queue (QUEUE_ORDER). Three FIFOs have been created. The first, between Uart_Receive_Event_handler and Uart_Task, holds the addresses of buffers received by Uart_Receive_Event_handler and waiting to be processed by Uart_Task. The second, between Uart_Task and NMS_Task, holds the addresses of buffers containing processed data to be sent to the remote server. The third, between Uart_Task and SMS_Task, holds the addresses of buffers containing processed data to be sent as SMS.
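
For readers following along, the "FIFO of buffer addresses" idea can be sketched in plain C as below. This is only an illustration of the data structure, not the adl_queue API: the names ptr_fifo_t, fifo_init, fifo_push, fifo_pop and FIFO_CAPACITY are hypothetical, and the sketch has no locking, so a real application that pushes from an event handler and pops from a task would still need to protect it.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_CAPACITY 8  /* hypothetical depth; size it for the worst-case backlog */

/* Fixed-size FIFO of buffer addresses (illustrative stand-in, not adl_queue). */
typedef struct {
    void  *slots[FIFO_CAPACITY];
    size_t head;   /* index of the oldest entry */
    size_t count;  /* number of entries currently stored */
} ptr_fifo_t;

void fifo_init(ptr_fifo_t *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Returns 1 on success, 0 if the FIFO is full (the caller still owns the buffer). */
int fifo_push(ptr_fifo_t *q, void *buf)
{
    if (q->count == FIFO_CAPACITY) {
        return 0;
    }
    q->slots[(q->head + q->count) % FIFO_CAPACITY] = buf;
    q->count++;
    return 1;
}

/* Returns the oldest buffer address, or NULL if the FIFO is empty. */
void *fifo_pop(ptr_fifo_t *q)
{
    void *buf;

    if (q->count == 0) {
        return NULL;
    }
    buf = q->slots[q->head];
    q->head = (q->head + 1) % FIFO_CAPACITY;
    q->count--;
    return buf;
}
```

Because only addresses travel through the FIFO, the producer must not reuse or free a buffer until the consumer has popped and finished with it; a buffer reused too early is one way this kind of design can corrupt memory.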

The AT application works as expected, but the system resets at random intervals.
To find the cause of the reset I used adl_InitGetType; it returns ‘0’, i.e. ‘ADL_INIT_POWER_ON’.
I also subscribed an error handler with ‘adl_errSubscribe’, but it is never invoked. I cannot find any reason for the reset.

Is there any way to detect memory corruption? And if the reset is caused by memory corruption, what will ‘adl_InitGetType’ return?

What other possible reasons for the reboot are there?
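
On the memory-corruption question, one generic technique (not something the Open AT firmware is known to provide) is to put guard words on either side of each shared buffer and check them periodically, for example from the 1-minute timer. The sketch below assumes a hypothetical guarded_buf_t type, an arbitrary marker value and an arbitrary payload size; it only detects overruns of the payload, not every kind of corruption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GUARD_WORD   0xDEADBEEFu  /* arbitrary marker value */
#define PAYLOAD_SIZE 64           /* hypothetical buffer size */

/* Buffer with a guard word on each side of the payload (illustrative type). */
typedef struct {
    uint32_t front_guard;
    uint8_t  payload[PAYLOAD_SIZE];
    uint32_t rear_guard;
} guarded_buf_t;

void guarded_init(guarded_buf_t *b)
{
    assert(b != NULL);
    b->front_guard = GUARD_WORD;
    memset(b->payload, 0, sizeof b->payload);
    b->rear_guard = GUARD_WORD;
}

/* Returns 1 if both guards are intact, 0 if something overwrote one of them. */
int guarded_check(const guarded_buf_t *b)
{
    return b->front_guard == GUARD_WORD && b->rear_guard == GUARD_WORD;
}
```

If guarded_check() ever returns 0, the application can print a trace naming the damaged buffer before the corruption has a chance to turn into a reset.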

Thanks in advance.

Jitendra

Hi,

Regarding the same issue: the WIP channel handle for the TCP client connection is stored in a ‘client_socket’ variable (of type wip_channel_t). In the socket event handler, void cbSocketevh(wip_event_t *ev, void *ctx), the channel is referred to through ‘client_socket’ instead of ev->channel. Could this corrupt the socket handle and reset the application?

Also, does the event handler need to be re-entrant? The socket event handler accesses a few global variables and a semaphore. Is it safe to access global variables and release the semaphore (only if acquired) in an event handler?

Please clear up my doubts.

Thanks in advance.

Jitendra

Storing the wip channel is safe. The event handler need not be re-entrant, because event handlers are actually driven by message passing to a hidden outer loop around each thread.

There are a couple of things I found from experience that cause problems:

  • the watchdog timer triggers after 5 seconds if the module is not idle. Having threads blocked on semaphores may not count as “idle”.

  • I tried the “block thread on semaphore” design pattern and couldn’t get it to work. This may well be the source of your crashes. Semaphores only seem to work if used for critical section exclusion, not blocking threads for a long time.

  • have you looked for backtraces in Developer Studio? (You may need to get the app to clear backtraces on startup)

  • The WIP library is only safe from the thread in which it’s initialised.

It might be safer to simply restructure the whole application into one thread that does everything in event handlers. I think this is the way you’re “supposed” to write applications in OpenAT. In particular, all API functions are nonblocking. So if you write short functions they will all execute in a short time.
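
To make the single-thread, event-driven structure concrete, here is a minimal plain-C model of a dispatcher that runs one short handler per event. It is a sketch only: in Open AT the firmware owns the real loop, and every name here (event_id_t, ev_subscribe, ev_post, ev_dispatch_one, on_uart_rx) is hypothetical rather than part of ADL or WIP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event identifiers for the three kinds of work in this thread. */
typedef enum { EV_UART_RX, EV_SOCKET_WRITABLE, EV_TIMER_TICK, EV_COUNT } event_id_t;

typedef struct {
    event_id_t id;
    void      *data;  /* e.g. address of a received buffer */
} event_t;

typedef void (*event_handler_t)(const event_t *ev);

#define EVQ_CAPACITY 16

static event_t         evq[EVQ_CAPACITY];
static size_t          evq_head, evq_count;
static event_handler_t handlers[EV_COUNT];

void ev_subscribe(event_id_t id, event_handler_t fn)
{
    assert(id < EV_COUNT);
    handlers[id] = fn;
}

/* Queue an event; returns 1 on success, 0 if the id is invalid or the queue is full. */
int ev_post(event_id_t id, void *data)
{
    event_t *slot;

    if (id >= EV_COUNT || evq_count == EVQ_CAPACITY) {
        return 0;
    }
    slot = &evq[(evq_head + evq_count) % EVQ_CAPACITY];
    slot->id = id;
    slot->data = data;
    evq_count++;
    return 1;
}

/* Run the handler for the oldest pending event; returns 0 if nothing was pending. */
int ev_dispatch_one(void)
{
    event_t ev;

    if (evq_count == 0) {
        return 0;
    }
    ev = evq[evq_head];  /* copy first so the handler may post new events */
    evq_head = (evq_head + 1) % EVQ_CAPACITY;
    evq_count--;
    if (handlers[ev.id] != NULL) {
        handlers[ev.id](&ev);
    }
    return 1;
}

/* Example handler: does a small amount of work and returns immediately. */
int uart_rx_handled;

void on_uart_rx(const event_t *ev)
{
    (void)ev;
    uart_rx_handled++;
}
```

Since each handler returns quickly and nothing ever blocks, there is no task sitting on a semaphore, and all the work happens in one thread.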

Thanks, the explanation is very useful. I have some quick questions about backtraces and the watchdog.

If the reboot is due to the watchdog (I assume it is a hardware watchdog), what will adl_InitGetType() return: ADL_INIT_POWER_ON or ADL_INIT_REBOOT_FROM_EXCEPTION?

As you mentioned backtraces: I registered an error event handler via adl_errSubscribe(), and it returns TRUE. But the error event handler is never invoked when the reset occurs (I use TRACE to print the ErrorID). As soon as the module resets it goes into production mode, and traces are only printed after it has been switched back to development mode. Do I need to use the backtrace analysis functions to retrieve backtraces after a reset?

How do I get backtraces in Developer Studio (2.0)? I have read the document How to Configure Developer Studio and GDB Stub. It says to use ‘Open AT GDB Target’ in the Debug configuration to configure GDB, but there is no such option. How should I configure GDB in Developer Studio?

Thanks
Jitendra

Can’t answer the other questions, but for the watchdog it definitely returns ADL_INIT_REBOOT_FROM_EXCEPTION
and the backtrace will look something like this:

Watch dog reset. Tsk 31
	Unknown function (0)
	Unknown function (0)

But the value returned by adl_InitGetType() is ADL_INIT_POWER_ON, not ADL_INIT_REBOOT_FROM_EXCEPTION. Also, here is the backtrace collected from Developer Studio:

Watch dog reset. Tsk 36
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 36
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 36
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 36
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 36
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 36
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 36
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 36
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 36
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 12
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 36
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 29
	_rand_init+0
	_rand_init+0
Watch dog reset. Tsk 36
	_rand_init+0
	_rand_init+0

Could someone please explain the internal implementation of the wip_TCPClientCreate() and wip_close() functions?
With further debugging I found that the SL6087 module sometimes resets after these two APIs have been invoked successfully.

In the case of wip_TCPClientCreate(), a WIP_OPEN event is expected, but sometimes the SL6087 resets instead of delivering the WIP_OPEN event.

Similarly, when wip_close() is called, the socket finalizer function is invoked successfully, but once control returns from that function the SL6087 sometimes resets.

In both cases adl_InitGetType() returns ADL_INIT_POWER_ON instead of ADL_INIT_REBOOT_FROM_EXCEPTION.

Have you tried increasing the stack size?

I have not yet increased the stack size.
Please let me know how I should calculate the required stack size.
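
One common way to estimate stack usage empirically is "stack painting": fill the stack area with a known byte before the task runs, exercise the application, then count how many bytes were overwritten. The plain-C sketch below shows the idea on an ordinary array. It assumes a descending stack (usage grows from the highest address downward), and whether the Open AT firmware exposes a task's stack memory to the application this way is an assumption that has not been verified; stack_paint, stack_high_water and STACK_FILL are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u  /* arbitrary fill pattern */

/* Fill the whole stack area with the pattern before the task starts. */
void stack_paint(uint8_t *stack, size_t size)
{
    size_t i;

    assert(stack != NULL);
    for (i = 0; i < size; i++) {
        stack[i] = STACK_FILL;
    }
}

/*
 * Returns the number of bytes between the lowest overwritten address and the
 * top of the area, assuming a descending stack. A result equal to `size`
 * suggests the stack was completely used and may have overflowed.
 */
size_t stack_high_water(const uint8_t *stack, size_t size)
{
    size_t untouched = 0;

    while (untouched < size && stack[untouched] == STACK_FILL) {
        untouched++;
    }
    return size - untouched;
}
```

The measured high-water mark plus a safety margin gives a starting point for the stack size. The measurement can undercount if the deepest bytes written happen to equal the fill pattern.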

After moving the tasks’ work into message handlers, using FIFOs to transfer data between the message handlers and message queues to synchronize and control their execution,
I am no longer seeing the SL6087 reset issue.

What really fixed the issue, I don’t know. :)