Problem of reconnecting to server

Hello,

I have an application that uses the WIP package to open a connection to a server and send data periodically. When the server receives the data it replies with “OK”. I put a timeout on that reply: if no response arrives within a set period, I disconnect and reconnect to the server, so that I do not take too long to notice when the server is unavailable.

When it needs to reconnect, the software retries every 30 seconds using the connect2Server function below. If I have no positive connection indication after 15 seconds, I disconnect using the closeConnection function below, and then 30 seconds later I try to reconnect again.

//open connection
void connect2Server(void)
{
    adl_atSendResponse ( ADL_AT_UNS, "Try to connect to server\n");
    socket = wip_TCPClientCreateOpts ( TCP_PEER_DATUS, TCP_PEER_PORT, cbevh, NULL,
                                       WIP_COPT_SND_BUFSIZE, 3000, WIP_COPT_MAXSEG, 1500, WIP_COPT_END );
    if ( !socket ) //we will not do anything here, if there is an error we will retry later
    {
        adl_atSendResponse ( ADL_AT_UNS, "Error on create TCP socket\n");
        return;
    }
}

//close connection
void closeConnection(void)
{
    adl_atSendResponse ( ADL_AT_UNS, "close connection\n");
    wip_close(socket);
    connected = 0; //indicates that we are not connected.
}

//connection handler
static void cbevh ( wip_event_t *ev, void *ctx )
{
    //adl_atSendResponse ( ADL_AT_UNS, "\r\nEvent\r\n" );
    switch ( ev->kind )
    {
        case WIP_CEV_DONE:
        {
            adl_atSendResponse ( ADL_AT_UNS, "Got Done from Socket\n");
            break;
        }
        //this means that we have an open from the socket
        case WIP_CEV_OPEN:
        {
            TRACE ( ( NORMAL_TRACE_LEVEL, "[SAMPLE] Connection established successfully" ) );
            adl_atSendResponse ( ADL_AT_UNS, "Got Open from Socket\n");
            setCanSend();
            connected = 1;             //this I use to know that I am connected
            break;
        }
        case WIP_CEV_READ:
        {
            //application specific read data logic
            break;
        }
        case WIP_CEV_WRITE:
        {
            //application specific write logic
            break;
        }
        //this indicates that we have an error from the socket, in which case we close
        //the socket
        case WIP_CEV_ERROR:
        {
            reportError(ERR_TYPE_SERVER, ERR_SOCKET);
            adl_atSendResponse ( ADL_AT_UNS, "Got Error from Socket\n");
            TRACE ( ( ERROR_TRACE_LEVEL, "[SAMPLE] Error %i on socket. Closing.", ev->content.error.errnum ) );
            closeConnection();
            //wip_close ( ev->channel );
            break;
        }
        //this indicates that the server closed the socket
        case WIP_CEV_PEER_CLOSE:
        {
            reportError(ERR_TYPE_SERVER, ERR_SESSION_CLOSED_BY_SERVER);
            adl_atSendResponse ( ADL_AT_UNS, "Got Peer Close from Socket\n");
            TRACE ( ( NORMAL_TRACE_LEVEL, "[SAMPLE] Connection closed by peer" ) );
            closeConnection();
            //wip_close ( ev->channel );
            break;
        }
        //no other events are expected here
        default:
        {
            break;
        }
    }
}

The system above works quite well most of the time. However, on some occasions our server crashes (not a topic for this forum, but it is how I detected this situation) and does not do a clean disconnect. Even then the system normally works OK, but occasionally the modem gets into a state where it runs the connect2Server logic (without any indication of an error) and then reaches closeConnection 15 seconds later, with no error or connection indication arriving at either the modem or the server. This gives:
“Try to connect to server”
(15 seconds)
“close connection”
(15 seconds)
(repeat above indefinitely, at least for as long as I have had patience to let it continue)

If I stop the application software and restart it, all is well. I believe that I could do a reset using an AT+CFUN=1 command to clean up the stack, but I am trying to avoid this because I keep a FIFO of data in the application and do not want to lose that data before it is sent.

Any ideas on how to get out of this situation when it occurs, or any logic errors that I have in the above architecture?
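One thing in the code above that might be worth ruling out (a guess, not a confirmed cause): closeConnection can run twice for the same channel, once from the WIP_CEV_ERROR or WIP_CEV_PEER_CLOSE handler and again from the 15-second timeout, and connect2Server overwrites the handle without checking whether an old channel is still open. I do not know how wip_close behaves in those cases. A minimal sketch of guarding the handle follows; fake_open and fake_close are hypothetical stand-ins so it compiles on a PC, and sock, close_connection and connect_to_server are illustrative names, not the original functions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real channel calls, so the sketch builds
 * off-target. On the modem these would be wip_TCPClientCreateOpts() and
 * wip_close(). */
typedef struct { int in_use; } fake_channel_t;

static fake_channel_t fake_pool[1];
static int fake_open_calls  = 0;   /* how often a channel was created */
static int fake_close_calls = 0;   /* how often a channel was closed  */

static fake_channel_t *fake_open(void)
{
    fake_open_calls++;
    fake_pool[0].in_use = 1;
    return &fake_pool[0];
}

static void fake_close(fake_channel_t *ch)
{
    fake_close_calls++;
    ch->in_use = 0;
}

static fake_channel_t *sock = NULL;   /* NULL means "no channel owned" */
static int connected = 0;

/* Close at most once per open, and forget the handle afterwards. */
static void close_connection(void)
{
    if (sock != NULL) {
        fake_close(sock);
        sock = NULL;
    }
    connected = 0;
}

/* Never overwrite a live handle: release the old one first. */
static void connect_to_server(void)
{
    close_connection();
    sock = fake_open();
}
```

With this shape, a second close_connection call is a no-op, and two connect_to_server calls in a row close the first channel before opening the second, so opens and closes always stay paired.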

Any help is appreciated,

Hiya,

I had some issues similar to this in a mobile application where the modem would fall off the network (usually by going through an area of bad/no signal), but the TCP connection would take up to 12 minutes to attempt to reconnect.

There’s a bunch of TCP parameters that you can configure to help you get through this issue.

Have a look at the settings for wip_netSetOpts - particularly things like WIP_NET_OPT_TCP_REXMT_MAX and WIP_NET_OPT_TCP_REXMT_MAXCNT which control retry timeouts.
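To give a feel for why these two settings matter, here is a rough back-of-the-envelope sketch. It assumes the usual TCP behaviour where the retransmission timeout doubles after each attempt up to a cap; the function name, the 1 second starting timeout and the example values are illustrative assumptions, not WIP defaults, so check the WIP documentation for the real defaults and units.

```c
/* Rough worst-case time (seconds) before a TCP stack gives up on
 * unacknowledged data, assuming the retransmission timeout doubles
 * after every attempt and is capped at rexmt_max_s.
 * All values passed in are illustrative assumptions, not WIP defaults. */
static unsigned long give_up_time_s(unsigned long initial_rto_s,
                                    unsigned long rexmt_max_s,
                                    unsigned int  rexmt_maxcnt)
{
    unsigned long rto = initial_rto_s;
    unsigned long total = 0;
    unsigned int i;

    /* One wait for the original send plus one per retransmission. */
    for (i = 0; i <= rexmt_maxcnt; i++) {
        if (rto > rexmt_max_s) {
            rto = rexmt_max_s;   /* cap, in the spirit of REXMT_MAX */
        }
        total += rto;
        rto *= 2;                /* exponential backoff */
    }
    return total;
}
```

For example, give_up_time_s(1, 64, 12) comes to 511 seconds, roughly eight and a half minutes, which is the same order of magnitude as the delay described above. Lowering either the cap or the retry count shrinks that window quickly.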

ciao, Dave

Thank you. OK, so I am changing the values to let the WIP library do things its way, and increasing my own timeouts to keep my logic out of the way. Hopefully, by letting it time out internally, all will be well. I’ll see what happens.

OK, so with the suggested changes I seem to have a much more reliable system, but not 100%.

I have changed the logic so that I allow 35 seconds to establish a connection, and I have set the TCP options to 1 retry with a 20-second delay. I only call wip_close() when I receive either a WIP_CEV_ERROR or a WIP_CEV_PEER_CLOSE. I can survive a lot of disconnects (produced by blocking the signal so that the modem cannot talk to the provider). But after a large number of disconnects (I have not counted, but definitely more than 10 and probably more than 20), I still end up in the state where I can no longer reconnect.

After the disconnect, my logic waits 30 seconds and then tries to reconnect, but when it is blocked, it always ends up with a WIP_CEV_ERROR after the 35 seconds, and the LED stays red.

My next idea is to reinitialize the bearer and see whether that resolves the situation, as I have noted that an “AT+CFUN=1” always resolves it.
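A minimal sketch of how that escalation could be structured, so that the full reset stays a last resort and the data FIFO survives the earlier steps. The thresholds, the enum and the function names are all hypothetical, and whether a bearer restart actually clears the stuck state is still untested.

```c
/* Recovery ladder: try a plain socket reconnect first, then a bearer
 * restart, and only then a full modem reset. Thresholds and names are
 * hypothetical. The application's data FIFO is never touched here, so
 * queued data survives the first two steps. */
typedef enum {
    RECOVER_RECONNECT_SOCKET,  /* close and reopen the TCP channel */
    RECOVER_RESTART_BEARER,    /* stop and restart the bearer      */
    RECOVER_RESET_MODEM        /* last resort, e.g. AT+CFUN=1      */
} recover_action_t;

#define BEARER_RESTART_AFTER 3   /* assumed: failures before a bearer restart */
#define MODEM_RESET_AFTER    6   /* assumed: failures before a full reset     */

static unsigned int failed_attempts = 0;

/* Call when a connect attempt ends in WIP_CEV_ERROR or a timeout.
 * Returns the recovery step to take before the next attempt. */
static recover_action_t on_connect_failed(void)
{
    failed_attempts++;
    if (failed_attempts >= MODEM_RESET_AFTER) {
        return RECOVER_RESET_MODEM;
    }
    if (failed_attempts >= BEARER_RESTART_AFTER) {
        return RECOVER_RESTART_BEARER;
    }
    return RECOVER_RECONNECT_SOCKET;
}

/* Call on WIP_CEV_OPEN: a working connection clears the history. */
static void on_connect_ok(void)
{
    failed_attempts = 0;
}
```

The idea is that the first two failures only retry the socket, the third to fifth also restart the bearer, and the sixth falls back to the reset that is already known to work.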

Any guidance is appreciated.

Oh, and by the way, the blockage is no longer always associated with a server crash. I had the initial source of the problem reversed: it seems to be the modem that causes the problem, not the server.