tcp:connect retry in Lua failing


#1

Hi,

I’m polling a device over TCP/IP, but when I disconnect the network, the Lua code does not recover the connection. My connect code looks like this:

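local socket = require("socket")
local tcp = assert(socket.tcp())   -- one TCP object, created once and reused by every n.connect() call

local n = {}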

-- defaults
n.host = "192.168.10.20"
n.password = "xoxoxox"
n.port = 503
n.timeout = 0.5
n.isConnected = false

function n.connect(host, port)
	if (nil == host) then host=n.host end
	if (nil == port) then port=n.port end

	if (true == n.isConnected) then
		tcp:close()
		n.isConnected = false
		log.trace("TCP",'INFO',"Closed connection to "..host.."/"..port)
	end
	
	local err = tcp:connect(host, port)
	tcp:settimeout(n.timeout)
	if (nil == err) then
		log.trace("TCP",'ERROR',"NOT Connected to "..host.."/"..port)
		n.isConnected = false
	else
		n.isConnected = true
		log.trace("TCP",'INFO',"Connected to "..host.."/"..port)
	end
	return err
end

I thought that was the correct logic: if the connection is already open when connect() is called, it is closed first and then re-opened. My calling code is something like this:

local err = n.connect()
local connectError = 0
while shouldContinue do
	local rxMsg = n.poll('<getdata>')
	if (nil == rxMsg) then		-- retry the connection
		connectError = connectError + 1
		log(LOG_NAME, 'ERROR', 'Connection failed '..connectError..' times, retrying...')
		err = n.connect()
		-- ...etc. re-login if connected
	end
	-- ...
end

The Lua code keeps trying to connect, but tcp:connect() just returns nil. If I poll the device with another diagnostic tool using the same IP and credentials, I can connect OK, so the third-party device is fine and waiting for a connection.

Comments very welcome.

Thanks in advance. Steve


#2

Hi Steve,

I’m wondering whether this is an issue at the IP level rather than at the AAF level.
I ran a test on my side: an AAF TCP client app sending a dummy string every 2 s to a TCP server. I removed the cable, plugged it back in, and the AAF app recovered the connection; the server resumed receiving my dummy strings, including the ones sent while the cable was unplugged. My app does not even re-open the TCP connection; it stays on the initial one the whole time.
My AirLink (RV50) was the DHCP server, and my Linux test laptop (where the TCP server ran) was the client.

Which device is the DHCP client and which is the DHCP server in your case? Or are you perhaps using a static IP configuration?
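For reference, a test client like the one described above could be sketched as follows. It assumes a LuaSocket-compatible library (connect, sleep, and the socket's send/close); the function name runDummyClient, the message format and the count parameter are hypothetical, and the library is passed in as a parameter so the loop can be exercised without a network:

```lua
-- Hypothetical test client: sends a dummy string every `interval` seconds
-- over ONE connection that is opened once and never re-opened.
-- `socketLib` is assumed to behave like LuaSocket's `socket` module
-- (connect, sleep); it is passed in so the loop can be tested offline.
local function runDummyClient(socketLib, host, port, count, interval)
  local sock, err = socketLib.connect(host, port)
  if not sock then
    return nil, "connect failed: " .. tostring(err)
  end
  local sentCount = 0
  for i = 1, count do
    -- During a short cable outage send() normally still succeeds: the data
    -- sits in the kernel send buffer and TCP retransmits it later.
    local ok, sendErr = sock:send("dummy " .. i .. "\n")
    if ok then
      sentCount = sentCount + 1
    else
      print("send failed: " .. tostring(sendErr))
    end
    socketLib.sleep(interval)
  end
  sock:close()
  return sentCount
end
```

On the device you would pass require("socket") as the first argument. The point it illustrates is that the connection is opened once and never re-opened, matching the behavior reported above.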


#3

An unrelated comment, but this line

if (nil == host) then host=n.host end

is usually written using a common Lua idiom:

host = host or n.host
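A small pure-Lua illustration of the idiom follows; the function names and the defaults table are hypothetical. One caveat worth knowing: `x or default` replaces both nil and false, so it is not suitable for boolean parameters whose legitimate value may be false:

```lua
-- Hypothetical defaults table, mirroring n.host / n.port in the question.
local defaults = { host = "192.168.10.20", port = 503, verbose = true }

-- Fill in missing arguments with the `a or b` idiom.
local function connectArgs(host, port)
  host = host or defaults.host
  port = port or defaults.port
  return host, port
end

-- Caveat: `or` also replaces false, so an explicit false is lost here.
local function verboseWrong(verbose)
  return verbose or defaults.verbose
end

-- For booleans, test against nil explicitly instead.
local function verboseRight(verbose)
  if verbose == nil then
    return defaults.verbose
  end
  return verbose
end
```

For string and number arguments such as host and port the idiom is safe, since neither can legitimately be false.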

Regarding your issue: I am not sure the same tcp object can be reused once it has been closed.
Note that in any case you can check the reported error, which would help you investigate the root cause. A typical convention for Lua functions is to return nil plus an error string on failure.

So in your code you could have:

local s, err = tcp:connect(host, port)
if not s then print(err) end
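The nil-plus-message convention can be sketched in plain Lua. parsePort below is a hypothetical helper, not part of LuaSocket or AAF; the caller tests the first return value and only reads the error string when that value is nil:

```lua
-- Hypothetical helper following the usual Lua convention:
-- return the value on success, or nil plus an error string on failure.
local function parsePort(text)
  local port = tonumber(text)
  if not port or port < 1 or port > 65535 or port % 1 ~= 0 then
    return nil, "invalid port: " .. tostring(text)
  end
  return port
end

-- Caller: test the FIRST return value; err is only meaningful when it is nil.
local function describe(text)
  local port, err = parsePort(text)
  if not port then
    return "error: " .. err
  end
  return "port " .. port
end
```

tcp:connect(), tcp:send() and socket.connect() in LuaSocket follow the same shape, so the same `if not value then ... end` check applies to them.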

More generally, to re-create a socket object for each new connection, you can use the API shortcut socket.connect(host, port) (not a tcp: method, but a function of the socket library), which directly returns the connected socket. So you would do:

tcp, err = socket.connect(host, port)

and tcp would then contain the connected socket (or nil, with the error message in err, if the connection failed).
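To see why this gives a fresh object on every attempt, here is a sketch of a helper that behaves roughly like socket.connect: create a new TCP object, connect it, and return it or nil plus the error. It assumes a LuaSocket-compatible socketLib (tcp(), and the object's settimeout/connect/close); the names openSocket and reconnect, and the choice to close the half-created object on failure, are my own and not taken from the AAF docs:

```lua
-- Sketch: open a brand-new TCP socket for each connection attempt.
-- `socketLib` is assumed to be LuaSocket-compatible (e.g. require("socket")).
local function openSocket(socketLib, host, port, timeout)
  local sock, err = socketLib.tcp()
  if not sock then
    return nil, err
  end
  if timeout then
    sock:settimeout(timeout)      -- also bounds how long connect() may block
  end
  local ok, connErr = sock:connect(host, port)
  if not ok then
    sock:close()                  -- never reuse an object whose connect failed
    return nil, connErr
  end
  return sock
end

-- Reconnect: discard the old object entirely and open a new one.
local function reconnect(socketLib, oldSock, host, port, timeout)
  if oldSock then
    oldSock:close()
  end
  return openSocket(socketLib, host, port, timeout)
end
```

The design choice is simply that a socket object is never reused after close() or a failed connect(); each retry starts from a new one.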


#4

Hi,

Noted your replies, thanks.

I’m using static IPs.

I will try to use “var1 = var2 or var3” type assignments more. I was just telling colleagues about this neat way of assigning variables this morning too! I put my lapse down to jumping between programming languages.

I’ll see if I can get more error details and report back.

Steve


#5

I suspect the issue is elsewhere. This code has run for almost a day in the ZeroBrane Lua IDE under Windows, yet it fails on the LS300 after a period of time:

local socket = require("socket")
local tcp = assert(socket.tcp())

function pollDev(msg)
  print("TX:"..msg)
  
  -- send() returns the index of the last byte sent, or nil plus an error message
  local sent, err = tcp:send(msg)
  
  if not sent then
    print("Dev connect issue: "..tostring(err))
    return "<error>"
  end

  local rxBuff = ""
  local bGetData = true
  while bGetData do
      local s, status, partial = tcp:receive()
      if (s ~= nil and s ~= "") then
        rxBuff = rxBuff .. s
      end
      if status == "closed" or status == "timeout" then 
        bGetData = false
        if (partial ~= nil and partial ~= "") then
          rxBuff = rxBuff .. partial
        end
      end
  end
  
  print("RX:"..rxBuff)

  return rxBuff
end

function connectDev()
  local host, port = "192.168.10.20", 503
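  -- connect() returns 1 on success, or nil plus an error message on failure;
  -- concatenating the nil result directly would raise a Lua error
  local ok, err = tcp:connect(host, port)
  tcp:settimeout(0.5)
  print("Connected result: "..tostring(ok or err))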
end

connectDev()

while true do
  pollDev('<get><items/></get>')
end

#6

Maybe, but note that ZeroBrane comes with the native set of libraries.
In AAF we tried to provide compatibility with the LuaSocket library, but there may be slight differences in behavior in edge cases. This is why I proposed re-creating the socket once it goes into a closed/error state.
Also, LuaSocket supports both Windows and Linux (AAF is based on the Linux version), but the underlying socket APIs are quite different on the two. So even the official LuaSocket library may behave slightly differently depending on which OS it runs on.
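The “re-create once closed/error” advice might be applied to a poll loop like this sketch. It assumes LuaSocket-compatible objects (send, receive, close) and treats any send/receive error other than a receive timeout as fatal for that object; the names pollOnce, state and openFn are hypothetical, and openFn would typically wrap socket.connect(host, port):

```lua
-- Sketch: poll over a socket, and throw the object away on any hard error so
-- the next call opens a fresh one. `openFn()` must return a connected socket
-- or nil plus an error.
local function pollOnce(state, openFn, msg)
  if not state.sock then
    local sock, err = openFn()
    if not sock then
      return nil, "connect: " .. tostring(err)
    end
    state.sock = sock
  end

  local sent, sendErr = state.sock:send(msg)
  if not sent then
    state.sock:close()
    state.sock = nil              -- closed/error state: re-create next time
    return nil, "send: " .. tostring(sendErr)
  end

  local parts = {}
  while true do
    local line, status, partial = state.sock:receive()
    if line then
      parts[#parts + 1] = line
    else
      if partial and partial ~= "" then
        parts[#parts + 1] = partial
      end
      if status ~= "timeout" then   -- e.g. "closed": the object is unusable
        state.sock:close()
        state.sock = nil
        if #parts == 0 then
          return nil, "receive: " .. tostring(status)
        end
      end
      break                         -- a timeout is treated as end of reply
    end
  end
  return table.concat(parts)
end
```

As in pollDev, a receive timeout is taken to mean the device has finished replying; only a hard error drops the socket.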


#7

Hi,

So you propose a small change like this to connect():

local socket = require("socket")

local n = {}
local tcp, err

-- defaults
n.host = "192.168.10.20"
n.password = "0x0x0x0x"
n.port = 503
n.timeout = 0.5
n.isConnected = false

function n.connect(host, port)
	host = host or n.host
	port = port or n.port
	
	if (n.isConnected == true) then
		tcp:close()
		n.isConnected = false
		log.trace("TCP",'ERROR',"Closed connection to "..host.."/"..port)
	end

	tcp, err = socket.connect(host, port)

	if (err == nil) then
		log.trace("TCP",'ERROR',"NOT Connected to "..host.."/"..port)
		n.isConnected = false
	else
		tcp:settimeout(n.timeout)
		n.isConnected = true
		log.trace("TCP",'INFO',"Connected to "..host.."/"..port)
	end
	return err
end

Actually, my “tcp, err = socket.connect(host, port)” assignment always sets err to nil, and the docs at https://www.sierrawireless.com/sitecore/content/developer-zone/resources/airlink/aleos_af/refdoc_aleos_af_api_1_3/~/media/Support_Downloads/AirLink/AAF/aleos_af_api-13.0/socket.html##(socket).connect do not mention an error being returned by socket.connect(). Am I looking at the wrong docs? I’ll try tcp = socket.connect()…

Thanks, Steve


#8

Hi Steve,

err is nil when there is no error. You should test the first returned value (i.e. tcp in your case), and only print err if tcp is nil (i.e. the connection could not be established).
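Applied to the connect() function from the thread, the check might look like this sketch. It keeps the n table and defaults, but takes the socket library and a logger as parameters (hypothetical additions, so it can be exercised without AAF’s log module or a network); in the real app these would be require("socket") and a wrapper around log.trace:

```lua
local n = {
  host = "192.168.10.20",
  port = 503,
  timeout = 0.5,
  isConnected = false,
  -- n.tcp holds the current socket object, if any
}

-- socketLib: LuaSocket-compatible module; logFn(level, text): logger.
function n.connect(socketLib, logFn, host, port)
  host = host or n.host
  port = port or n.port

  if n.tcp then
    n.tcp:close()
    n.tcp = nil
    n.isConnected = false
    logFn("INFO", "Closed connection to " .. host .. "/" .. port)
  end

  -- Test the FIRST return value; err is only set when tcp is nil.
  local tcp, err = socketLib.connect(host, port)
  if not tcp then
    logFn("ERROR", "NOT Connected to " .. host .. "/" .. port .. ": " .. tostring(err))
    return nil, err
  end

  tcp:settimeout(n.timeout)
  n.tcp = tcp
  n.isConnected = true
  logFn("INFO", "Connected to " .. host .. "/" .. port)
  return tcp
end
```

Returning the socket (rather than err) lets the caller write `if n.connect(...) then ... end` without inverting the meaning of nil.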


#9

Looking much better now, thanks.