UART bug in OpenAT 7.x

Ok, sounds promising. We’ve managed to find 7.40a in the Developer Studio repository but no earlier versions or even WPKs for 7.40a. Where did you find 7.40?

R7.40a should be OK on the UART front. Give it a try. I downloaded R7.40 when it was current.

The R7.4a version we found seems to be somewhat unstable. Do you have a WPK of R7.4 for the Extreme that you could post?

Best regards
Fredrik

Our local distributor just sent us R7.4a00 and WipSoft 5.12, which work as expected but have the same CTS problem as all the newer versions. I guess the bug is older than that.

It’s quite odd that no one else here has run into the same bug.

I’ve run into something like that but generally ignored it, as I don’t need to send such amounts of data. I did some tests with large data, found I was losing some of it, but left it at that.

We’ve done some further testing and have found the bug as early as Open AT 7.4a, that is, all versions that are compatible with the Q26 Extreme.

Is someone from Sierra reading this? Either there’s a major bug in your software or the documentation left something important out. We’re calling our local distributor every other day for help, but they don’t seem to get any answers from Sierra.

I’ve been informed by our local distributor that this thread has been sent to the development team, so hopefully we will have some answers here soon.

It’s a real shame that SiWi don’t see fit to actually engage on their own forum.

:frowning:

They could certainly learn a lot from the way that TI engage on their forums…

awneil: I agree. We still haven’t heard anything from Sierra about this issue despite several e-mails to our local distributor. Not even an e-mail saying they are looking at it.

Is anyone associated with Sierra reading this thread? We’ve been waiting for about 3 weeks for an answer now. Either there is a major bug in Open AT 7.44, which should be quite easy to fix (it is very easy to reproduce on all modules we’ve tested), or there’s something missing in the documentation.

We’ve had representatives from our local distributor here who have verified the bug with their own equipment and sent traces to Sierra. Unfortunately, no response at all from Sierra yet.

Just like to add my 2 cents. We are also experiencing this same issue and have gotten no response from our problem reports.

I have to agree with awneil, the level of support from TI puts these guys to shame.

What is TI, Texas Instruments? 8)

Yes: TI = Texas Instruments

That’s really disturbing. We’ve designed our products for the Q26E with scheduled deployment during 2011. We’re expecting to deploy at least a few thousand units during 2011/2012, but that won’t work with a manufacturer that doesn’t even answer basic questions.

Have you guys had any experience with other brands of GSM/GPRS/UMTS modules? We’re looking for temperature-tolerant, energy-efficient modules but, most of all, robust ones from a manufacturer who understands the importance of documentation and support. Without that, the best HW becomes unusable for large deployments.

Short update for those with similar issues:

We’ve received a response from Sierra (who analyzed the debug traces we collected) suggesting three possible explanations for this issue:

  1. Hardware flow control is not enabled
  2. On the module side, the UART RX FIFO is too small
  3. On the PC side, the UART TX FIFO is too large, so that when the PC receives the CTS signal and stops the transfer, the data already in the FIFO is still transmitted to the module, causing the overrun
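The arithmetic behind explanation #3 can be sketched as below. This is a simplified model under stated assumptions: 8N1 framing (10 bits per byte on the wire), a host UART that still empties whatever is already in its TX FIFO after CTS drops, and a hypothetical amount of free space left in the module's RX FIFO at that moment. The real Q26 Extreme figures are not known from this thread, and the function names are made up for illustration.

```python
def char_time_ms(n_bytes: int, baud: int, bits_per_char: int = 10) -> float:
    """Time needed to clock n_bytes over the wire (8N1 framing = 10 bits per byte)."""
    return n_bytes * bits_per_char * 1000.0 / baud


def overrun_bytes(host_tx_fifo: int, rx_free_at_cts: int, in_flight: int = 1) -> int:
    """Worst-case number of bytes the module cannot store after it deasserts CTS.

    Assumption (as in explanation #3): bytes already loaded into the host TX FIFO
    are sent regardless of CTS, plus the byte(s) already in the shift register.
    """
    late_bytes = host_tx_fifo + in_flight
    return max(0, late_bytes - rx_free_at_cts)
```

For example, with a hypothetical 8 free bytes left on the module side, `overrun_bytes(16, 8)` predicts 9 lost bytes, while `overrun_bytes(7, 8)` predicts none; at 115200 baud those 16 late bytes take about 1.4 ms (`char_time_ms(16, 115200)`). In this model, shrinking the host TX FIFO and enlarging the module RX FIFO are both candidate fixes.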

#1 is not true, since we use AT+IFC=2,2 and the CTS line asserts and deasserts when we flood the unit with data (just a little late). #3 could be true, but we’ve tried both embedded modules, where we control every bit put on the TX line, and standard RS232 cables from a PC, with the same result. So, if this is true, the buffer on the Q26 side is so small that basically no equipment can communicate with it using flow control.
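Ruling out explanation #1 requires flow control at both ends: on the module via AT+IFC=2,2 and on the host port via its own RTS/CTS setting. A host-side sketch, assuming the third-party pyserial package, 115200 baud and a port name you supply; the helper names are made up for illustration.

```python
def at_command(cmd: str) -> bytes:
    """Encode an AT command line terminated by CR."""
    return (cmd + "\r").encode("ascii")


def open_with_rtscts(port: str, baudrate: int = 115200):
    """Open the host port with RTS/CTS enabled and request HW flow control on the module.

    Requires pyserial; it is imported here so at_command() works without it.
    """
    import serial  # third-party: pip install pyserial

    ser = serial.Serial(port=port, baudrate=baudrate, rtscts=True, timeout=2)
    ser.write(at_command("AT+IFC=2,2"))
    reply = ser.read_until(b"OK\r\n")  # returns early with whatever arrived on timeout
    if b"OK" not in reply:
        ser.close()
        raise RuntimeError(f"module did not accept AT+IFC=2,2: {reply!r}")
    return ser
```

For example, `ser = open_with_rtscts("COM3")`, with COM3 standing in for whatever port the demo-board is on.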

So, that leaves us with #2, which is what we’ve been suspecting for a while. To verify this, Sierra has sent “OASIS 2.35 WP10”, which supports “AT+WHCNF=6,2” and should increase the RX FIFO. Unfortunately, this firmware version won’t recognize an external SIM card, which makes it impossible for us to test anything. We’re currently waiting for an answer from Sierra once again.

Update for those following this issue:

We’re still waiting for Sierra to fix the external SIM-card issue mentioned above so that we can verify that the buffer increase resolves the bug. We received a new build of OASIS 2.35 last week, but with the exact same SIM-card issue as the first one. We’re still trying to figure out why they sent us this. Either they are really stressed and just sent us something to keep us happy for a few hours, or our feedback is getting lost on the way to Sierra support via our local distributor.

I’ve seen a forum administrator answering some questions in other threads, how come there isn’t even a comment on this whole issue here?

We have now received a version of 7.45 beta where we can set the FIFO size, but the bug remains on the Q26 Extreme. All responses from Sierra (via our local distributor) contain conflicting information about possible causes for the bug and how the different buffers interact, which doesn’t make sense. We’ve even received questions on how to set up a simple server!

Since the deployment of our whole system is stalled due to this bug (causing major economic losses each day), we have decided to make contact with people higher up in the Sierra organization.

Parallel to this, we have put together a client and a server to reproduce the bug using a standard Sierra demo-board and a Q26 Extreme module running Open AT 7.4x or 7.5x (remove the WHCNF line if you are using 7.4x), with simple instructions (client and server attached to this message):


The two files attached are a client and a server to reproduce the bug found in all OpenAT 7.x firmware releases running on Q26Extreme. Follow these instructions step by step:

  1. Put the file uplinkTestServer.py on a computer with a public IP
  2. Run “python uplinkTestServer.py” on that computer
  3. Connect another computer to a Sierra demo-board with a Q26Extreme running OpenAT 7.5x beta and WipSoft 5.40 on it
  4. Send “AT+WIND=255” to the Q26Extreme followed by “AT&W”
  5. Replace the string YOURHOSTNAMEHERE.COM in testUplink.py with the IP or domain name of the computer running uplinkTestServer.py
  6. Run “python testUplink.py” on the computer connected to the demoboard
  7. Reset the Q26Extreme

The client will now use the Q26 to connect to the server running uplinkTestServer.py and write the numbers 0 to 100000 to the server via TCP/IP, one number per line. As soon as the server receives a line that is not the previous number + 1, it will exit and print the erroneous line received. We have tried this with several releases of Open AT, both 7.4x and 7.5x, and they all produce erroneous lines!
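For readers without the attachment, the checking logic described above can be sketched as follows. This is not the attached uplinkTestServer.py, only a minimal stand-in under assumptions: ASCII numbers, newline-terminated, a single client connection, and port 5000 chosen arbitrarily. `payload()` builds the same 0 to 100000 sequence the client sends.

```python
import socket
from typing import Iterable, Optional


def payload(last: int = 100000) -> bytes:
    """The test data: the numbers 0..last, one per line."""
    return "".join(f"{n}\n" for n in range(last + 1)).encode("ascii")


def first_bad_line(lines: Iterable[str]) -> Optional[str]:
    """Return the first line that is not the previous number + 1, or None."""
    expected = 0
    for line in lines:
        text = line.strip()  # tolerate "\r\n" line endings
        if text != str(expected):
            return text
        expected += 1
    return None


def serve_once(port: int = 5000) -> Optional[str]:
    """Accept one TCP connection and check the incoming sequence."""
    with socket.create_server(("", port)) as srv:
        conn, addr = srv.accept()
        # makefile() reassembles lines across TCP segment boundaries;
        # latin-1 decodes any byte value, so corrupted data cannot raise here.
        with conn, conn.makefile("r", encoding="latin-1", newline="\n") as stream:
            bad = first_bad_line(stream)
    if bad is not None:
        print(f"erroneous line from {addr}: {bad!r}")
    return bad
```

Calling `serve_once()` blocks until the module connects. Dropping three bytes from the middle of the stream, as a UART overrun would, makes the sequence jump (for example from 12 to 14), which is exactly what the check reports. Note that a connection that simply ends early is not flagged by this sketch.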


Arkiv.zip (2.38 KB)

Just something to check which caused us problems once.
Have you looked at your TCP/IP packet sizes? If TCP/IP is set to NO_DELAY, you can get really small packets which the server/connection can struggle to handle, so you lose some bits of data in a large transfer.
Perhaps the default has changed??
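One related pitfall on the receiving side: TCP delivers a byte stream, not messages, so with NO_DELAY a single line may arrive split across several small segments. A server that treats each recv() result as one line will then report "erroneous" lines even though no byte was lost. A minimal sketch of a receiver that only hands back complete lines, assuming newline-terminated records; the class name is made up for illustration.

```python
from typing import List


class LineAssembler:
    """Collect arbitrary recv() chunks and hand back only complete lines."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        self._buf += chunk
        lines: List[bytes] = []
        while True:
            idx = self._buf.find(b"\n")
            if idx < 0:
                break  # keep the partial line until more data arrives
            lines.append(bytes(self._buf[:idx]))
            del self._buf[: idx + 1]
        return lines
```

Feeding it `b"0\n1"` and then `b"\n2\n"` yields `[b"0"]` and then `[b"1", b"2"]`, so segment boundaries no longer matter to the sequence check.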

BenFT01: thanks for your suggestion. Unfortunately, changing the TCP/IP packet size does not seem to make any difference for this bug.

We now have direct contact with the area sales manager for Northern Europe, who is trying to get the bug resolved and also trying to get more information about what’s causing it.

Currently, the bug seems to be resolved in Open AT 7.45 beta by changing the FIFO buffer size with the command “AT+WHCNF=6,3”. Unfortunately, the Q26Ex has a chipset that does not accept 7.45 beta since it’s “unsigned”. We are therefore waiting for the chipset manufacturer to sign the beta release so that we can verify that it works on the Q26Ex.

We are currently trying to get information on when we can have this software.

Try this on a Windows (XP) PC connected to the COM port of the module:

  1. Go to Device Manager
  2. Open the properties of the COM port you are using
  3. Go to the Port Settings tab and click Advanced
  4. Reduce the FIFO TX size from 16 to a lower value (say 7)
Now see if you can reproduce the data loss. On my colleague’s desktop, a FIFO TX size of 16 works fine with a high-speed UART (PCI) card, while 7 is the value that works well on our laptops (no data loss). So, try different values to see what works best for you.
If you can see the traces on the module, look for “[data_drv_on_error] Error 0002 on UART1”, which signifies a UART overrun. If this error is gone from your traces, there shouldn’t be any data loss.
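If the module traces can be exported as plain text, checking for that overrun message can be automated. A minimal sketch, assuming one trace event per line and that the message text appears exactly as quoted above; the function name and the trace file name in the usage line are hypothetical.

```python
import re
from typing import Dict, Iterable

# The overrun message quoted above; the UART number is captured so that
# errors on different UARTs are counted separately.
OVERRUN_RE = re.compile(r"\[data_drv_on_error\] Error 0002 on UART(\d+)")


def count_overruns(trace_lines: Iterable[str]) -> Dict[str, int]:
    """Count UART overrun error lines per UART in an exported trace."""
    counts: Dict[str, int] = {}
    for line in trace_lines:
        match = OVERRUN_RE.search(line)
        if match:
            uart = f"UART{match.group(1)}"
            counts[uart] = counts.get(uart, 0) + 1
    return counts
```

For example, `count_overruns(open("trace.txt", encoding="latin-1"))` returns an empty dict when no overrun was logged, which makes it easy to compare FIFO settings run by run.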

If you are using USB-Serial cable to connect to your module, then try different versions of the Prolific driver. Read http://www.vems.hu/wiki/index.php?page=ProlificWindowsDriver for more info.