TLS 1.2 handshake failing and causing device restarts

Hey SiWi community,

So, as concisely as I can…

Two Windows Server 2008 R2 SP1 servers.
One server is the test site and has a single SL6087 device.
The other is the main server with ~50 of the same devices.
Both servers run a .NET 4.5 application that services the devices over TLS 1.2.

TLS 1.2 was implemented back in March and has been fine since.

Two months ago the single test device stopped being able to authenticate to the test server.
I passed the info on to the customer and continued with my other work.

Now the same symptoms have moved over to the main server.

Network monitor on server shows:
Client Hello (the sl6087 device)
Server Hello, Certificate, Server Key Exchange + continuation data with ACK/PSH (AP) flags
Then two packets from the device, both ACKs.

Wireshark on the device side doesn’t show the Server Hello packet reaching it.
The device then restarts (in under 15 seconds, so I don’t think it is the watchdog).
So perhaps the packet is getting there, but the device restarts before passing it through?

The confusing parts are that the connection failures began immediately after a restart, not while the .NET application was running.
And with the 50 devices on the main server, not all of them have suffered the same fate. A small number haven’t exhibited the problem and are running the same application.

The only difference I can see in the traces is that the earlier, successful TLS handshakes showed:
Client Hello (the sl6087 device)
Server Hello, Certificate, Server Hello Done
Client Key Exchange
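
To compare the two traces without reading them by eye, the handshake messages can be listed from the raw bytes of the server's flight. Below is a minimal Python sketch, assuming the bytes are the still-unencrypted part of the handshake exported from a capture. The function name and the dictionary are my own for illustration; the type numbers are the ones defined for TLS 1.2.

```python
# Handshake message type numbers as defined for TLS 1.2 (RFC 5246).
HANDSHAKE_TYPES = {
    1: "ClientHello",
    2: "ServerHello",
    11: "Certificate",
    12: "ServerKeyExchange",
    13: "CertificateRequest",
    14: "ServerHelloDone",
    16: "ClientKeyExchange",
}


def handshake_messages(stream: bytes) -> list:
    """List handshake message names found in a run of plaintext TLS records.

    Hypothetical helper: it assumes `stream` holds complete TLS records and
    that nothing in it is encrypted yet (i.e. before ChangeCipherSpec).
    """
    # Pass 1: join the payloads of all handshake records (content type 22).
    # Record header = 1 byte type, 2 bytes version, 2 bytes length.
    payload = bytearray()
    pos = 0
    while pos + 5 <= len(stream):
        content_type = stream[pos]
        length = int.from_bytes(stream[pos + 3:pos + 5], "big")
        if content_type == 22:
            payload += stream[pos + 5:pos + 5 + length]
        pos += 5 + length

    # Pass 2: walk the handshake messages (1 byte type, 3 bytes length).
    names = []
    pos = 0
    while pos + 4 <= len(payload):
        msg_type = payload[pos]
        msg_len = int.from_bytes(payload[pos + 1:pos + 4], "big")
        names.append(HANDSHAKE_TYPES.get(msg_type, "unknown(%d)" % msg_type))
        pos += 4 + msg_len
    return names
```

Run against the server's reply from each capture, this should make a difference such as an extra ServerKeyExchange stand out immediately.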

So why has the server now decided it needs to do a Server Key Exchange?
Certificates have not changed. Windows updates have been installed, but I can’t see any updates common to both servers.

I’m waiting on permission to roll back the test server to see if that is the problem and go from there.

Why would adding the server key exchange to the packet cause a problem?
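
For context: in TLS 1.2 the server only sends a Server Key Exchange message when the negotiated cipher suite uses an ephemeral key exchange (DHE or ECDHE). With plain RSA key exchange the message is omitted, which matches the older traces. A rough Python sketch of that rule follows. The function names are made up for illustration, and it only looks at the RSA, DHE and ECDHE families, so treat it as a reading aid rather than a complete classifier.

```python
def key_exchange(suite: str) -> str:
    """Return the key-exchange family named in a TLS_* cipher suite string.

    Hypothetical helper; only RSA, DHE and ECDHE are recognised, anything
    else (PSK, static DH, ...) is reported as "other".
    """
    prefix = suite.split("_WITH_")[0]  # e.g. "TLS_ECDHE_RSA"
    if prefix.startswith("TLS_ECDHE_"):
        return "ECDHE"
    if prefix.startswith("TLS_DHE_"):
        return "DHE"
    if prefix == "TLS_RSA":
        return "RSA"
    return "other"


def sends_server_key_exchange(suite: str) -> bool:
    """True if the server is expected to send ServerKeyExchange for `suite`.

    Only covers the families recognised above: ephemeral (EC)DH suites need
    the message to carry the server's ephemeral parameters, plain RSA does not.
    """
    return key_exchange(suite) in ("ECDHE", "DHE")


if __name__ == "__main__":
    for name in ("TLS_RSA_WITH_AES_128_CBC_SHA256",
                 "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256"):
        print(name, sends_server_key_exchange(name))
```

So a Server Key Exchange appearing where there was none before suggests the server is now selecting a different cipher suite from the ones the device offers.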

Thank you for all assistance.

-Chris

Also, I should apologise if this would’ve been better in the SL forum.
Since it was SSL/TLS I came here, but on reflection maybe it was more appropriate elsewhere…

-Chris

Thought I would reply, having had a bit of success finding the solution.
The communication failure was due to a change on the Windows side of things.

An update had added some TLS 1.2 cipher suites and re-organised the default SSL cipher suite priority list. There was no obvious correlation between the servers because, as of May this year, Microsoft has started maintaining rollup packages. So the problematic update arrived in a different rollup package on each server, because they were updated at different times (June for one, July for the other; the June rollup was also rolled into July for some reason, so it was deprecated).

It’s odd that some devices could still communicate, and I’m not sure what the limitation was. But adjusting the priority list and moving suites around brought communication back.
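
In case it helps anyone else, here is a small Python sketch of the kind of reordering I mean. It is only an illustration: the suite names in `example_order` are examples and not the actual list from either server, and the choice to move plain RSA key exchange suites to the front is an assumption based on the old working traces having no Server Key Exchange. The helper names are made up.

```python
def prioritise(suites, preferred):
    """Stable reorder: suites matching `preferred` first, the rest after.

    Relative order within each group is kept, so nothing is dropped and
    only the priority changes.
    """
    front = [s for s in suites if preferred(s)]
    back = [s for s in suites if not preferred(s)]
    return front + back


def is_plain_rsa(suite):
    """Assumed preference: suites using plain RSA key exchange."""
    return suite.startswith("TLS_RSA_WITH_")


# Illustrative list only, NOT the real default order from the servers.
example_order = [
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256",
    "TLS_RSA_WITH_AES_128_CBC_SHA256",
    "TLS_DHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
]

if __name__ == "__main__":
    new_order = prioritise(example_order, is_plain_rsa)
    # The Windows "SSL Cipher Suite Order" policy takes a comma-separated list.
    print(",".join(new_order))
```

Bear in mind that putting plain RSA suites ahead of the ephemeral ones gives up forward secrecy for clients that would otherwise have negotiated ECDHE or DHE, so it is a trade-off rather than a free fix.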