LS300 VPN disconnect

I have six LS300s: one ‘master’ that connects via VPN tunnels to five remote stations (PLCs). All devices use static IPs. I can get the VPNs to connect, but when one drops out, I can’t find any pattern as to why, or any reliable way to get it to reconnect. This is a common log entry:

Mar 6 17:26:58 info racoon: INFO: IPsec-SA request for 166.***.***.110 queued due to no phase1 found.
Mar 6 17:26:58 info racoon: INFO: initiate new phase 1 negotiation: 166.***.***.126[500]<=>166.***.***.110[500]
Mar 6 17:26:58 info racoon: INFO: begin Identity Protection mode.
Mar 6 17:26:58 notice racoon: phase1(ident I msg1): 0.001861
Mar 6 17:26:58 info ALEOS_WAN_RMON: PLMN: AT&T, 310410, LAC: 26512, Cell Id: 0
Mar 6 17:26:59 info ALEOS_WAN_RMON: PLMN: AT&T, 310410, LAC: 26512, Cell Id: 76642723
Mar 6 17:27:29 info racoon: [166.***.***.110] ERROR: phase2 negotiation failed due to time up waiting for phase1. ESP 166.***.***.110[0]->166.***.***.126[0]
Mar 6 17:27:29 info racoon: INFO: delete phase 2 handler.
Mar 6 17:27:44 info racoon: [166.***.***.110] INFO: request for establishing IPsec-SA was queued due to no phase1 found.
Mar 6 17:27:48 info racoon: ERROR: phase1 negotiation failed due to time up. 85ad6d9f32b78709:0000000000000000
Mar 6 17:27:50 info ALEOS_WAN_RMON: PLMN: AT&T, 310410, LAC: 26512, Cell Id: 0
Mar 6 17:27:50 info ALEOS_WAN_RMON: PLMN: AT&T, 310410, LAC: 26512, Cell Id: 76642723
Mar 6 17:28:00 info ALEOS_WAN_RMON: PLMN: AT&T, 310410, LAC: 26512, Cell Id: 76642717
Mar 6 17:28:15 info racoon: [166.***.***.110] ERROR: phase2 negotiation failed due to time up waiting for phase1. ESP 166.***.***.110[0]->166.***.***.126[0]
Mar 6 17:28:15 info racoon: INFO: delete phase 2 handler.
Mar 6 17:28:34 info racoon: INFO: IPsec-SA request for 166.***.***.110 queued due to no phase1 found.

During this time, I have no problem accessing either modem. I am not using any Keepalive or DPD.
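
For anyone comparing against a longer log, here is a rough Python sketch for tallying the pattern above. It runs on a PC against a saved copy of the log, not on the LS300 itself. It counts phase 1 starts, phase 1 and phase 2 timeouts, the seconds between a phase 1 start and its timeout, and the distinct Cell Ids reported. The names `parse_line` and `summarize` are made up for this example, the message strings are matched only as they appear in the excerpt above, and since the log lines carry no year, a placeholder year is assumed.

```python
import re
from datetime import datetime

# Matches lines shaped like: "Mar 6 17:26:58 info racoon: INFO: ..."
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>\w+)\s+(?P<proc>\w+):\s+(?P<msg>.*)$"
)
CELL_RE = re.compile(r"Cell Id:\s*(\d+)")


def parse_line(line, year=2000):
    """Return (timestamp, process, message), or None if the line does not match.

    The log has no year, so `year` is an assumed placeholder.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts_text = " ".join(m.group("ts").split())  # normalize repeated spaces
    ts = datetime.strptime(f"{year} {ts_text}", "%Y %b %d %H:%M:%S")
    return ts, m.group("proc"), m.group("msg")


def summarize(lines):
    """Tally racoon phase 1/phase 2 timeouts and the distinct Cell Ids seen."""
    summary = {
        "phase1_started": 0,
        "phase1_timeouts": 0,
        "phase2_timeouts": 0,
        "phase1_timeout_seconds": [],  # start-to-timeout delay per attempt
        "cell_ids": [],                # distinct Cell Ids, in order first seen
    }
    started_at = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, proc, msg = parsed
        if proc == "racoon":
            if "initiate new phase 1 negotiation" in msg:
                summary["phase1_started"] += 1
                started_at = ts
            elif "phase1 negotiation failed due to time up" in msg:
                summary["phase1_timeouts"] += 1
                if started_at is not None:
                    delay = (ts - started_at).total_seconds()
                    summary["phase1_timeout_seconds"].append(delay)
                    started_at = None
            elif "phase2 negotiation failed due to time up" in msg:
                summary["phase2_timeouts"] += 1
        elif proc == "ALEOS_WAN_RMON":
            cell = CELL_RE.search(msg)
            if cell and int(cell.group(1)) not in summary["cell_ids"]:
                summary["cell_ids"].append(int(cell.group(1)))
    return summary
```

To use it, pass any iterable of log lines, for example an open file handle for a saved log: `summarize(open("ls300.log"))`, where `ls300.log` is a hypothetical filename.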

Any suggestions would be appreciated.

Update: After beating on this thing all day, I realized the processor load on the ‘master’ was more than 2.25 (each of the other stations was at 1.0 +/- 0.25). I turned off logging and shut down three of the five tunnels. This brought the CPU load on the master down to 1.5, and the load on the other stations below 1. For the time being, the two remaining tunnels are connected.

While I’m glad to have something working, in the end, I need all five tunnels operational. What else affects processor load? How much of an impact does network traffic have on processor load?

Still having the same problem: high processor load, leading to connection failures on the tunnels. There is very little data going out over the cellular network, so this is not a bandwidth issue. While the LS300s have the functionality to support five tunnels, they don’t seem to have the capacity. I can run two tunnels, but that seems to be about it.