Autoreconnect fails

Hello all,

Could it be that there is some hardwired limit on the number of connections in the RemObjects Remoting SDK?

I have a multi-threaded client application that communicates with a lot of servers on the internet using TROSynapseSuperTCPChannel. In total, about 400 such communication threads run simultaneously. The Remoting SDK version is 10.0.0.1495.

The problem is that the super channel sometimes fails to auto-reconnect after a short network interruption. This now happens several times a day and is becoming a real problem for me. When such an outage occurs, the first exception I get is EROTimeout (after 10 seconds); after that I get ERoSuperChannelException “No connection available” over and over, and each attempt takes 25.6 seconds. Sometimes the channel simply won’t auto-reconnect, and the only way to get things going again is to terminate the thread completely and re-create it.

These are the settings I use in each communication thread:

procedure TBinaryChannelModule.CreateComponents;
var
  item: TROMessageEnvelopeItem;
begin
  AES_Encryption_envelope := TROAESEncryptionEnvelope.Create(Self);
  AES_Encryption_envelope.EnvelopeMarker := 'AES';
  AES_Encryption_envelope.Password := SecretPassword;

  fMessage := TROBinMessage.Create(Self);
  fMessage.MinSizeForCompression := 1024;
  item := fMessage.Envelopes.Add as TROMessageEnvelopeItem;
  item.Envelope := AES_Encryption_envelope;

  fClientChannel := TROSynapseSuperTCPChannel.Create(Self);
  fClientChannel.SynchronizedProbing := False;
  fClientChannel.AckWaitTimeout := 15000;
  fClientChannel.AutoReconnect := TRUE;

  Attachevents; // connects client channel events (such as OnException etc.)
  // "host" and "port" are set at a later stage.
end;

Hi,

Our code imposes no limit on the number of connections, but your version of Windows may.

Try catching errors in the channel’s OnException event. There you can terminate the current connection and create a new one if needed.

Hi Evgeny,

I would be grateful for a quick explanation of how to do that!

Thanks,
Arthur

Hi,

As far as I can see, the channel can’t recover successfully after the timeout.
So you can use the OnException event to stop this channel and create a new one, or try to reuse the channel once it becomes inactive.

You should set aRetry to False in this event.
As a result, the channel will become inactive within a few seconds, so you can then try to reconnect it or create and launch a new channel for the same address.

Of course, if you choose to reuse the existing channel, you need to wait until it becomes inactive.

Note: you may need to set fClientChannel.AutoReconnect := False;

Hi Evgeny,

do you mean something like this? Could this work?

Kind regards,
Arthur

procedure TBinaryChannelModule.ClientChannelException(sender: TROTransportChannel; 
anException: Exception; var aRetry: BOOLEAN);
var
  count: INTEGER;
begin
  if (anException is eROTimeout) then
  begin
    aRetry := False;
    inc(ftimeouts);
    (sender as TROSynapseSuperTCPChannel).AutoReconnect := False;
    (sender as TROSynapseSuperTCPChannel).Active := False;
    for count := 1 to 10 do
    begin
      sleep(1000); // wait until inactive
      if not(sender as TROSynapseSuperTCPChannel).Active then
         Break;
    end;
    (sender as TROSynapseSuperTCPChannel).AutoReconnect := TRUE;
    (sender as TROSynapseSuperTCPChannel).Active := TRUE;
  end;
end;

Hi,

Not quite.

You shouldn’t restart the channel in this event, because the exception is raised in the middle of the channel logic, so restarting there can just cause more problems.

I think it is better to do something like:

procedure TBinaryChannelModule.ClientChannelException(sender: TROTransportChannel; 
anException: Exception; var aRetry: BOOLEAN);
begin
  if (anException is eROTimeout) then
  begin
    aRetry := False;
    flist.Add(sender); // = TList<TROSynapseSuperTCPChannel> 
    ftimer.Enabled := True;  //= TTimer
  end;  
end;

procedure TBinaryChannelModule.TimerOnTimer(Sender: TObject);
var
   i: Integer;
begin
   for i := flist.Count - 1 downto 0 do begin
      if not flist[i].active then begin
        flist[i].active := True;
        flist.delete(i);
      end;
   end;
   if flist.Count = 0 then TTimer(Sender).Enabled := False;
end;

Of course, you can add protection with a critical section or similar.

Hi Evgeny,

I really can’t use any timers here, the whole thing is running in a dedicated thread (no HWND ==> no Timer) and everything is synchronous/blocking.

But if I understand correctly, what I must do is

  • set Retry, Autoreconnect and Active to False in this OnException event handler,
  • Exit the event handler
  • Do nothing for a few seconds,
  • then set AutoReconnect and Active to TRUE
  • then try again?
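
For readers following along, one timer-free variant of the steps listed above could be sketched as below. It keeps the exception handler minimal (it only sets aRetry to False and records a flag) and moves the restart into the worker thread's own loop, in line with the earlier advice not to restart the channel inside the event. TFakeChannel and TWorker are hypothetical stand-ins so the sketch compiles without the SDK; only Active, AutoReconnect and aRetry come from this thread, and whether AutoReconnect should be switched back on afterwards is an open assumption.

```pascal
program DeferredRestartSketch;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // Hypothetical stand-in for the real channel: only the two
  // properties discussed in this thread are modelled.
  TFakeChannel = class
  public
    Active: Boolean;
    AutoReconnect: Boolean;
  end;

  // Hypothetical worker that owns one channel.
  TWorker = class
  private
    FChannel: TFakeChannel;
    FNeedsRestart: Boolean;
  public
    constructor Create;
    destructor Destroy; override;
    procedure HandleException(var aRetry: Boolean);
    procedure RestartIfNeeded(WaitMs: Cardinal);
    property Channel: TFakeChannel read FChannel;
  end;

constructor TWorker.Create;
begin
  inherited Create;
  FChannel := TFakeChannel.Create;
end;

destructor TWorker.Destroy;
begin
  FChannel.Free;
  inherited Destroy;
end;

// Plays the role of the OnException handler: no restart here,
// just refuse the retry and remember that a restart is pending.
procedure TWorker.HandleException(var aRetry: Boolean);
begin
  aRetry := False;
  FNeedsRestart := True;
end;

// Called from the worker's own loop, outside of any channel event.
procedure TWorker.RestartIfNeeded(WaitMs: Cardinal);
begin
  if not FNeedsRestart then
    Exit;
  FChannel.AutoReconnect := False;
  FChannel.Active := False;
  Sleep(WaitMs);                   // "do nothing for a few seconds"
  FChannel.AutoReconnect := True;  // assumption: re-enable afterwards
  FChannel.Active := True;
  FNeedsRestart := False;
end;

var
  Worker: TWorker;
  Retry: Boolean;
begin
  Worker := TWorker.Create;
  try
    Worker.Channel.Active := True;
    Retry := True;
    Worker.HandleException(Retry);  // simulate a timeout being reported
    Worker.RestartIfNeeded(100);    // next loop iteration restarts the channel
    WriteLn('Retry requested: ', Retry);
    WriteLn('Channel active again: ', Worker.Channel.Active);
  finally
    Worker.Free;
  end;
end.
```

With a real channel the fixed Sleep would presumably be replaced by waiting until the channel reports itself inactive, as mentioned earlier in the thread.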

Hi,

I’d suggest using the channel’s OnDisconnected event instead of OnException:

procedure TBinaryChannelModule.CreateComponents;
...
fClientChannel.AutoReconnect := False; //<< !!!
fClientChannel.SynchronizeEvents := False;
fClientChannel.OnDisconnected := ClientChannelDisconnected;
...
end;

procedure TBinaryChannelModule.ClientChannelDisconnected(Sender: TObject);
begin
  sleep(xxx);  // xxx can be ~100-1000; choose whichever value suits your application best
  TROSynapseSuperTCPChannel(Sender).Active := True; //<< probably better to set a variable here and restart the channel outside of the event; it depends on your code
end;

Hi Evgeny, won’t this method just try to re-connect only one time, when the connection goes down? What if the host can’t be reached for a longer period of time?

It depends on your application.
Can you briefly describe what your client side does?
If it just receives events from other servers, that is one case; if it calls server-side methods, that is another.

You can send an email to support@ if this is private information.

No, it only calls server’s methods. No events are coming back. But that may change in the future.

The fact that the channel doesn’t seem able to reconnect by itself after an EROTimeout, is that a bug?

In this case, the most suitable solution for you is:

  • create a TBinaryChannelModule on request
  • perform the call (the channel will connect automatically)
  • close the channel (channel.Active := False)
  • [optionally] destroy the module

Benefits: you don’t need to keep 400+ connections open simultaneously, which reduces the chance of receiving EROTimeout.
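
A minimal sketch of this create / call / close / destroy cycle is shown below. TFakeModule and SendRecord are hypothetical stand-ins for TBinaryChannelModule and a remote call, used so the example compiles without the SDK; the only behaviour taken from this thread is that the call connects the channel on demand and that Active is set to False afterwards.

```pascal
program PerRequestSketch;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical stand-in for TBinaryChannelModule.
  TFakeModule = class
  public
    Active: Boolean;
    procedure SendRecord(const Payload: string);
  end;

// Stand-in for a remote call: "connects" on demand, then sends.
procedure TFakeModule.SendRecord(const Payload: string);
begin
  Active := True;
  WriteLn('sent: ', Payload);
end;

// One request = one short-lived module and connection.
procedure SendOnce(const Payload: string);
var
  Module: TFakeModule;
begin
  Module := TFakeModule.Create;   // create the module on request
  try
    Module.SendRecord(Payload);   // perform the call
  finally
    Module.Active := False;       // close the channel
    Module.Free;                  // optionally destroy the module
  end;
end;

begin
  SendOnce('sample record');
end.
```

The try/finally block is a design choice of the sketch: it ensures the channel is closed and the module freed even if the call raises an exception.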

It can be considered a bug, but it is hard to reproduce and requires some specific conditions, because in most cases the channel recovers successfully.

Hi Evgeny,

The threads communicate constantly, they stream 1 Hz sensor data to our office. I need the connections to stay alive to minimize latency. If latency exceeds 1 s, the threads simply download multiple records in one call so they can keep up.

But I wonder if I need autoreconnect at all. I don’t use events.

You say that performing the call will connect anyway.

So if an eRoTimeout or eRoSuperChannelException occurs, isn’t it simply sufficient to stop communicating, set Active to FALSE, wait a bit, and set Active to TRUE again before resuming?

The internal logic of the SuperTCP channel is a bit complicated.

For your case, a plain TCP channel is the most suitable, because you don’t need real-time events from the server side. A plain TCP channel also doesn’t raise eRoTimeout or similar errors, because it doesn’t have to send ping packets constantly.