Callbacks blocking new requests

We are using Delphi 10.3.2, Remoting SDK version 10.0.0.1555, Synapse SuperTCP, Olympia/InMemorySessionManager, Windows
(the problem happens with or without Olympia)

In my application, from time to time, we get a deadlock where new requests can't be processed. In the last RO version the lock in the session manager was fixed, but now we notice more deadlocks, in this case when using callbacks.

After analyzing some call stacks, we believe we have identified the problem in uROBaseSuperTCPServer.pas, function TSendEvent.Callback:

fOwner.fClients.LockList;
try
  {$IFDEF ROUseGenerics}
  if not fOwner.fGuidToClientMap.TryGetValue(fClientGuid, l_worker) then Exit;
  if fWorkerOverride <> nil then
    l_worker := TROSCServerWorker(fWorkerOverride.GetTransportObject);
  {$ELSE}
  i := fOwner.fGuidToClientMap.IndexOf(GUIDToString(fClientGuid));
  if i = -1 then Exit;
  if fWorkerOverride <> nil then
    l_worker := TROSCServerWorker(fWorkerOverride.GetTransportObject)
  else
    l_worker := TROSCServerWorker(fOwner.fGuidToClientMap.Objects[i]);
  {$ENDIF}

  // Shouldn't this be outside this critical section?
  l_ask := l_worker.SendPackage(l_stream, 0);
finally
  fOwner.fClients.UnlockList;
end;

If l_worker.SendPackage takes too long, the whole server is blocked for that time. Say we have a really bad connection that takes 10 s to complete the request: every client will then wait 10 seconds before new requests can be processed, which is not ideal.

Here is where all clients will wait:

procedure TROBaseSuperTCPServer.IntExecute(ABaseSuperConnection: TROBaseSuperTcpConnection);
...
    l_worker := TROSCServerWorker.Create(Self, ABaseSuperConnection);
    l_worker.MaxPackageSize := Self.MaxPackageSize;
    // fClients is locked here
    fClients.Add(l_worker);
    try
      l_worker.DoExecute;
    finally
      fClients.Remove(l_worker);
      ....
    end;
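To illustrate the contention described above, here is a minimal, self-contained Delphi console sketch. It does not use RemObjects code: TFakeWorker and its 2-second SendPackage are hypothetical stand-ins for TROSCServerWorker.SendPackage over a lagged link, and a plain TThreadList stands in for fClients. It measures how long a "new connection" (the fClients.Add call in IntExecute) waits while a send is in progress, first with the send inside the lock, then with only the lookup inside the lock.

```pascal
program CallbackLockSketch;

{$APPTYPE CONSOLE}

// Hypothetical model of the reported contention; none of these names are RemObjects SDK APIs.

uses
  System.SysUtils, System.Classes, System.Diagnostics;

type
  // Stand-in for TROSCServerWorker (assumption, not the real class).
  TFakeWorker = class
  public
    procedure SendPackage;
  end;

procedure TFakeWorker.SendPackage;
begin
  TThread.Sleep(2000); // simulates a send over a connection with 2000 ms of lag
end;

var
  Clients: TThreadList; // stand-in for fOwner.fClients
  Worker: TFakeWorker;
  Blocked, Unblocked: Int64;

// Pattern of the original TSendEvent.Callback: the slow send runs while the list is locked.
procedure SendInsideLock;
var
  L: TList;
begin
  L := Clients.LockList;
  try
    TFakeWorker(L[0]).SendPackage;
  finally
    Clients.UnlockList;
  end;
end;

// Narrowed pattern: only the lookup happens under the lock; the send runs after UnlockList.
procedure SendOutsideLock;
var
  L: TList;
  W: TFakeWorker;
begin
  L := Clients.LockList;
  try
    W := TFakeWorker(L[0]);
  finally
    Clients.UnlockList;
  end;
  W.SendPackage; // assumes W cannot be freed by another thread in the meantime
end;

// Returns how many ms a "new client" (like IntExecute's fClients.Add) waits while Send runs.
function MeasureConnectDelay(const Send: TProc): Int64;
var
  Sender: TThread;
  Extra: TFakeWorker;
  SW: TStopwatch;
begin
  Sender := TThread.CreateAnonymousThread(Send);
  Sender.FreeOnTerminate := False;
  Sender.Start;
  TThread.Sleep(200); // give the sender time to take the lock first
  Extra := TFakeWorker.Create;
  try
    SW := TStopwatch.StartNew;
    Clients.Add(Extra); // blocks for as long as another thread holds the list lock
    Result := SW.ElapsedMilliseconds;
    Clients.Remove(Extra);
  finally
    Extra.Free;
  end;
  Sender.WaitFor;
  Sender.Free;
end;

begin
  Clients := TThreadList.Create;
  Worker := TFakeWorker.Create;
  try
    Clients.Add(Worker);
    Blocked := MeasureConnectDelay(SendInsideLock);
    Unblocked := MeasureConnectDelay(SendOutsideLock);
    Writeln(Format('Send inside lock:  new client waited %d ms', [Blocked]));
    Writeln(Format('Send outside lock: new client waited %d ms', [Unblocked]));
    Assert(Blocked > 1000, 'expected Add to block while the send holds the lock');
    Assert(Unblocked < 500, 'expected Add not to wait for the send');
  finally
    Worker.Free;
    Clients.Free;
  end;
end.
```

Narrowing the lock this way is only safe if the worker cannot be destroyed between UnlockList and SendPackage. The sketch simply assumes that. The part of IntExecute after fClients.Remove is elided above, so if it frees the worker, a real fix may need extra lifetime protection (for example, a reference count) that this sketch does not show.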

Now, in production we occasionally see a very, very long freeze of the application (at least a few minutes; we never waited to see whether it would finish). We couldn't reproduce that, but we can reproduce all requests on the server freezing for a few seconds.

Steps to reproduce

  • Compile both the client and server projects
  • Open 1 server and 2 client.exe instances (we use two computers, but I think it will work on the same PC)
  • Configure the IP address/port in client.exe
  • Open clumsy, or any other program you like, to fake a bad connection
  • On client1, click “Login and register for callback”, then reserve; everything will work fine
  • On the server, click “Send callback”; on client1 you will see a callback counter increase, as expected
  • Open perfmon on Windows, go to Network and find the port the client uses to connect to the server
  • In clumsy, use this filter: tcp.DstPort == PORT_HERE or tcp.SrcPort == PORT_HERE
  • In clumsy, enter 2000 in the Lag field and click Start
  • On the server, click “Send Callback”; before the 2 seconds are up, try to connect with the second client. It will not be able to connect until the 2-second mark
  • To make sure clumsy does not affect the second client, you can stop it and test it if you like

Now, to be honest, I'm not sure why this critical section is being used. There are 3 locks in total when a callback is sent; this one, fOwner.fClients.LockList, doesn't seem right to me, but of course I'm not 100% sure.
Demo.rar (2.2 MB)

Hi,

Can you retest this code in your working environment, please?

// uROBaseSuperTCPServer.pas
procedure TSendEvent.Callback(Caller: TROThreadPool; Thread: TThread);
..
      l_retry := False;
      try
        l_ask := TROSCServerWorker(fWorkerOverride.GetTransportObject).SendPackage(l_stream, 0);
        TROSCServerWorker.WaitForAck(l_ask, fOwner.fAckWaitTimeout);
        if Supports(fOwner.fEventRepository, IROValidatedSessionsChangesListener, l_listener) and not IsEqualGUID(l_id, EmptyGUID) then
          l_listener.EventSucceeded(fClientGuid, l_id);
      except

Can you send me the complete code of the function? Just to be 100% safe.

Hi,

Check PM

Removing the lock works as expected. Should I just push this workaround until the next build?

Hi,

yes.
I’d expect some testing from you: can you test it in your working environment, please?
If it works as expected and doesn’t cause any other issues, I’ll merge this fix to the main trunk.

OK, I will push it to production. I have been testing something similar since Saturday and everything seems fine.

I will give you some feedback this week (as you know, it's hard to be 100% sure; this is very random).

Update:

We didn’t notice any problems after the change.

We do have a few clients using Olympia instead, and today we noticed some slowdowns in one function that uses a lot of callbacks, so maybe the problem happens with Olympia too? (Not sure about this one; we still need to investigate further.)

Logged as bugs://D19334.

bugs://D19334 was closed as fixed.

Hi,

I’ll try to review the .NET code.

OK, now we can confirm it.

We have a process that uses a lot (I mean, a lot) of callbacks. With Olympia, the function crashes the server; without Olympia (with the latest fix), it works as expected.

When this function is running, new clients can't connect and we start to get a lot of “No connection available” errors.

When I say crash, I mean it is so slow that it becomes unusable.

Hi,

Do you have an unstable connection between the Delphi server and Olympia?

We have fixed the issue that occurs when the server sends events to the client:
client ← (fix here) server ← → Olympia

In general, you shouldn’t have any issues between the server and Olympia.

I do have an unstable connection between the Delphi server and the Delphi client, but only when using Olympia.

Hi,

Can you check what errors are raised on the server side when your server uses the Olympia event repository, please?

Sorry, my mistake in the previous post.

My client can't connect to the server, but the error is raised on the server side when it tries to talk to Olympia:

client ← server ← “No connection available” → Olympia

This happens every time I send too many callbacks to the client.

Hi,

Can you replace the Synapse SuperTCP channel that communicates with Olympia with the Indy (TROIndySuperTCPChannel) or socket (TROSuperTcpChannel) version and retest, please?

Does it work more reliably or not?

SynapseSuperTCPServer - Timeout and No connection available
IndySuperTCPServer - Timeout and Timeout waiting for response
ROSuperTCPServer - No connection available

In every test, after the function is done, we can connect again.

In some tests we had to wait a little longer before the error appeared, but not by much.

Hi,

Can you specify whether this appears when you store events in the Olympia server or when you receive already-stored events for a specific client?