Callbacks blocking new requests

Promedico_Gestao_Hos · March 27, 2023, 11:58am

We are using Delphi 10.3.2, Remoting SDK version 10.0.0.1555, Synapse SuperTCP, Olympia/InMemorySessionManager, Windows
(the problem happens with or without Olympia)

In my application, from time to time, we get a deadlock where new requests can't be processed. In the last RO version the lock in the session manager was fixed, but now we notice more deadlocks, in this case when using callbacks.

After analyzing some call stacks, we believe we have identified the problem, in the uROBaseSuperTCPServer.pas file, function TSendEvent.Callback:

        fOwner.fClients.LockList;
        try
          {$IFDEF ROUseGenerics}
          if not fOwner.fGuidToClientMap.TryGetValue(fClientGuid, l_worker) then Exit;
          if fWorkerOverride <> nil then
            l_worker := TROSCServerWorker(fWorkerOverride.GetTransportObject);
          {$ELSE}
          i := fOwner.fGuidToClientMap.IndexOf(GUIDToString(fClientGuid));
          if i = -1 then Exit;
          if fWorkerOverride <> nil then
            l_worker := TROSCServerWorker(fWorkerOverride.GetTransportObject)
          else
            l_worker := TROSCServerWorker(fOwner.fGuidToClientMap.Objects[i]);
          {$ENDIF}

          // Shouldn't this be outside this critical section?
          l_ask := l_worker.SendPackage(l_stream, 0);
        finally
          fOwner.fClients.UnlockList;
        end;

If l_worker.SendPackage takes too long, the whole server is blocked for that time. Say a client has a really bad connection and the send takes 10 s: every other client waits 10 seconds before new requests can be processed, which is not ideal.

Here is where all clients wait:

procedure TROBaseSuperTCPServer.IntExecute(ABaseSuperConnection: TROBaseSuperTcpConnection);
...
    l_worker := TROSCServerWorker.Create(Self, ABaseSuperConnection);
    l_worker.MaxPackageSize := Self.MaxPackageSize;
    // fClients is blocked here
    fClients.Add(l_worker);
    try
      l_worker.DoExecute;
    finally
      fClients.Remove(l_worker);
      ....
    end;
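To make the effect described above concrete, here is a small self-contained console program. It is a hypothetical simulation, not RemObjects SDK code: a TCriticalSection stands in for the fClients list lock, Sleep stands in for a slow SendPackage call, and acquiring the lock from the main thread stands in for the fClients.Add step that a new connection performs in IntExecute. The names TSendThread, MeasureConnectWaitMs and SendDelayMs are made up for illustration.

```pascal
program CallbackLockDemo;
// Hypothetical simulation, not RemObjects SDK code:
//  - a TCriticalSection plays the role of the fClients list lock,
//  - Sleep plays the role of a slow TROSCServerWorker.SendPackage call,
//  - Acquire/Release on the main thread plays the role of fClients.Add
//    in TROBaseSuperTCPServer.IntExecute (a new connection arriving).

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils, SyncObjs, DateUtils;

const
  SendDelayMs = 2000; // simulated slow client, like the 2000 ms lag set in clumsy

type
  TSendThread = class(TThread)
  private
    FLock: TCriticalSection;
    FSendInsideLock: Boolean;
  protected
    procedure Execute; override;
  public
    constructor Create(ALock: TCriticalSection; ASendInsideLock: Boolean);
  end;

constructor TSendThread.Create(ALock: TCriticalSection; ASendInsideLock: Boolean);
begin
  // Fields are set before the inherited constructor starts the thread.
  FLock := ALock;
  FSendInsideLock := ASendInsideLock;
  inherited Create(False);
end;

procedure TSendThread.Execute;
begin
  FLock.Acquire;
  try
    // The worker look-up would happen here; it is cheap.
    if FSendInsideLock then
      Sleep(SendDelayMs); // slow send while every other thread waits on FLock
  finally
    FLock.Release;
  end;
  if not FSendInsideLock then
    Sleep(SendDelayMs); // slow send after the lock has been released
end;

// Returns how long a "new connection" waited to get the list lock.
function MeasureConnectWaitMs(SendInsideLock: Boolean): Int64;
var
  Lock: TCriticalSection;
  Sender: TSendThread;
  Start: TDateTime;
begin
  Lock := TCriticalSection.Create;
  try
    Sender := TSendThread.Create(Lock, SendInsideLock);
    try
      Sleep(100); // give the sender a head start so it takes the lock first
      Start := Now;
      Lock.Acquire; // what IntExecute effectively does before fClients.Add
      Lock.Release;
      Result := MilliSecondsBetween(Now, Start);
      Sender.WaitFor;
    finally
      Sender.Free;
    end;
  finally
    Lock.Free;
  end;
end;

begin
  WriteLn('send inside the lock : new connection waited ',
    MeasureConnectWaitMs(True), ' ms');
  WriteLn('send outside the lock: new connection waited ',
    MeasureConnectWaitMs(False), ' ms');
end.
```

With the send inside the lock, the simulated new connection should wait roughly the whole send delay minus the 100 ms head start; with the send after the release, it should get the lock almost immediately. Exact numbers depend on the machine and on timer resolution.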

Now, in production we occasionally get a very, very long frozen-application situation (at least a few minutes; we never waited to see whether it would finish). We couldn't reproduce that, but we can reproduce all requests on the server freezing for a few seconds.

Steps to reproduce

1. Compile both the client and server projects.
2. Open 1 server and 2 client.exe instances (we use two computers, but I think it will work on the same PC).
3. Configure the IP address/port in client.exe.
4. Open clumsy, or any other program you like, to fake a bad connection.
5. On client1 click “Login and register for callback”, then reserve; everything will work fine.
6. On the server click “Send callback”; on client1 you will see a callback counter increase, as expected.
7. Open perfmon on Windows, go to Network and get the port the client uses to connect to the server.
8. In clumsy, set this filter: tcp.DstPort == PORT_HERE or tcp.SrcPort == PORT_HERE
9. In clumsy, enter 2000 in the Lag field and click Start.
10. On the server click “Send Callback” and, before the 2 seconds pass, try to connect with the second client: it will not be able to connect until the 2-second mark.
11. To be sure clumsy is not affecting the second client, you can stop it and test again if you like.

Now, to be honest, I'm not sure why this critical section is being used. There are 3 locks in total when a callback is sent, and this one (fOwner.fClients.LockList) doesn't seem right to me, but of course I'm not 100% sure.
Demo.rar (2.2 MB)

EvgenyK · March 27, 2023, 12:43pm

Hi,

Can you retest this code in your working environment, pls?

// uROBaseSuperTCPServer.pas
procedure TSendEvent.Callback(Caller: TROThreadPool; Thread: TThread);
..
      l_retry := False;
      try
        l_ask := TROSCServerWorker(fWorkerOverride.GetTransportObject).SendPackage(l_stream, 0);
        TROSCServerWorker.WaitForAck(l_ask, fOwner.fAckWaitTimeout);
        if Supports(fOwner.fEventRepository, IROValidatedSessionsChangesListener, l_listener) and not IsEqualGUID(l_id, EmptyGUID) then
          l_listener.EventSucceeded(fClientGuid, l_id);
      except

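The posts that follow confirm that the fix amounts to not holding fClients while SendPackage runs. A minimal sketch of that pattern, assuming the non-generics branch quoted in the first post (a TStringList keyed by GUIDToString, guarded by the fClients TThreadList): only the look-up runs between LockList and UnlockList, and the send happens afterwards. FindWorker, TFakeWorker and the 'client-1' key are hypothetical names for illustration, not SDK API.

```pascal
program ResolveThenSendDemo;
// Hypothetical sketch of "look up under the lock, send outside it".
// TThreadList/TStringList mimic fClients/fGuidToClientMap (non-generics branch);
// FindWorker and TFakeWorker are illustrative names, not RemObjects SDK API.

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils;

type
  TFakeWorker = class
  public
    Sent: Integer;
    procedure SendPackage; // stands in for TROSCServerWorker.SendPackage; only counts calls
  end;

procedure TFakeWorker.SendPackage;
begin
  Inc(Sent);
end;

// Only the map look-up runs while the client list is locked.
function FindWorker(Clients: TThreadList; Map: TStringList;
  const ClientId: string): TFakeWorker;
var
  i: Integer;
begin
  Result := nil;
  Clients.LockList;
  try
    i := Map.IndexOf(ClientId);
    if i <> -1 then
      Result := TFakeWorker(Map.Objects[i]);
  finally
    Clients.UnlockList;
  end;
end;

var
  Clients: TThreadList;
  Map: TStringList;
  Worker, Found: TFakeWorker;
begin
  Clients := TThreadList.Create;
  Map := TStringList.Create;
  Worker := TFakeWorker.Create;
  try
    Clients.Add(Worker);
    Map.AddObject('client-1', Worker);
    Found := FindWorker(Clients, Map, 'client-1');
    if Found <> nil then
      Found.SendPackage; // the (potentially slow) send runs after UnlockList
    WriteLn('sent: ', Worker.Sent);
  finally
    Worker.Free;
    Map.Free;
    Clients.Free;
  end;
end.
```

One thing this sketch deliberately leaves open: after UnlockList, nothing here stops the connection thread from removing the worker from fClients (as in IntExecute's finally block) and possibly freeing it while the send is still running. How the SDK handles worker lifetime is not shown in this thread.
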
Promedico_Gestao_Hos · March 27, 2023, 1:46pm

Can you send me the complete code of the function? Just to be 100% safe.

EvgenyK · March 27, 2023, 1:48pm

Hi,

Check PM

Promedico_Gestao_Hos · March 27, 2023, 2:09pm

Removing the lock works as expected. Should I just push this workaround until the next build?

EvgenyK · March 27, 2023, 3:42pm

Hi,

yes.
I’d expect some testing from you: can you test it in your working environment, pls?
If it works as expected and doesn’t cause any other issues, I’ll merge this fix into the main trunk.

Promedico_Gestao_Hos · March 27, 2023, 3:46pm

OK, I will push it to production. I have been testing something similar since Saturday and everything seems fine.

I will give you some feedback this week (as you know, it is hard to be 100% sure; this is very random).

Promedico_Gestao_Hos · April 1, 2023, 1:02pm

Update:

We didn’t notice any problems after the change.

We do have a few clients using Olympia instead, and today we noticed some slowdowns in one function that uses a lot of callbacks, so maybe the problem is happening with Olympia too? (Not sure about this one; we still need to investigate further.)

RemObjectsSoftware · April 3, 2023, 6:36am

Logged as bugs://D19334.

RemObjectsSoftware · April 3, 2023, 6:37am

bugs://D19334 was closed as fixed.

EvgenyK · April 3, 2023, 1:06pm

Hi,

I’ll try to review the .NET code.

Promedico_Gestao_Hos · April 3, 2023, 5:21pm

OK, now we can confirm it.

We have a process that uses a lot (I mean, a lot) of callbacks. With Olympia, the function crashes the server; without Olympia (with the latest fix) it works as expected.

When this function is running, new clients can't connect and we start getting a lot of “No connection available” errors.

Promedico_Gestao_Hos · April 3, 2023, 5:22pm

When I say crash, I mean it is so slow that it becomes unusable.

EvgenyK · April 3, 2023, 5:35pm

Hi,

Do you have an unstable connection between the Delphi server and Olympia?

We have fixed the issue where the server sends events to the client:
client ← (fix here) server ← → Olympia

In general, you shouldn’t have any issues between the server and Olympia.

Promedico_Gestao_Hos · April 3, 2023, 5:50pm

I do have an unstable connection between the Delphi server and the Delphi client, but only when using Olympia.

EvgenyK · April 4, 2023, 6:28am

Hi,

Can you check what errors are raised on the server side when your server uses the Olympia event repository, pls?

Promedico_Gestao_Hos · April 6, 2023, 11:08am

Sorry, my mistake in the previous post.

My client can't connect to the server, but the error is raised on the server side when it tries to talk to Olympia:

client ← server ← “No connection available” → Olympia

This happens every time I send too many callbacks to the client.

EvgenyK · April 10, 2023, 8:48am

Hi,

Can you replace the Synapse SuperTCP channel that communicates with Olympia with the Indy (TROIndySuperTCPChannel) or Socket (TROSuperTcpChannel) version and retest, pls?

Does it work more stably or not?

Promedico_Gestao_Hos · April 10, 2023, 1:50pm

SynapseSuperTCPServer - Timeout and “No connection available”
IndySuperTCPServer - Timeout and “Timeout waiting for response”
ROSuperTCPServer - “No connection available”

In every test, after the function finishes we can connect again.

In some tests we had to wait a little longer before the error, but not by much.

EvgenyK · April 11, 2023, 11:54am

Hi,

Can you specify whether this appears when you store events on the Olympia server or when you receive already stored events for a specific client?