Massive CPU load problem, test case

Hello RemObjects team,

I have the problem that when a Remoting SDK server is offline, or when it refuses a connection, the CPU load on the client spikes unacceptably when the channel is destroyed. It maxes out one thread/core for 30 seconds! This is with version 10.0.0.1495.

This problem has been bugging me for ages.

I have a reproducible test case here. It can be reproduced with any trivial service (mine is called sessionservice). I would be utterly grateful for a fix.

Kind regards,
Arthur Hoornweg.

[edit]: More concise example

procedure TForm1.Test;
var
  service: iSessionService;
  i: Integer;
begin
  // No Remoting SDK server answers at this address, so the call below fails.
  service := cosessionservice.Create('Supertcp://www.google.com:30841');
  try
    i := service.Serviceversion;
  except
    Screen.Cursor := crDefault;
    ShowMessage('Please start the task manager and go to the CPU Performance page. ' +
      'Press OK in this message box and look what happens to the CPU load.');
    Screen.Cursor := crHourglass;
  end;
  // Leaving this procedure releases the service and destroys its channel.
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  Screen.Cursor := crHourglass;
  Test;
  Screen.Cursor := crDefault;
  ShowMessage('The connection was finally destroyed...');
end;

Kind regards,
Arthur

Logged as bugs://D19119.

Hello

I have logged this issue. Unfortunately, our Delphi team is on vacation this week. They will be back next Monday and will take care of it then.

Regards

I’m including a screenshot.

According to SamplingProfiler, the CPU hog is “TROBaseSuperChannelWorker.Disconnect”, more specifically the line that calls “tThread.Yield”. Replacing this line with “Sleep(1)” makes the CPU load go away, but it still takes 30 seconds to destroy the channel.
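
For readers who want to see the difference in isolation, here is a minimal sketch of a polling wait that uses Sleep(1) instead of a yield loop. It is not the Remoting SDK code: the names WaitForFlag and Done are hypothetical, and the timeout parameter is an assumption added for illustration. A loop around a yield call returns immediately when no other thread is runnable, which is why it can saturate a core, whereas Sleep(1) actually suspends the waiting thread.

```pascal
program SleepWaitSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Diagnostics;

var
  Done: Boolean = False; // hypothetical flag, set by the worker thread

// Polls Flag until it becomes True or TimeoutMs has elapsed.
// Sleep(1) suspends the caller between checks, so the core stays mostly idle.
function WaitForFlag(var Flag: Boolean; TimeoutMs: Integer): Boolean;
var
  Watch: TStopwatch;
begin
  Watch := TStopwatch.StartNew;
  while (not Flag) and (Watch.ElapsedMilliseconds < TimeoutMs) do
    Sleep(1);
  Result := Flag;
end;

begin
  // Simulated worker that reports completion after half a second.
  TThread.CreateAnonymousThread(
    procedure
    begin
      Sleep(500);
      Done := True;
    end).Start;

  if WaitForFlag(Done, 5000) then
    Writeln('Worker finished')
  else
    Writeln('Timed out');
end.
```

The sleep only removes the CPU load. The total wait still depends on when the flag is set, which matches the observation that the destructor remains slow.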

[edit]

There’s more going on here. Why does destroying a channel that was never connected take 30 seconds? I see that TROBaseSuperChannelWorker.Disconnect is waiting for DoExecute() to do something, but DoExecute is never called if the connection failed… Should fDisconnected not be true at this stage to facilitate a quick exit?

procedure TROBaseSuperChannelWorker.Disconnect;
begin
  if fDisconnected then Exit;   // if the connection failed, should this flag not be "true" ? 
  While... begin 
      .....// this loop takes 30 seconds
  end;
end;

Hi,

you can replace

service:=cosessionservice.Create('Supertcp://www.google.com:30841');

with

Channel.TargetUrl := 'Supertcp://www.google.com:30841';
service:=cosessionservice.Create(Message, Channel);

As a result, you will get the "The connection was finally destroyed..." message without delay.

To be exact, it takes 24 seconds. This is the timeout the channel uses while waiting for any data:

Result := fConnection.CanRead((PingFrequency * 10 div 25)* 1000);

PingFrequency is 60, so the line above becomes

Result := fConnection.CanRead(24000);
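
The arithmetic can be checked with a small stand-alone sketch. The function name ReadTimeoutMs is hypothetical; only the expression itself comes from the line quoted above.

```pascal
program ReadTimeoutSketch;

{$APPTYPE CONSOLE}

// Mirrors the expression quoted above; the function name is hypothetical.
function ReadTimeoutMs(PingFrequency: Integer): Integer;
begin
  // "*" and "div" are evaluated left to right: 60 * 10 = 600, 600 div 25 = 24.
  Result := (PingFrequency * 10 div 25) * 1000;
end;

begin
  Writeln(ReadTimeoutMs(60)); // prints 24000
end.
```

In other words, the wait is 40% of PingFrequency, expressed in milliseconds.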

we already handle it as

procedure TROClientThread.IntExecute;
begin
  if fChannel.fOwner.InitialConnect then
    fChannel.DoExecute
  else
    fChannel.fExecuting := True;
end;

In your case, TROBaseSuperChannelWorker.Disconnect is called while fChannel.fOwner.InitialConnect has not finished yet …

You are right, it can be replaced with

  if not fExecuting then begin
    ROSwitchToThread;
    while not fExecuting do Sleep(10);
  end;

Hi EvgenyK,

You wrote:

> you can replace
>
> service:=cosessionservice.Create('Supertcp://www.google.com:30841');
>
> with
>
> Channel.TargetUrl := 'Supertcp://www.google.com:30841';
> service:=cosessionservice.Create(Message, Channel);
>
> as a result, you will get The connection was finally destroyed... message w/o delay.

That doesn’t fix the problem. The service is destroyed, yes, but not the connection. The connection is still busy for a very long time even though it already knows that it can’t connect.

This is my main problem:

We find that in our VPN infrastructure, tRoSuperTCPChannel sometimes is unable to reconnect if a server has gone offline temporarily. When the server comes back online, it can be PINGed and other services such as Remote Desktop work again but tRoSuperTCPChannel won’t reconnect, it keeps throwing eRoSuperChannelException (“no connection available”) when I execute a service. And sometimes, rarely, the service call hangs without ever returning.

If such a thing happens at night, we get angry calls from customers so I really had to find a workaround.

A workaround that works reliably so far is to destroy the channel whenever a server becomes unreachable and to create a fresh one as soon as the server can be PINGed again.

This is when I noticed two severe side effects. First of all, the channel’s destructor totally hogs the CPU (which you just fixed) and secondly it takes an eternity to finish, blocking the entire thread.

So I had to resort to another really desperate workaround. If a channel must be destroyed, I hand it over to a dedicated “kill” thread whose only job it is to destroy the channel. Yes that works, but it is a measure born out of desperation. I soooo wish the destructor would simply be a good boy and terminate whatever it’s doing immediately.
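
The "kill thread" idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from our application: TSlowObject is a hypothetical stand-in for a channel with a slow destructor, and DestroyInBackground is a hypothetical helper name. Only TThread.CreateAnonymousThread is a real RTL call.

```pascal
program KillThreadSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical stand-in for a channel whose destructor blocks for a while.
  TSlowObject = class
  public
    destructor Destroy; override;
  end;

destructor TSlowObject.Destroy;
begin
  Sleep(2000); // simulates the long-running teardown
  inherited;
end;

// Takes ownership of Obj and frees it on a background thread.
procedure DestroyInBackground(var Obj: TObject);
var
  Victim: TObject;
begin
  Victim := Obj;
  Obj := nil; // the caller must not touch the instance again
  // The anonymous thread frees itself after the procedure returns.
  TThread.CreateAnonymousThread(
    procedure
    begin
      Victim.Free;
    end).Start;
end;

var
  Channel: TObject;
begin
  Channel := TSlowObject.Create;
  DestroyInBackground(Channel);
  Writeln('The caller continues immediately.');
  Sleep(3000); // keep the demo process alive until the teardown has finished
end.
```

If the channel is only held through interface references, the destructor runs wherever the last reference is released, so that final release would have to happen on the background thread instead.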

Hi,

Try the _AsyncEx service methods; they work more smoothly.
In your case, it will be

...
var service:iSessionService_AsyncEx;
begin
   service:=cosessionservice_AsyncEx.Create('Supertcp://www.google.com:30841');
   try
      service.BeginServiceversion(
             procedure(const aRequest: IROAsyncRequest) begin
               i := service.EndServiceversion(aRequest);
             end);
   Except

Note: Drop an email to support@ and I’ll send you an updated uROTransportChannel.pas for this scenario.