I have the problem that when a Remoting SDK server is offline, or when it refuses a connection, the CPU load on the client spikes unacceptably when the channel is destroyed. It maxes out one thread/core for 30 seconds! This is with version 10.0.0.1495.
This problem has been bugging me for ages.
I have a reproducible test case here. It can be reproduced with any trivial service (mine is called sessionservice). I would be utterly grateful for a fix.
Kind regards,
Arthur Hoornweg.
[edit]: More concise example
procedure TForm1.Test;
var
  service: iSessionService;
  i: integer;
begin
  service := cosessionservice.Create('Supertcp://www.google.com:30841');
  try
    i := service.Serviceversion;
  except
    screen.cursor := crDefault;
    Showmessage('Please start the task manager and go to the CPU Performance page. ' +
      ' Press OK in this message box and look what happens to the CPU load.');
    screen.cursor := crHourglass;
  end;
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
  screen.cursor := crHourglass;
  test;
  screen.cursor := crDefault;
  Showmessage('The connection was finally destroyed...');
end;
According to SamplingProfiler, the CPU hog is “TROBaseSuperChannelWorker.Disconnect”, more specifically the line that calls “tThread.Yield”. Replacing this line with “Sleep(1)” makes the CPU load go away, but it still takes 30 seconds to destroy the channel.
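The difference between the two calls can be illustrated outside Remoting SDK. A yield returns immediately when no other thread is ready to run, so a loop built around it keeps one core busy, whereas Sleep(1) actually blocks the thread between checks. The following is a minimal sketch under that assumption; WaitUntil is a hypothetical helper, not part of the SDK, and the always-false condition merely stands in for a worker that never signals completion.

```pascal
program WaitLoopSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

// Hypothetical helper (not Remoting SDK code): polls aDone until it returns
// True or until aTimeoutMs milliseconds have elapsed.
// Returns True if the condition was met, False on timeout.
function WaitUntil(const aDone: TFunc<Boolean>; aTimeoutMs: Integer): Boolean;
var
  sw: TStopwatch;
begin
  sw := TStopwatch.StartNew;
  while not aDone() do
  begin
    if sw.ElapsedMilliseconds >= aTimeoutMs then
      Exit(False);
    // Sleep(1) blocks the thread between checks, so the loop does not
    // occupy a core the way a bare yield loop can.
    Sleep(1);
  end;
  Result := True;
end;

begin
  // The condition never becomes true here, so the wait ends by timeout.
  if WaitUntil(function: Boolean begin Result := False; end, 200) then
    Writeln('condition met')
  else
    Writeln('timed out');
end.
```

This only addresses the CPU load. The wait still lasts for the full timeout if nothing ever satisfies the condition, which matches the 30-second delay described above.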
[edit]
There’s more going on here. Why does destroying a channel that was never connected take 30 seconds? I see that TROBaseSuperChannelWorker.Disconnect is waiting for DoExecute() to do something, but DoExecute is never called if the connection failed… Should fDisconnected not be true at this stage to facilitate a quick exit?
procedure TROBaseSuperChannelWorker.Disconnect;
begin
  if fDisconnected then Exit; // if the connection failed, should this flag not be "true" ?
  While... begin
    .....// this loop takes 30 seconds
  end;
end;
As a result, you will get the “The connection was finally destroyed...” message without delay.
That doesn’t fix the problem. The service is destroyed, yes, but not the connection. The connection is still busy for a very long time even though it already knows that it can’t connect.
This is my main problem:
We find that in our VPN infrastructure, tRoSuperTCPChannel is sometimes unable to reconnect after a server has gone offline temporarily. When the server comes back online, it can be PINGed and other services such as Remote Desktop work again, but tRoSuperTCPChannel won’t reconnect; it keeps throwing eRoSuperChannelException (“no connection available”) when I execute a service. And sometimes, rarely, the service call hangs without ever returning.
If such a thing happens at night, we get angry calls from customers, so I really had to find a workaround.
A workaround that works reliably so far is to destroy the channel whenever a server becomes unreachable and to create a fresh one as soon as the server can be PINGed again.
This is when I noticed two severe side effects. First of all, the channel’s destructor totally hogs the CPU (which you just fixed) and secondly it takes an eternity to finish, blocking the entire thread.
So I had to resort to another really desperate workaround. If a channel must be destroyed, I hand it over to a dedicated “kill” thread whose only job it is to destroy the channel. Yes that works, but it is a measure born out of desperation. I soooo wish the destructor would simply be a good boy and terminate whatever it’s doing immediately.
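For readers who want to try the same “kill” thread idea, here is a minimal sketch of the pattern, assuming nothing about Remoting SDK itself. TSlowObject is a hypothetical stand-in for a channel whose destructor blocks, and DestroyInBackground is an illustrative name, not an SDK routine.

```pascal
program DeferredDestroySketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical stand-in for a channel whose destructor blocks for a while.
  TSlowObject = class
  public
    destructor Destroy; override;
  end;

destructor TSlowObject.Destroy;
begin
  Sleep(500); // simulates the slow teardown
  inherited;
end;

// Takes ownership of aObject and frees it on a background thread.
// The returned thread is handed back so the caller can wait for it
// at shutdown; the caller must not touch aObject afterwards.
function DestroyInBackground(aObject: TObject): TThread;
begin
  Result := TThread.CreateAnonymousThread(
    procedure
    begin
      aObject.Free;
    end);
  Result.FreeOnTerminate := False; // the caller waits for and frees the thread
  Result.Start;
end;

var
  obj: TSlowObject;
  killer: TThread;
begin
  obj := TSlowObject.Create;
  killer := DestroyInBackground(obj);
  obj := nil; // ownership has moved to the background thread
  Writeln('caller continues immediately');
  killer.WaitFor; // only needed before the process exits
  killer.Free;
  Writeln('object destroyed');
end.
```

Keeping a reference to the thread instead of using FreeOnTerminate is a deliberate choice in this sketch: it lets the application wait for pending destructions before it shuts down.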
Try to use the _AsyncEx service methods; they work more smoothly.
In your case, it will be:
...
var
  service: iSessionService_AsyncEx;
begin
  service := cosessionservice_AsyncEx.Create('Supertcp://www.google.com:30841');
  try
    service.BeginServiceversion(
      procedure(const aRequest: IROAsyncRequest)
      begin
        i := service.EndServiceversion(aRequest);
      end);
  except
Note: Drop an email to support@ and I’ll send you an updated uROTransportChannel.pas for this scenario.
Hi EvgenyK, ROSwitchToThread does not exist, I assume you mean SwitchToThread?
Anyway, I believe I may have a workaround here.
It appears that setting “active:=false” in the OnException event handler of the channel speeds up subsequent destruction a lot.
procedure TForm1.OnChannelException(Sender: TROTransportChannel; anException: Exception; var aRetry: Boolean);
begin
  if (anException is eROTimeout) or (anException is eroSuperChannelException) then
  begin
    aRetry := False;
    (Sender as troBasesupertcpchannel).active := False;
  end;
end;
I think this fix doesn’t harm the current logic and can handle aRetry:
procedure DoException(anException: Exception; var aRetry: Boolean); override;
...
procedure TROBaseSuperChannel.DoException(anException: Exception;
  var aRetry: Boolean);
begin
  inherited;
  if not aRetry then
    Active := False;
end;
Hi EvgenyK, I see in my IDE that TROBaseSuperChannel.DoException is often called from the thread context of fWorkerThread. Therefore it is not safe to call “Active:=False” here because method SetActive() is not threadsafe.
A race condition / potential access violation would occur if my main thread simultaneously sets Active to false or when it calls the destructor.
I strongly suggest wrapping TROBaseSuperChannel.SetActive in a critical section to make it threadsafe.
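To illustrate the suggestion, here is a minimal sketch of a setter guarded by a critical section. TGuardedChannel is a hypothetical stand-in class, not the actual TROBaseSuperChannel source, and the real connect/disconnect work is only indicated by a comment.

```pascal
program GuardedSetterSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs;

type
  // Hypothetical stand-in for a channel class (not Remoting SDK code).
  TGuardedChannel = class
  private
    fActive: Boolean;
    fLock: TCriticalSection;
    procedure SetActive(const aValue: Boolean);
  public
    constructor Create;
    destructor Destroy; override;
    property Active: Boolean read fActive write SetActive;
  end;

constructor TGuardedChannel.Create;
begin
  inherited Create;
  fLock := TCriticalSection.Create;
end;

destructor TGuardedChannel.Destroy;
begin
  Active := False; // goes through the guarded setter
  fLock.Free;
  inherited;
end;

procedure TGuardedChannel.SetActive(const aValue: Boolean);
begin
  // Only one thread at a time may change the state.
  fLock.Enter;
  try
    if fActive = aValue then
      Exit; // the finally block still releases the lock
    // The real connect/disconnect work would happen here.
    fActive := aValue;
  finally
    fLock.Leave;
  end;
end;

var
  channel: TGuardedChannel;
begin
  channel := TGuardedChannel.Create;
  try
    channel.Active := True;
    Writeln('Active = ', channel.Active);
    channel.Active := False;
    Writeln('Active = ', channel.Active);
  finally
    channel.Free;
  end;
end.
```

Note that the lock only serializes calls to the setter. It does not by itself protect against the object being destroyed while a worker thread is still inside the setter, so the destructor would still have to stop the worker first.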