Weird deadlock issue inside data modules

tobygroves · June 9, 2015, 1:21pm

Following my previous mammoth thread on deadlocks, I’m now experiencing another similar issue although the cause appears to be slightly different.

Randomly, rarely and under heavy load, I’m still having instances of my server locking up. I can reproduce this very occasionally and the issue appears to be within Delphi’s own code rather than RO. That said, I’m wondering if something RO is doing is triggering the issue or whether you guys have any knowledge or experience of it.

The issue seems to be centred around GlobalNameSpace, which is an instance of TMultiReadExclusiveWriteSynchronizer. When the lockup occurs, any request to the server gets as far as creating a data module, at which point the GlobalNameSpace.BeginWrite call at the start of TDataModule.Create blocks and never returns.

After doing some searching, it appears there have been some major historical issues with TMultiReadExclusiveWriteSynchronizer but there’s no clear indication of whether this is still the case. I’ve seen reports of issues being found in Delphi 6 but then fixed in Delphi 7 but I’ve also seen other open and unresolved problem reports and even some mention of the whole class being deprecated, in which case why is it still being used internally by Delphi code?

Have you come across anything like this?

EvgenyK · June 9, 2015, 1:33pm

This was one of the reasons why we deprecated TDABinDataStreamer. It uses the standard TReader/TWriter classes.
These classes take GlobalNameSpace’s lock, and the server application often locked up as a result …

As a workaround, you can use TCriticalSection instead of TMultiReadExclusiveWriteSynchronizer.
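
A minimal sketch of what that swap could look like in your own code, assuming a hypothetical TSharedCache class that previously guarded its list with a TMultiReadExclusiveWriteSynchronizer. The class and its members are invented for illustration and are not part of RO/DA or the RTL. Note the trade-off: with a critical section, readers are serialized as well as writers.

```pascal
program CriticalSectionSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

type
  // Hypothetical shared cache, used only to illustrate the swap.
  TSharedCache = class
  private
    FLock: TCriticalSection; // replaces a TMultiReadExclusiveWriteSynchronizer field
    FItems: TStringList;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Add(const Value: string);
    function Count: Integer;
  end;

constructor TSharedCache.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
  FItems := TStringList.Create;
end;

destructor TSharedCache.Destroy;
begin
  FItems.Free;
  FLock.Free;
  inherited;
end;

procedure TSharedCache.Add(const Value: string);
begin
  FLock.Acquire; // previously BeginWrite
  try
    FItems.Add(Value);
  finally
    FLock.Release; // previously EndWrite
  end;
end;

function TSharedCache.Count: Integer;
begin
  FLock.Acquire; // previously BeginRead; readers now wait for each other too
  try
    Result := FItems.Count;
  finally
    FLock.Release; // previously EndRead
  end;
end;

var
  Cache: TSharedCache;
begin
  Cache := TSharedCache.Create;
  try
    Cache.Add('one');
    Cache.Add('two');
    Writeln(Cache.Count);
  finally
    Cache.Free;
  end;
end.
```

This only applies to code you control; it does not change what TDataModule or TReader/TWriter do internally with GlobalNameSpace.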

tobygroves · June 9, 2015, 1:37pm

When you say use critical sections instead, how do you mean exactly? Delphi’s TDataModule makes use of GlobalNameSpace which is an instance of this class and I can’t change that without modifying their source.

Is there somewhere else that it’s being used to which you’re referring? Sorry for the confusion but I’m still trying to wrap my head around this problem and I’m still not 100% sure what I’m looking at.

Thanks.

EvgenyK · June 9, 2015, 1:46pm

If you are using TMultiReadExclusiveWriteSynchronizer in your code, try to replace it with TCriticalSections …

ROD doesn’t use TMultiReadExclusiveWriteSynchronizer at all.
DAD uses TMultiReadExclusiveWriteSynchronizer only for OData support. I think that under some conditions it might cause problems.
Can you confirm whether the problems happen only when using OData requests?

tobygroves · June 9, 2015, 1:50pm

Ah ok. I’m not using it at all in my own code. Not entirely sure what the ROD/DAD/OData names are in reference to (again sorry for my ignorance) but I think the OData stuff is for web/http type access? If so then no, the problem occurs even when using pure binary transfers via TCP (using the synapse channels if that’s of any importance)

tobygroves · June 9, 2015, 2:09pm

One issue I have noticed, and seen raised by others, is in the TDataModule destruction sequence.

TDataModule.BeforeDestruction calls GlobalNameSpace.BeginWrite, then Destroying and DoDestroy.
TDataModule.Destroy calls GlobalNameSpace.BeginWrite only if ComponentState doesn’t contain csDestroying. It then calls GlobalNameSpace.EndWrite.

If any code executing between BeforeDestruction and Destroy (such as an overridden method or a FormDestroy handler) were to remove csDestroying from ComponentState somehow, this would result in BeginWrite being called twice but EndWrite only once. Not sure if this is likely or even possible within the RO code?
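
To illustrate why such an imbalance would matter, here is a minimal sketch using a private TMultiReadExclusiveWriteSynchronizer as a stand-in for GlobalNameSpace. It assumes the write lock is re-entrant on the owning thread, so two BeginWrite calls need two EndWrite calls before another thread can get in. The worker thread plays the part of the next TDataModule.Create; the 1000 ms timeout is an arbitrary choice for the demo.

```pascal
program UnbalancedWriteLockSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

procedure RunDemo;
var
  Lock: TMultiReadExclusiveWriteSynchronizer; // stand-in for GlobalNameSpace
  Acquired: TEvent;
  Worker: TThread;
begin
  Lock := TMultiReadExclusiveWriteSynchronizer.Create;
  Acquired := TEvent.Create(nil, True, False, '');
  try
    // The suspected sequence: BeginWrite twice, EndWrite only once.
    Lock.BeginWrite;
    Lock.BeginWrite;
    Lock.EndWrite;

    // A second thread now tries to take the write lock.
    Worker := TThread.CreateAnonymousThread(
      procedure
      begin
        Lock.BeginWrite;
        try
          Acquired.SetEvent;
        finally
          Lock.EndWrite;
        end;
      end);
    Worker.FreeOnTerminate := False;
    try
      Worker.Start;
      // The worker cannot get in while the first thread still owns the lock.
      if Acquired.WaitFor(1000) = wrTimeout then
        Writeln('Worker is still blocked in BeginWrite');
      Lock.EndWrite; // the missing call; this releases the write lock
      Worker.WaitFor;
      Writeln('Worker finished after the balancing EndWrite');
    finally
      Worker.Free;
    end;
  finally
    Acquired.Free;
    Lock.Free;
  end;
end;

begin
  RunDemo;
end.
```

In a real server the balancing EndWrite never arrives, which would match the symptom of every later BeginWrite blocking forever. Whether this sequence can actually occur inside TDataModule is exactly the open question above.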

EvgenyK · June 9, 2015, 6:17pm

Try updating the data module like this:

var
  ROServer: TROIndyHTTPServer;

procedure TDataModule1.DataModuleCreate(Sender: TObject);
begin
  ROServer := TROIndyHTTPServer.Create(nil);
  TROMessageDispatcher(ROServer.Dispatchers.Add).Message := ROMessage;
  ROServer.Active := True;
end;

initialization
finalization
  ROServer.Free;
end.

tobygroves · June 10, 2015, 7:51am

You mean add this to all of my data modules which inherit from TDataModule? What exactly will this code do? I’m not currently using the TROIndyHTTPServer component and the one I am using (TROIpHTTPServer) I’ve disabled for the time being whilst I try to track this issue down.

EvgenyK · June 10, 2015, 8:34am

I mean: take the server component out of the data module form, create it at runtime, and destroy it in the finalization section of the data module’s unit.
In your case, it will be TROIpHTTPServer instead of TROIndyHTTPServer.
A few years ago, this approach helped to avoid a lock when destroying a data module form.

tobygroves · June 10, 2015, 8:55am

So I need this in all of my data modules? Ok I’ll give it a try.

Right now I’m not sure what’s going on with my server. It appears to be crashing totally randomly when I subject it to my load test. Sometimes it’ll run for an hour, sometimes minutes, and it crashes somewhere different every time, although usually in a GetSysMem or FreeSysMem call suggesting memory is being randomly corrupted. I have no idea how I’m going to find this.

EvgenyK · June 10, 2015, 9:17am

You can try a couple of ideas:

Use the full version of FastMM4 with an adjusted FastMM4Options.inc. It can detect memory corruption.
Destroy objects with FreeAndNil(object) instead of object.Free. This will raise an AV if some code uses an already destroyed object through that reference, so the faulty code can be detected and fixed more easily.
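
A minimal sketch of the FreeAndNil idea, using a TStringList purely as an example object. After FreeAndNil the variable is nil, so later use through that variable fails predictably or can be guarded with Assigned; after a plain Free, the variable would still point at released memory.

```pascal
program FreeAndNilSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  List: TStringList;
begin
  List := TStringList.Create;
  List.Add('x');

  // Frees the object and sets the variable to nil.
  FreeAndNil(List);

  // The stale reference is now detectable instead of silently dangling.
  if not Assigned(List) then
    Writeln('Reference is nil after FreeAndNil');
end.
```

This only protects the variable that was passed to FreeAndNil. Any other variable or field that still holds the same object reference will dangle as before.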

tobygroves · June 10, 2015, 10:19am

Yeah I’ve checked all my Free calls. I’ve tried using FastMM4 in full debug mode but the problem doesn’t seem to occur, suggesting it’s a timing issue with multiple threads and the massive slowdown caused by FastMM4 is preventing it occurring.

Some of these are bizarre - here’s an example of one stack trace:

:7751c42d KERNELBASE.RaiseException + 0x58
uDAMemDataset.TDAMemoryDataset.ClearFieldByFieldType($A47573,???)
uDAMemDataset.TDAMemoryDataset.ClearFieldByFieldType($2ADCA40,???)
uDAMemDataset.TDAMemoryDataset.ClearBin2Buffer($6F2FE10)
uDAMemDataset.TDAMemoryDataset.DuplicateBuffer($6F82A50,$6F82A70,False)
uDAMemDataset.TDAMemoryDataset.LocalBufferToDatasetBuffer($6F82A50,$6F82A70)
uDAMemDataset.TDAMemoryDataset.intDirectSearch(???,???,???,True,$6F82A70)
uDAMemDataset.TDAMemoryDataset.LocateRecord('ValueName','URL',[loCaseInsensitive],True,$6F82A70)
uDAMemDataset.TDAMemoryDataset.Locate('ValueName','URL',[loCaseInsensitive])
uDADataTable.TDADataTable.Locate('ValueName','URL',[loCaseInsensitive])
uDADataTable.TDADataTableRules.Locate(???,???,[loCaseInsensitive])
BaseSettings.TSettingsAccess.Data_Locate(???,???,[loCaseInsensitive])
BaseSettings.TSettingsAccess.ReadString('URL','','')
WebServerSettings.TWebServerSettings.TGeneral.GetURL
SyncUtils.TSyncUtils.TGoSyncRequestThread.Execute
System.Classes.ThreadProc($6FB84F8)
System.ThreadWrapper($6F86F10)
:7714338a kernel32.BaseThreadInitThunk + 0x12
:77b59f72 ntdll.RtlInitializeExceptionChain + 0x63
:77b59f45 ntdll.RtlInitializeExceptionChain + 0x36

EvgenyK · June 10, 2015, 10:29am

It looks like this problem is related to the background thread. Try cloning the datatable and using the cloned instance inside the thread. That should solve the issue above.
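
A minimal sketch of the "private copy per thread" idea. It uses a TStringList as a stand-in for the datatable, because the exact clone call for TDADataTable is not shown in this thread; Assign here is only an illustration of taking a copy. The copy is made on the owning thread before the worker starts, and from then on only the worker touches it.

```pascal
program PrivateCopySketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

procedure RunDemo;
var
  Shared, LocalCopy: TStringList; // stand-ins for the shared table and its clone
  Worker: TThread;
  Found: Boolean;
begin
  Shared := TStringList.Create;
  try
    Shared.Values['URL'] := 'http://example.invalid';

    // Take the copy on the owning thread, before the worker exists.
    LocalCopy := TStringList.Create;
    LocalCopy.Assign(Shared);

    Found := False;
    Worker := TThread.CreateAnonymousThread(
      procedure
      begin
        try
          // The worker searches its own copy, never the shared instance.
          Found := LocalCopy.IndexOfName('URL') >= 0;
        finally
          LocalCopy.Free;
        end;
      end);
    Worker.FreeOnTerminate := False;
    try
      Worker.Start;
      Worker.WaitFor;
    finally
      Worker.Free;
    end;

    Writeln(Found);
  finally
    Shared.Free;
  end;
end;

begin
  RunDemo;
end.
```

The design point is that no lock is needed during the search, because the worker never shares its instance with another thread.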

tobygroves · June 10, 2015, 1:49pm

Yeah, I think I’ve found it - I had a rare situation where two threads could access the same datatable simultaneously, which was causing all manner of mayhem.

Going back to the locking issue - if I’m following you correctly you’re saying I should remove the TROIpHTTPServer from the main datamodule and instead create it at runtime, then destroy it in finalization, presumably to stop it being automatically destroyed as part of the datamodule’s destruction process, is that correct?

This would be on the main fServerDataModule which is obviously only created and destroyed once. I can see how this could potentially avoid a hang when the application is closed down, but the issue I’m seeing is that it hangs randomly during execution, which I don’t think this would help. It’s still possible that the above bug is somehow triggering this issue but only time and further testing will prove or disprove that.

EvgenyK · June 10, 2015, 2:06pm

Yep. Several years ago, one customer had a similar problem with the destruction of a server data module that contained the server component, and this helped him.
Unfortunately, I don’t remember the details of that problem or exactly which message and channel were used.

tobygroves · June 10, 2015, 2:10pm

Possibly related to this? - http://stackoverflow.com/questions/2655481/weird-call-stack-when-application-has-frozen

EvgenyK · June 10, 2015, 2:24pm

Not sure.
We added this solution to the FAQ in March 2011, i.e. a year after that thread.

luizhclazzer · June 9, 2017, 3:06pm

I have a similar problem with SuperTCPServer. When my server has only a few connected clients, everything works fine. When I run a stress test, connecting and disconnecting many clients in a short time, my server freezes and no client can use the service. The time at which the problem occurs is random: sometimes it takes one or two minutes, and sometimes ten or fifteen.
After the server has frozen, debugging a new connection gets me as far as the TDataSource.Create event, where the application locks in “GlobalNameSpace.BeginWrite”. Has anyone found what it is in a project’s data sources that causes this situation?

EvgenyK · June 9, 2017, 3:27pm

Stopping in GlobalNameSpace.BeginWrite means that it was blocked while creating the service:

constructor TDataModule.Create(AOwner: TComponent);
begin
  GlobalNameSpace.BeginWrite;

It looks like the standard TMultiReadExclusiveWriteSynchronizer isn’t very robust and causes problems under stress.
You can try a lightweight version of the service: choose TRORemotable as the ancestor for your service.
It is suitable for some cases.

luizhclazzer · June 9, 2017, 5:46pm

The service that I found to be locked by TMultiReadExclusiveWriteSynchronizer calls CreateSession and DestroySession, so I don’t know whether I can replace TRORemoteDataModule with TRORemotable.

function TLoginService.Login(cUser, cPass: string): Boolean;
begin
  try
    if (TDataServer.UserName = cUser) and (TDataServer.Password = cPass) then
    begin
      CreateSession;
      ...
      Result := True;
    end
    else
      raise Exception.Create('Bad User');
  except
    on E: Exception do
    begin
      // Clean up before re-raising: statements placed after "raise" never run,
      // and Result is not used by the caller once the exception propagates.
      DestroySession;
      raise;
    end;
  end;
end;