.NET RO server hanging if it is running overnight

birger · November 13, 2019, 9:32am

Hi there,

Just wanted to let you know that we also experience our .NET RO server hanging if it is left running overnight.
It works fine on the day we run it. If we then continue calling it from client apps, it just responds with 504 Gateway Closed.
If I go to the /BIN URL it works fine.
When I close it, it freezes and has to be killed in Task Manager.
We do not have any servers in production yet.
Using Water 10.0.0.2451 + RODA 10.0.0.1449

antonk · November 13, 2019, 9:52am

Hello

Which server channel exactly do you use? How many requests hit this server during the day?
Is it possible to attach a debugger to check where exactly execution is paused when you try to close it?

Btw, error code 504 is not raised by any of the RO SDK server channels. Do you have something like a reverse proxy installed in front of the server host?

birger · November 13, 2019, 10:09am

Thanks.
That 504 error is strange too; I need to check it again tomorrow.
I use the standard channel:

class method Program.Main(args: array of String): Integer;
begin
  var fSDM := new HDFMK.Data.SingletonDataModule;

  var server := new ApplicationServer(HDFMK.Gateway.Properties.Settings.Default.ApplicationName);
  server.AutoCreateSelfSignedCertificate := false;
  server.NetworkServer.UseTLS := false;
  server.NetworkServer.Port := HDFMK.Gateway.Properties.Settings.Default.ServerPort;
  server.Run(args)
end;

Inside the fSDM object I create a ConnectionManager.

The strange thing is that http://server:8099/BIN works fine and displays the RODL etc. That should be a GET call.
I guess the ROClient to ROServer call uses the HTTP POST verb.

I will need to debug inside the service call, because the ROServer is also a proxy that calls another WSDL service (not RO).
So I need to find out where the app is hanging.

antonk · November 13, 2019, 10:50am

This is the plain Http server channel. It should be quite reliable.

In this server channel all requests go through the same pipeline, so either the browser has cached the RODL resource or the channel is not actually dead.

To check whether the channel is alive, just go to http://server:8099/..some random string here... to bypass the browser cache.

If the server responds with a page saying that the dispatcher is not found, then the server itself is working.

The next step is to add code like this

	protected override void InternalActivate(Guid clientId)
	{
		base.InternalActivate(clientId);
		// Add here some logging
	}

to your service class. This method is executed when the service instance is being activated. This is the easiest way to check whether the issue is related to service activation or whether there is something wrong with the service method code.

EDIT: Service methods are executed in separate threads. If for some reason a service method hangs for a very long time (possibly forever, e.g. while trying to reach a remote server in an endless try…except loop), then the server app itself will hang on close. It waits for the worker threads to exit gracefully, so if a worker thread is stuck in a service method, the server app will fail to close properly.
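As an illustration of how to avoid the endless retry loop described above, here is a minimal sketch of a bounded retry helper. It is plain C# and not part of the RO SDK; the names BoundedRetry and Run are made up for this example. Note that it only limits the number of attempts: the wrapped call still needs its own timeout, otherwise a single attempt can block forever.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not an RO SDK type): retries a call a fixed number
// of times and then gives up, so the worker thread always returns.
public static class BoundedRetry
{
    public static T Run<T>(Func<T> call, int maxAttempts, TimeSpan delayBetweenAttempts)
    {
        if (call == null) throw new ArgumentNullException(nameof(call));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        Exception lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                return call();
            }
            catch (Exception ex)
            {
                lastError = ex;
                // Pause before the next attempt, but not after the last one.
                if (attempt < maxAttempts) Thread.Sleep(delayBetweenAttempts);
            }
        }

        // All attempts failed: surface the error instead of looping forever.
        throw new InvalidOperationException(
            $"Call failed after {maxAttempts} attempts.", lastError);
    }
}
```

Inside a service method this could be used as BoundedRetry.Run(() => CallRemoteService(request), 3, TimeSpan.FromSeconds(2)), where CallRemoteService stands for whatever code talks to the remote server. The exception then reaches the client instead of leaving a worker thread stuck.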

birger · November 18, 2019, 7:54pm

It seems like the ConnectionManager does not release the connections in the connection pool?
And they are not freed by the ConnectionManager after one hour.
Here is an image showing the connections in Firebird:

(image not preserved)

This is how the ConnectionManager is created:

fConnectionManager := new ConnectionManager(
  MaxPoolSize := 10,
  //PoolTimeoutSeconds := 3600, //obsolete
  ConnectionPoolTimeout := 3600,
  PoolTransactionBehaviour := PoolTransactionBehaviour.Rollback,
  PoolingEnabled := true,
  PoolingBehavior := RemObjects.SDK.Pooling.PoolBehavior.RaiseError
  );

birger · November 18, 2019, 7:57pm

This is the service:

type

  [Service, PooledClassFactory(5, PoolBehavior.CreateAdditional)]
  DataService = public class(Service) //RemObjects.DataAbstract.Server.DataAbstractService)
  private
    m: Main := new Main;
  public
    [ServiceMethod] method DoSomething(aSomeValue: String): String;
    [ServiceMethod] method Login(IsTest: Boolean; LoginBlock: String): String;
    [ServiceMethod] method Action(ActionType: String; SessionId: String; Param2, Param3, Param4, LoginAssertion, Param5: String): String;

    [ServiceMethod] method PauseWithDate(
      SessionId: String; MedicineCardVersion: Int64; BorgerCPRnr: String; DrugMedicationIdentifier: Int64; 
      StartDate, EndDate: DateTime): String;

    constructor;
  end;

antonk · November 18, 2019, 8:59pm

Connections can stay alive for up to 1.5 times the ConnectionPoolTimeout (in seconds).

To check connection timeouts DA has to pause the ConnectionManager instance, so this operation would be quite costly to perform, say, every minute.

birger · November 18, 2019, 9:26pm

So I could add this:
WaitIntervalSeconds := 600
and check every 10 minutes.

I am trying to force it to release the connection after each service call,
so that it does not hold on to the connection forever.
I will try it out and get back.

birger · November 18, 2019, 10:15pm

It seems like it never releases the database connection. It used to work in Visual Studio 2015.
I am using Water latest release now, as well as DA latest release.

Even if I use using := to Dispose it, it still keeps the connection. So when I reach 10 connections, it fails to respond and times out.

So I believe the reason it worked before is that, as long as the service is in active use and the PooledClassFactory does not release the service object, it keeps working with the connections that the service instance already has.
When the pooled service instance times out and needs to return to the PooledClassFactory, the database connection hangs.
Each night we restart the RO server, so the problem was never discovered.
Now that we are only testing (with few testers), the pooled service times out and hangs.

I would think that if the object “holding” the connection (the DataAbstractService) is disposed, it should release the database connection? This does not seem to happen (in my case, for some reason).

antonk · November 19, 2019, 9:07pm

No. WaitIntervalSeconds is a completely different option: it controls how long the pool waits before telling the client that there are no available instances in the pool (if the Wait option is set).

Btw next build will have smarter connection check timer period management:

It will take the expected connection expiration timeout and try to perform the check twice as often, but not more often than once per 60 seconds and not less often than once per 600 seconds.
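The rule stated above can be written down as a small calculation. This is only a sketch of the rule as described, not the actual Data Abstract code; the class and method names are made up, and "twice as often" is read here as "half the expiration timeout".

```csharp
using System;

// Hypothetical illustration of the described rule, not Data Abstract code.
public static class PoolCheckInterval
{
    // Returns the number of seconds between two connection checks.
    public static int FromExpirationSeconds(int connectionPoolTimeoutSeconds)
    {
        // Check twice per expiration period...
        int half = connectionPoolTimeoutSeconds / 2;
        // ...but never more often than every 60 s and never less often than every 600 s.
        return Math.Max(60, Math.Min(600, half));
    }
}
```

Under this reading, a ConnectionPoolTimeout of 3600 gives a check every 600 seconds, a timeout of 600 gives a check every 300 seconds, and a timeout of 60 gives a check every 60 seconds.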

PooledClassFactory destroys timed out service instances. What happens with the db connection in this case depends on the concrete ADO.NET driver implementation.

No, disposing it doesn’t do this. The service expects to be properly deactivated. Otherwise the connection is not returned to the pool, the session is not properly released, etc.

The connection pool doesn’t know that a connection taken from it has already been closed and destroyed, so it thinks that someone is still using it. Eventually the pool concludes that there are no more connections and does what it is configured to do: it waits for 10 minutes before raising an error that no connections can be acquired.

What you REALLY need to do is to use the Data Abstract instance properly and let the ConnectionManager do its work.

You need to override the protected InternalActivate and InternalDeactivate service methods. In the former, create a new DataAbstractService instance and activate it; in the latter, deactivate that service instance.

Both operations are really fast and cost next to nothing compared to the time required to actually perform the database call, so there won’t be any performance hit.
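To show the shape of the activate/deactivate pairing without relying on any real API, here is a self-contained toy model. All types in it (ToyPool, ToyInnerService, ToyOuterService) are made-up stand-ins and not RO SDK or Data Abstract classes. The toy acquires its connection on activation for simplicity, which is an assumption of the model only, and it is not thread-safe.

```csharp
using System;

// Toy stand-in for a connection pool; NOT the Data Abstract ConnectionManager.
public sealed class ToyPool
{
    private readonly int maxSize;
    public int InUse { get; private set; }

    public ToyPool(int maxSize) { this.maxSize = maxSize; }

    public void Acquire()
    {
        if (InUse >= maxSize)
            throw new InvalidOperationException("No free connections in the pool.");
        InUse++;
    }

    public void Release()
    {
        if (InUse > 0) InUse--;
    }
}

// Toy stand-in for the nested data service that owns a pooled connection.
public sealed class ToyInnerService
{
    private readonly ToyPool pool;
    private bool active;

    public ToyInnerService(ToyPool pool) { this.pool = pool; }

    public void Activate() { pool.Acquire(); active = true; }

    public void Deactivate()
    {
        if (!active) return;
        pool.Release();   // hand the connection back to the pool
        active = false;
    }
}

// Toy stand-in for the outer pooled service.
public sealed class ToyOuterService
{
    private readonly ToyPool pool;
    private ToyInnerService inner;

    public ToyOuterService(ToyPool pool) { this.pool = pool; }

    // Called before each request: create and activate the inner service.
    public void Activate() { inner = new ToyInnerService(pool); inner.Activate(); }

    // Called after each request: deactivate the inner service.
    public void Deactivate() { inner?.Deactivate(); inner = null; }

    public string DoSomething(string value) => "handled " + value;
}

public static class Demo
{
    public static void Main()
    {
        var pool = new ToyPool(10);
        for (int i = 0; i < 25; i++)
        {
            var service = new ToyOuterService(pool);
            service.Activate();
            try { service.DoSomething("request " + i); }
            finally { service.Deactivate(); }
        }
        Console.WriteLine(pool.InUse); // prints 0
    }
}
```

If the Deactivate call is removed from the loop, the eleventh request fails with "No free connections in the pool.", which mirrors the behaviour seen once 10 connections are reached.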

birger · November 19, 2019, 10:51pm

Ok. I will just remove that Wait option again. The pool is configured to RaiseError if there are no more connections in it.

We use Firebird ADO.NET driver version 7.1.1.0.

Some time ago we discovered that if the Firebird connections were set to time out after 10 minutes, the Firebird server would pause for about 10 minutes before new connections could be created.
So if our customers had lunch and all of the connections in the pool timed out, they had to wait for 10 minutes after returning before they could work again.
We therefore increased the timeout to one hour, and this solved the problem.

I was thinking about “refactoring” our custom RO server to avoid using a “nested” DataAbstractService inside the pooled RemObjects.Service.
As you write, I would use a DataAbstractService directly and let it handle the connections in the standard way.

Do you have a specific sample of what you describe above?

Thanks.

antonk · November 20, 2019, 10:56am

Could you provide a sample of how exactly you use that DataAbstractService instance? I assume you use it to retrieve data used by the service method?

birger · November 21, 2019, 4:19pm

I believe I may have solved the issue by rewriting the code that handles the activation and deactivation.
Now the internal DataAbstractService instance is deactivated properly when the service has completed the request.
The problem was that activation and deactivation were called only by the constructor and on dispose.
So the service would never return the connection to the pool, and on timeout it was not deactivated properly either.
I am testing now, and hopefully it is solved.
Thanks for the comments.