Error messages under heavy load

delphi

(estebanp) #1

Hello there,

Delphi Server, Latest Tokyo, latest RO.

I keep getting error messages from our services under heavy load. Normally they work correctly, but when we get a few dozen requests per second over 2- or 3-hour periods, we start getting error messages like:

TryGetGameHandlerByGameNum RemObjects.SDK.Types.ServerException: An exception occurred on the server: Class factory for interface GetProviderGameListByGameNum not found
at RemObjects.SDK.Message.ProcessException()
at RemObjects.SDK.BinMessage.InternalReadFromStream(Stream stream)
at RemObjects.SDK.Message.ReadFromStream(Stream stream)
at RemObjects.SDK.IpSuperTcpClientChannel.IntDispatch(Stream request, IMessage response)

Or

TryGetGameHandlerByGameNum RemObjects.SDK.Types.ServerException: An exception occurred on the server: Unknown method Get\x00GetProviderGameListByGam for interface ProviderGame
at RemObjects.SDK.Message.ProcessException()
at RemObjects.SDK.BinMessage.InternalReadFromStream(Stream stream)
at RemObjects.SDK.Message.ReadFromStream(Stream stream)
at RemObjects.SDK.IpSuperTcpClientChannel.IntDispatch(Stream request, IMessage response)

Or

TryGetGameHandlerByGameNum RemObjects.SDK.Types.ServerException: An exception occurred on the server: Error reading parameter aGameNum: Stream read error
at RemObjects.SDK.Message.ProcessException()
at RemObjects.SDK.BinMessage.InternalReadFromStream(Stream stream)
at RemObjects.SDK.Message.ReadFromStream(Stream stream)
at RemObjects.SDK.IpSuperTcpClientChannel.IntDispatch(Stream request, IMessage response)

Or

TryGetGameHandlerByGameNum System.NullReferenceException: Object reference not set to an instance of an object.
at RemObjects.SDK.BinMessage.ReadException()
at RemObjects.SDK.Message.ProcessException()
at RemObjects.SDK.BinMessage.InternalReadFromStream(Stream stream)
at RemObjects.SDK.Message.ReadFromStream(Stream stream)

Or

TryGetGameHandlerByGameNum System.IndexOutOfRangeException: Index was outside the bounds of the array.
at System.Array.Clear(Array array, Int32 index, Int32 length)
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at RemObjects.SDK.BinSerializer.WriteInt32(Int32 value)
at RemObjects.SDK.BinSerializer.WriteInt32(String name, Int32 value)

Or

TryGetGameHandlerByGameNum RemObjects.SDK.Types.ServerException: An exception occurred on the server: No mapping for the Unicode character exists in the target multi-byte code page
at RemObjects.SDK.Message.ProcessException()
at RemObjects.SDK.BinMessage.InternalReadFromStream(Stream stream)
at RemObjects.SDK.Message.ReadFromStream(Stream stream)
at RemObjects.SDK.IpSuperTcpClientChannel.IntDispatch(Stream request, IMessage response)

This is the same method, on the same service and the same machine, producing 6 different error messages. It usually works without problems but starts throwing these errors under the conditions described above.

This method basically receives an integer and returns a list of objects associated with it. The list usually contains one item, but can go up to 3.

Clients can be .Net and Delphi clients.

This points to RO not being able to handle that many requests, but in that case I would expect a timeout or a failure to reply to the client, not these errors. Could there be a race condition in the “dispatcher” code of RO SDK that is not thread safe and corrupts the output? Some of these requests arrive within the same millisecond.

Any ideas?


(EvgenyK) #2

Hi,

You have tested Delphi server <-> .NET client.
Can you show the error messages raised between a Delphi server and a Delphi client, please?


Edit: What server type (Indy plain http, Synapse plain http, Indy SuperTcp, etc.) are you using?


(estebanp) #3

Hi Evgeny,

Synapse SuperTcp, binary. On the server side we see no error messages, so they are probably happening in an area of the code where we cannot try/catch them, and we don’t have a global exception handler.

I cannot get you Delphi client error messages right now. The .Net clients are the ones with live monitoring in place, because that’s where our sites and feeds are set up. Desktop rich clients take longer because logging is not centralized. In any case, any thoughts? Any crazy ideas I can look at?


(EvgenyK) #4

Hi,

Try replacing Synapse SuperTCP with Indy SuperTCP.
It can work better under heavy load.


(estebanp) #5

Hola!

Well, we were finally able to move our Synapse clients to Indy, because of this issue and also because of other problems we were having with Synapse when dealing with event sinks and reconnections.

The sad part is that yesterday, on our first full busy day of “trading” after the update, the server started to throw exceptions. It only happens under heavy load. We increased our queue/thread settings and it is running on good machines (8 cores each; we have two servers in a round-robin configuration for this service). It just seems like some part of the code is not thread safe and the wheels come off when things get tough. One interesting point is that it always fails on the same method, with this signature:

TProviderGameList GetProviderGameListByGam(in aGameNum:Integer);

It is a heavily used method (thousands of calls per minute). Based on the error messages, the errors seem to happen during serialization/deserialization of the data. The returned list has 2 or 3 elements on average, each a small object with 5 fields. On the server side the implementation calls complete, but once control leaves the actual implementation and goes back to RO, something goes wrong. What exactly, I have no idea, so any ideas are welcome. Below is a sample of the error messages we are getting.

System.IndexOutOfRangeException: Index was outside the bounds of the array.
at System.Array.Clear(Array array, Int32 index, Int32 length)
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at RemObjects.SDK.BinSerializer.WriteInt32(Int32 value)
at RemObjects.SDK.BinSerializer.WriteInt32(String name, Int32 value)
at RemObjects.SDK.Message.WriteInt32(String name, Int32 value)

or

System.NullReferenceException: Object reference not set to an instance of an object.
at RemObjects.SDK.BinMessage.ReadException()
at RemObjects.SDK.Message.ProcessException()
at RemObjects.SDK.BinMessage.InternalReadFromStream(Stream stream)
at RemObjects.SDK.Message.ReadFromStream(Stream stream)
at RemObjects.SDK.IpSuperTcpClientChannel.IntDispatch(Stream request, IMessage response)
at RemObjects.SDK.ClientChannel.Dispatch(IMessage message)

or

RemObjects.SDK.Types.ServerException: An exception occurred on the server: Stream read error: Invalid string length “951962980”
at RemObjects.SDK.Message.ProcessException()
at RemObjects.SDK.BinMessage.InternalReadFromStream(Stream stream)
at RemObjects.SDK.Message.ReadFromStream(Stream stream)
at RemObjects.SDK.IpSuperTcpClientChannel.IntDispatch(Stream request, IMessage response)
at RemObjects.SDK.ClientChannel.Dispatch(IMessage message)

or

System.Exception: BinMessage: Unexpected end of stream.
at RemObjects.SDK.BinSerializer.ReadInt32()
at RemObjects.SDK.BinSerializer.ReadUtf8String()
at RemObjects.SDK.BinSerializer.ReadUtf8String(String name)
at RemObjects.SDK.BinMessage.ReadException()
at RemObjects.SDK.Message.ProcessException()
at RemObjects.SDK.BinMessage.InternalReadFromStream(Stream stream)
at RemObjects.SDK.Message.ReadFromStream(Stream stream)
at RemObjects.SDK.IpSuperTcpClientChannel.IntDispatch(Stream request, IMessage response)
at RemObjects.SDK.ClientChannel.Dispatch(IMessage message)

or

RemObjects.SDK.Types.ServerException: An exception occurred on the server: No mapping for the Unicode character exists in the target multi-byte code page
at RemObjects.SDK.Message.ProcessException()
at RemObjects.SDK.BinMessage.InternalReadFromStream(Stream stream)
at RemObjects.SDK.Message.ReadFromStream(Stream stream)
at RemObjects.SDK.IpSuperTcpClientChannel.IntDispatch(Stream request, IMessage response)
at RemObjects.SDK.ClientChannel.Dispatch(IMessage message)


(antonk) #6

Hello

Let’s start with the .NET client part.

The exception messages seem somewhat odd (at least one of them happens while the data is being written into the stream, not while it is being read from it).

So the first question is: do these short bursts of requests come from the same client or from several clients at once?

The second question is: which exact RO SDK version do you use?

Are your client-side calls sync or async?

Could you show the TProviderGameList structure and, if possible, provide the _Intf files for both the server and client sides (send them to support@ to avoid unnecessary publicity)?

As for the diagnosis, let’s start with the simplest thing here:

In your _Intf file (client-side), find the GetProviderGameListByGam method implementation and put its entire code into a lock statement like

lock(this)
{
// entire old method code is here
}

and then run the app.

This will give us some understanding of where things go wrong: client-side or server-side.
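To make the lock suggestion above concrete, here is a minimal, self-contained sketch. It is not the generated _Intf code and calls no RemObjects SDK API: `ProviderGameProxySketch`, its shared `MemoryStream`, and `LockSketchDemo` are hypothetical stand-ins, and the method returns the echoed integer instead of a TProviderGameList. It assumes that concurrent calls on one proxy instance end up sharing a buffer, which is only a guess about where a race could come from.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a generated proxy class; NOT RemObjects SDK code.
public sealed class ProviderGameProxySketch
{
    // Assumption: one buffer is reused by every call made through this proxy.
    private readonly MemoryStream sharedBuffer = new MemoryStream();

    public int GetProviderGameListByGam(int aGameNum)
    {
        // Diagnostic lock: only one thread at a time may run the body below.
        lock (this)
        {
            // "Serialize" the request into the shared buffer.
            sharedBuffer.SetLength(0);
            byte[] request = BitConverter.GetBytes(aGameNum);
            sharedBuffer.Write(request, 0, request.Length);

            // "Deserialize" the reply; here the buffer is simply read back.
            sharedBuffer.Position = 0;
            byte[] reply = new byte[sizeof(int)];
            int read = sharedBuffer.Read(reply, 0, reply.Length);
            if (read != reply.Length)
                throw new EndOfStreamException("Unexpected end of stream.");
            return BitConverter.ToInt32(reply, 0);
        }
    }
}

public static class LockSketchDemo
{
    // Calls the proxy from many threads and counts replies that do not match the request.
    public static int CountMismatches(int calls)
    {
        var proxy = new ProviderGameProxySketch();
        int mismatches = 0;
        Parallel.For(0, calls, i =>
        {
            if (proxy.GetProviderGameListByGam(i) != i)
                Interlocked.Increment(ref mismatches);
        });
        return mismatches;
    }

    public static void Main()
    {
        // With the lock in place every reply matches its request.
        Console.WriteLine(CountMismatches(10000));
    }
}
```

In this sketch the lock serializes every call on the proxy instance, so concurrent callers can no longer interleave writes and reads on the shared buffer. Applied to the real method, the same idea costs throughput, so it is meant as a diagnostic step rather than a fix: if the errors disappear with the lock in place, the client side becomes the more likely suspect; if they persist, the server side does.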

Regards


(estebanp) #7

Hello Anton,

Thank you for your time.

I’ll try to get answers to all your questions in the next few days. We are in the middle of some major deployments, so it’s complicated.

So the first question is: do these short bursts of requests come from the same client or from several clients at once?

We have two major services (in .Net) behind a round-robin load balancer sending thousands of requests to a Delphi service. So yes, a lot of them can come from a single client, or from two clients at most, at once. The requests the service receives are only milliseconds apart.

The second question is: which exact RO SDK version do you use?

I’m using version 9.4.109.1377, which is one release behind the latest.

Are your client-side calls sync or async?
.Net async for this call specifically.

Server side is built on Delphi. Clients are .Net.

Could you show the TProviderGameList structure and, if possible, provide the _Intf files for both the server and client sides (send them to support@ to avoid unnecessary publicity)?

I already sent the _Intf files to support.

In your _Intf file (client-side), find the `GetProviderGameListByGam` method implementation and put its entire code into a `lock` statement like

That will take some time; it requires some “red tape” to release to our production servers.

Thank you.