Olympia server logging?

Hello

We were running internal tests + some DB structure modifications.
As for the Delphi side of things - please make sure that the SuperTCP channel used to connect to the Olympia Server instance has its AutoReconnect property set to true. Our Delphi Chat sample was able to reconnect to Olympia after its restart (given that AutoReconnect was set to true).

Also please note that there is one additional index that needs to be created in the DB (for the MS SQL database):

CREATE INDEX [IX_EVENT_ID] ON [EVENTUSERDATA] (
	[EVENTID] ASC
);

This index, plus some queries now running WITH (NOLOCK), should resolve the deadlock issue.

The PK error on event subscription management should also be gone. It now uses a client-tied lock to ensure that one and only one subscription update operation can run for any given client at any given moment.
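
For illustration only (this is a hypothetical Delphi sketch, not the actual Olympia source), a client-tied lock of that kind can be organized as a dictionary of critical sections keyed by client ID; the subscription update then runs inside that client's lock:

// Purely illustrative sketch. Requires System.SyncObjs and System.Generics.Collections.
type
  TClientLocks = class
  private
    fGuard: TCriticalSection;                              // protects the dictionary itself
    fLocks: TObjectDictionary<string, TCriticalSection>;   // one lock per client ID
  public
    constructor Create;
    destructor Destroy; override;
    function LockFor(const aClientId: string): TCriticalSection;
  end;

constructor TClientLocks.Create;
begin
  inherited Create;
  fGuard := TCriticalSection.Create;
  fLocks := TObjectDictionary<string, TCriticalSection>.Create([doOwnsValues]);
end;

destructor TClientLocks.Destroy;
begin
  fLocks.Free;
  fGuard.Free;
  inherited;
end;

function TClientLocks.LockFor(const aClientId: string): TCriticalSection;
begin
  fGuard.Enter;
  try
    // reuse the existing lock for this client, or create one on first use
    if not fLocks.TryGetValue(aClientId, Result) then
    begin
      Result := TCriticalSection.Create;
      fLocks.Add(aClientId, Result);
    end;
  finally
    fGuard.Leave;
  end;
end;

// The subscription update would then be wrapped like this (names are hypothetical):
//   ClientLocks.LockFor(aClientId).Enter;
//   try
//     UpdateSubscriptionsFor(aClientId);
//   finally
//     ClientLocks.LockFor(aClientId).Leave;
//   end;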

Regards


They all do, BUT I set the Retry option to false in the OnException event for the message broker (Olympia).

Otherwise it usually ends in a never-ending loop. Does this affect the channel internals? I’m assuming Retry applies to retrying the operation being performed, not to the internal attempts to reconnect, which should be managed separately based on the timeout, retry interval, etc. settings.

Is this assumption correct?

If it is not, then on the next request that a service connected to Olympia receives, it will attempt to send a message to Olympia and re-activate things, won’t it?

Hi,

can you specify exactly which build you are using - a custom one or stable/preview?
I’ve tested the HTTP chat sample and it works without any issues with Olympia and the OnException event:

procedure THTTPChatServerMainForm.OlympiaChannelException(
  Sender: TROTransportChannel; anException: Exception; var aRetry: Boolean);
begin
  aRetry := True;
end;

Hi Evgeny,

I’m using version 10.0.0.1463 in general, but I’m starting to test the new Olympia (Anton passed me the new version) - just Olympia for now, with the .1463 build still connecting to it.

The question is whether the recommended value is aRetry := True. If there is an issue with Olympia, it will keep rejecting calls and the service using it will just keep retrying the same call over and over. I turned it off so that if Olympia fails, the call is rejected and an exception is thrown right away, knowing for certain that every call passed to the service using Olympia will trigger an Olympia check (to update latest activity) and a new attempt will be made.

Basically, I’m allowing for Olympia being unstable and never recovering. Do I leave it at aRetry := True?

Hi,

You can count the number of consecutive failures and set aRetry to False once it reaches some specific number (10/15/20/etc.).
If a connect was successful, you can reset that counter to zero.
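
For example, a minimal sketch (the field and constant names are hypothetical; the handler signature follows the chat sample above):

// Requires System.SyncObjs (TInterlocked) in the uses clause.
const
  MAX_OLYMPIA_RETRIES = 15; // illustrative threshold

procedure TServerMainForm.OlympiaChannelException(Sender: TROTransportChannel;
  anException: Exception; var aRetry: Boolean);
begin
  // stop retrying once N consecutive failures have been seen
  aRetry := TInterlocked.Increment(fOlympiaFailures) <= MAX_OLYMPIA_RETRIES;
end;

// Wherever your code learns the channel is connected again (hypothetical handler),
// reset the counter:
procedure TServerMainForm.OlympiaChannelConnected(Sender: TObject);
begin
  TInterlocked.Exchange(fOlympiaFailures, 0);
end;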

Thank you Evgeny,

Originally something like that was in place, but because we have 30+ client services connecting to the service that holds the Olympia connection, it was not doing much.

I’ll put it back to see if it makes any difference with the new version, and implement an exponential back-off retry, which will probably serve us better.
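
A minimal sketch of what such a back-off could look like (hypothetical names, reusing the failure counter from the sketch earlier in the thread; note that sleeping here delays the thread the exception was raised on, which is part of my concern below):

// Requires System.SyncObjs (TInterlocked), System.Math (Min) and System.Classes (TThread).
procedure TServerMainForm.OlympiaChannelException(Sender: TROTransportChannel;
  anException: Exception; var aRetry: Boolean);
var
  lFailures, lDelayMs: Integer;
begin
  lFailures := TInterlocked.Increment(fOlympiaFailures);
  aRetry := lFailures <= MAX_OLYMPIA_RETRIES;
  if aRetry then
  begin
    // 250 ms, 500 ms, 1 s, 2 s, ... capped at 8 s
    lDelayMs := 250 * (1 shl Min(lFailures - 1, 5));
    TThread.Sleep(lDelayMs);
  end;
end;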

Hi guys,

Delphi clients, Rio 10.3.2, using the experimental Olympia (10.0.0.1474).

This is a sample of our logs showing disconnections from Olympia (MessageBroker).

23-04-2020 11:18:11.575 [DEBUG] [ThreadId: 3548] [PID: 1404] ->Enter. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected
23-04-2020 11:18:11.575 [WARN] [ThreadId: 3548] [PID: 1404] MessageBroker disconnected.
23-04-2020 11:18:11.575 [DEBUG] [ThreadId: 3548] [PID: 1404] <-Leave. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected
23-04-2020 11:18:12.090 [DEBUG] [ThreadId: 3548] [PID: 1404] ->Enter. Remoting.Services.MessageBroker.TMessageBroker.OnChannelConnected
23-04-2020 11:18:12.090 [INFO] [ThreadId: 3548] [PID: 1404] Reconnecting to MessageBroker.
23-04-2020 11:18:12.090 [INFO] [ThreadId: 3548] [PID: 1404] Connection and registration to MessageBroker established.
23-04-2020 11:18:12.090 [DEBUG] [ThreadId: 3548] [PID: 1404] <-Leave. Remoting.Services.MessageBroker.TMessageBroker.OnChannelConnected
23-04-2020 11:18:12.090 [DEBUG] [ThreadId: 3548] [PID: 1404] ->Enter. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected
23-04-2020 11:18:12.090 [WARN] [ThreadId: 3548] [PID: 1404] MessageBroker disconnected.
23-04-2020 11:18:12.090 [DEBUG] [ThreadId: 3548] [PID: 1404] <-Leave. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected
23-04-2020 11:18:12.610 [DEBUG] [ThreadId: 3548] [PID: 1404] ->Enter. Remoting.Services.MessageBroker.TMessageBroker.OnChannelConnected
23-04-2020 11:18:12.610 [INFO] [ThreadId: 3548] [PID: 1404] Reconnecting to MessageBroker.
23-04-2020 11:18:12.610 [INFO] [ThreadId: 3548] [PID: 1404] Connection and registration to MessageBroker established.
23-04-2020 11:18:12.610 [DEBUG] [ThreadId: 3548] [PID: 1404] <-Leave. Remoting.Services.MessageBroker.TMessageBroker.OnChannelConnected
23-04-2020 11:18:12.611 [DEBUG] [ThreadId: 3548] [PID: 1404] ->Enter. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected
23-04-2020 11:18:12.611 [WARN] [ThreadId: 3548] [PID: 1404] MessageBroker disconnected.
23-04-2020 11:18:12.611 [DEBUG] [ThreadId: 3548] [PID: 1404] <-Leave. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected
23-04-2020 11:18:13.124 [DEBUG] [ThreadId: 3548] [PID: 1404] ->Enter. Remoting.Services.MessageBroker.TMessageBroker.OnChannelConnected
23-04-2020 11:18:13.124 [INFO] [ThreadId: 3548] [PID: 1404] Reconnecting to MessageBroker.
23-04-2020 11:18:13.124 [INFO] [ThreadId: 3548] [PID: 1404] Connection and registration to MessageBroker established.
23-04-2020 11:18:13.124 [DEBUG] [ThreadId: 3548] [PID: 1404] <-Leave. Remoting.Services.MessageBroker.TMessageBroker.OnChannelConnected
23-04-2020 11:18:13.124 [DEBUG] [ThreadId: 3548] [PID: 1404] ->Enter. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected
23-04-2020 11:18:13.124 [WARN] [ThreadId: 3548] [PID: 1404] MessageBroker disconnected.
23-04-2020 11:18:13.124 [DEBUG] [ThreadId: 3548] [PID: 1404] <-Leave. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected
23-04-2020 11:18:13.638 [DEBUG] [ThreadId: 3548] [PID: 1404] ->Enter. Remoting.Services.MessageBroker.TMessageBroker.OnChannelConnected
23-04-2020 11:18:13.638 [INFO] [ThreadId: 3548] [PID: 1404] Reconnecting to MessageBroker.
23-04-2020 11:18:13.638 [INFO] [ThreadId: 3548] [PID: 1404] Connection and registration to MessageBroker established.
23-04-2020 11:18:13.638 [DEBUG] [ThreadId: 3548] [PID: 1404] <-Leave. Remoting.Services.MessageBroker.TMessageBroker.OnChannelConnected
23-04-2020 11:18:13.638 [DEBUG] [ThreadId: 3548] [PID: 1404] ->Enter. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected
23-04-2020 11:18:13.638 [WARN] [ThreadId: 3548] [PID: 1404] MessageBroker disconnected.
23-04-2020 11:18:13.638 [DEBUG] [ThreadId: 3548] [PID: 1404] <-Leave. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected
23-04-2020 11:18:14.150 [DEBUG] [ThreadId: 3548] [PID: 1404] ->Enter. Remoting.Services.MessageBroker.TMessageBroker.OnChannelConnected
23-04-2020 11:18:14.150 [INFO] [ThreadId: 3548] [PID: 1404] Reconnecting to MessageBroker.
23-04-2020 11:18:14.150 [INFO] [ThreadId: 3548] [PID: 1404] Connection and registration to MessageBroker established.
23-04-2020 11:18:14.150 [DEBUG] [ThreadId: 3548] [PID: 1404] <-Leave. Remoting.Services.MessageBroker.TMessageBroker.OnChannelConnected
23-04-2020 11:18:14.150 [DEBUG] [ThreadId: 3548] [PID: 1404] ->Enter. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected
23-04-2020 11:18:14.150 [WARN] [ThreadId: 3548] [PID: 1404] MessageBroker disconnected.
23-04-2020 11:18:14.150 [DEBUG] [ThreadId: 3548] [PID: 1404] <-Leave. Remoting.Services.MessageBroker.TMessageBroker.OnChannelDisconnected

That’s our instrumentation of every time the OnDisconnected event from Olympia is executed in a client service connecting to it; notice the time difference. It’s ongoing - so far it just keeps doing this and our services start failing until we restart Olympia. We use AutoReconnect = true on Indy-based channels.

Olympia memory usage is low and CPU usage is low, so that’s good news.

What will happen if we are sending hundreds or thousands of requests and Olympia is failing - are we channeling/sending the same amount back to the main thread via the OnException event of the Olympia Server component? Is there any possibility of locks? How would a retry policy work there?

I mean, if it is a plain retry counter it won’t mean much, and if it is a counter with an exponential back-off delay it will block the main thread and things will go bad quite soon.

Back to this issue: it simply disconnects without any error being sent back; it just starts connecting/disconnecting. The services run on Azure and talk to each other inside Azure, so downtime of parts of the infrastructure is less of an issue and I’m ruling it out.

Hi,

is it possible to create some kind of test case that reproduces this issue?
I’d be happy to reproduce it on our side

I wish, but production conditions will be hard to reproduce. In the meantime, is it safe to assume that the OnException event on the Olympia component runs on the main thread?

If it is, what happens if multiple exceptions happen simultaneously or there is a queue of errors? How will the retry work? Does it wait for the response to the first one or does it just queue up? Is there a thread-safe variable somewhere?

Hi,

it depends on the code inside the event. For example, if it’s some GUI interaction like MessageBox, ShowMessage, etc., then no. If there is no GUI interaction, it can be considered safe.
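
For example, a handler that keeps GUI calls off the calling thread and pushes any UI feedback to the main thread asynchronously could look like this (mmoLog is a hypothetical TMemo on the form; requires System.Classes for TThread):

procedure TServerMainForm.OlympiaChannelException(Sender: TROTransportChannel;
  anException: Exception; var aRetry: Boolean);
var
  lMessage: string;
begin
  aRetry := True;
  // capture the text now; the exception object will not outlive this call
  lMessage := anException.Message;
  TThread.Queue(nil,
    procedure
    begin
      mmoLog.Lines.Add('Olympia error: ' + lMessage);
    end);
end;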

by default, all requests are put into a queue so they will be sent step by step.
if an error is detected, OnException will be raised and, if Retry is set, the broken package will be sent again

Hi guys,

Quick question - I noticed this on one of our deadlocks with Olympia.

(@OLD_EVENTUSER nvarchar(36),@OLD_EVENTID nvarchar(36))DELETE
FROM
[EVENTUSERDATA]
WHERE
([EVENTUSER] = @OLD_EVENTUSER)
AND ([EVENTID] = @OLD_EVENTID)

The query declares its parameters as (@OLD_EVENTUSER nvarchar(36), @OLD_EVENTID nvarchar(36)), but in the table [EVENTUSER] and [EVENTID] are defined as char(36). Using nvarchar parameters in the WHERE clause forces an implicit conversion, which causes a whole-index scan and may be a cause of the deadlocks.

Is it possible to query with (@OLD_EVENTUSER varchar(36), @OLD_EVENTID varchar(36)) instead?

Hi guys,

I decided to update to the latest available Olympia with the suggested DB updates, and our production environment is giving us deadlock alerts 3 to 4 times a day. Considering that these are very low-traffic days, this is only going to get worse. Could we check whether the suggestion presented above can be applied to alleviate things?

The query causing the deadlock is always the same. The one above.

Thanks, logged as bugs://84349

In process:

exec sp_executesql N'DELETE
FROM
  [EVENTUSERDATA]
WHERE
  ([EVENTUSER] = @OLD_EVENTUSER)
AND  ([EVENTID] = @OLD_EVENTID)
',N'@OLD_EVENTUSER varchar(36),@OLD_EVENTID varchar(36)',@OLD_EVENTUSER='13b4837a-c294-4367-bd08-9c7d247668f6',@OLD_EVENTID='03209ec5-bd56-4b03-9261-6040cc1780a7'

Do you have a testing environment where this issue can be reproduced without affecting production servers?

Hi,

I will check on our Dev servers. What version of roolympiaserver.exe are you running? I’m also going to trace the execution in Dev; production has too much traffic to attempt isolation.

We are getting the deadlock every 4 hours or so. :frowning: I’ll verify that what you posted is what is being executed.

That’s from my development environment. Just showing that I have a solution for this particular issue.

I’ll put it up in dev and in production. Right now I’ll take whatever you have, because the DBAs are getting a bit jumpy with me.

That said, in the case of a deadlock there are two transactions/queries involved. Is it possible to catch the second transaction’s query too?

Hope that helps.

Hi there,

Today we have had 5 deadlocks so far, all under the same circumstances as the screenshot posted above. Do you guys need anything else?