We were running internal tests + some DB structure modifications.
As for the Delphi side of things: please make sure that the SuperTCP channel used to connect to the Olympia Server instance has its AutoReconnect property set to True. Our Delphi Chat sample was able to reconnect to Olympia after a restart (given that AutoReconnect was set to True).
Also please note that there is one additional index in the db that needs to be created (for MS SQL database):
CREATE INDEX [IX_EVENT_ID] ON [EVENTUSERDATA] (
[EVENTID] ASC
);
This index, plus some queries now running with WITH (NOLOCK), should resolve the deadlock issue.
The PK error in event subscription management should also be gone. It now uses a client-tied lock to ensure that one and only one subscription update operation can run for any given client at any given moment.
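For reference, a client-tied lock of this kind can be sketched in Delphi roughly as below. This is a minimal illustration under assumed names (a TCriticalSection per client, kept in a dictionary), not Olympia's actual implementation:

```delphi
uses
  System.SysUtils, System.SyncObjs, System.Generics.Collections;

var
  Locks: TObjectDictionary<string, TCriticalSection>; // create with doOwnsValues
  LocksGuard: TCriticalSection; // protects the dictionary itself

// Returns the lock dedicated to one client, creating it on first use.
function GetClientLock(const aClientId: string): TCriticalSection;
begin
  LocksGuard.Acquire;
  try
    if not Locks.TryGetValue(aClientId, Result) then begin
      Result := TCriticalSection.Create;
      Locks.Add(aClientId, Result);
    end;
  finally
    LocksGuard.Release;
  end;
end;

procedure UpdateSubscriptions(const aClientId: string);
begin
  GetClientLock(aClientId).Acquire;
  try
    // run the subscription insert/update here; at most one such
    // update can be in flight for this client at any moment
  finally
    GetClientLock(aClientId).Release;
  end;
end;
```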
They all do, BUT I set the Retry option to False in the OnException event for the message broker (Olympia).
It usually ends in a never-ending loop. Does this affect the channel internals? I'm assuming Retry means retrying the operation in progress, not the internal reconnect attempts, which should be managed separately by the timeout, retry-interval, etc. settings.
Is this assumption correct?
If it is not, then on the next request that a service connected to Olympia receives, it will attempt to send a message to Olympia and re-activate things, won't it?
Can you specify which exact build you are using - a custom one or stable/preview?
I’ve tested the HTTP chat sample and it works without any issues with Olympia and the OnException event:
procedure THTTPChatServerMainForm.OlympiaChannelException(
Sender: TROTransportChannel; anException: Exception; var aRetry: Boolean);
begin
aRetry := True;
end;
I’m using version 10.0.0.1463 in general, but I am starting to test the new Olympia (Anton passed me the new version) - just Olympia, while still using the .1463 build to connect to it.
The question is whether the recommended value is aRetry := True. If there is an issue with Olympia, it will keep rejecting calls and the service using it will just keep retrying the same call over and over. I turned it off because, if Olympia fails, the call is rejected and an exception is thrown right away; I know for certain that every call passed to the service using Olympia triggers an Olympia check (to update the latest activity), so a new attempt will be made anyway.
Basically, I’m expecting Olympia to be unstable and never recover. Should I leave aRetry := True?
You can count failures and set aRetry to False once it reaches some specific number (10/15/20/etc.) of consecutive calls. If a connect is successful, you can reset that counter to zero.
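The suggested failure counter could look roughly like this in Delphi. The host class, the field name FFailureCount, and the threshold are illustrative assumptions; the handler signature follows the HTTP chat sample shown earlier:

```delphi
type
  // Hypothetical host class; in practice this would be your existing
  // form or data module that owns the Olympia channel.
  TOlympiaClient = class(TDataModule)
  private
    FFailureCount: Integer;
  public
    procedure OlympiaChannelException(Sender: TROTransportChannel;
      anException: Exception; var aRetry: Boolean);
    procedure OlympiaChannelConnected(Sender: TObject);
  end;

const
  MAX_CONSECUTIVE_FAILURES = 15; // 10/15/20 - pick what fits your setup

procedure TOlympiaClient.OlympiaChannelException(Sender: TROTransportChannel;
  anException: Exception; var aRetry: Boolean);
begin
  Inc(FFailureCount);
  // keep retrying until too many consecutive failures have been seen
  aRetry := FFailureCount < MAX_CONSECUTIVE_FAILURES;
end;

procedure TOlympiaClient.OlympiaChannelConnected(Sender: TObject);
begin
  FFailureCount := 0; // a successful connect resets the counter
end;
```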
Originally something like that was done, but because we have 30+ client services connecting to the service that holds the Olympia connection, it was not doing much.
I’ll put it back to see if it makes any difference with the new version, and implement an exponential back-off retry, which will probably serve us better.
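One way to sketch an exponential back-off here without ever sleeping inside the event is to track a "next allowed attempt" timestamp and simply fail fast until it has passed. Field and constant names are illustrative assumptions, not Olympia API; requires the Math and DateUtils units:

```delphi
procedure TOlympiaClient.OlympiaChannelException(Sender: TROTransportChannel;
  anException: Exception; var aRetry: Boolean);
var
  lDelaySecs: Double;
begin
  if Now < FNextAttempt then begin
    aRetry := False; // still inside the back-off window; fail fast, no Sleep
    Exit;
  end;
  Inc(FFailureCount);
  // double the delay on each consecutive failure, capped at 60 seconds
  lDelaySecs := Min(0.5 * Power(2, FFailureCount), 60);
  FNextAttempt := IncSecond(Now, Round(lDelaySecs));
  aRetry := True;
end;
```

A successful connect (as in the counter example) would reset both FFailureCount and FNextAttempt.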
That’s our instrumentation of every time the OnDisconnected event from Olympia fires in a client service connecting to it; notice the time difference. That’s ongoing: so far it just keeps doing so, and our services start failing until we restart Olympia. We use AutoReconnect = True on the Indy-based channels.
Olympia memory usage is low and CPU usage is low, so that’s good news.
What will happen if we are sending hundreds or thousands of requests while Olympia is failing? Are we channeling/sending the same amount back to the main thread through the OnException event of the Olympia Server component? Any possibility of locks? How would a retry policy work there?
I mean, if it is just a retry counter it will mean nothing, and if it is a counter with an exponential back-off delay it will block the main thread and things will go bad quite soon.
Back to this issue: it is simply disconnecting without any error being sent back; it just starts connecting/disconnecting. The services are running on Azure and talk to each other inside Azure, so downtime of parts of the infrastructure is less of an issue and I’m discarding it.
I wish, but production conditions will be hard to reproduce. In the meantime, is it safe to assume that the OnException event on the Olympia component runs on the main thread?
If it is, what happens if multiple exceptions happen simultaneously or there is a queue of errors? How will the retry work? Does it wait for a response to the first one, or does it just queue up? Is it a thread-safe variable somewhere?
It depends on the code inside the event. For example, if there is GUI interaction like MessageBox, ShowMessage, etc., then no. If there is no GUI interaction, it can be considered safe.
By default, all requests are put into a queue, so they are sent step by step.
If an error is detected, OnException is raised and, if Retry is set, the broken package is sent again.
Quick question - I noticed this in one of our deadlocks with Olympia:
(@OLD_EVENTUSER nvarchar(36), @OLD_EVENTID nvarchar(36))
DELETE FROM [EVENTUSERDATA]
WHERE ([EVENTUSER] = @OLD_EVENTUSER)
  AND ([EVENTID] = @OLD_EVENTID)
The query declares the parameters as (@OLD_EVENTUSER nvarchar(36), @OLD_EVENTID nvarchar(36)), but in the table both [EVENTUSER] and [EVENTID] are defined as char(36). Using nvarchar in the WHERE clause forces an implicit conversion, which causes a full index scan and may be a cause of the deadlocks.
Is it possible to run the query with (@OLD_EVENTUSER varchar(36), @OLD_EVENTID varchar(36)) instead?
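In other words, the same DELETE but with the parameter declarations matching the column types. A sketch of what the captured query would then look like (assuming char(36)/varchar(36) is indeed the column type, so no implicit conversion is needed and the index can be used with a seek):

```sql
-- Parameters declared with the column's own (non-Unicode) type:
(@OLD_EVENTUSER varchar(36), @OLD_EVENTID varchar(36))
DELETE FROM [EVENTUSERDATA]
WHERE ([EVENTUSER] = @OLD_EVENTUSER)
  AND ([EVENTID] = @OLD_EVENTID)
```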
I decided to update to the latest available Olympia with the suggested DB updates, and our production environment is giving us deadlock alerts 3 to 4 times a day. Considering that these are very low-traffic days, this is only going to get worse. Could we check whether the suggestion presented above can be applied to alleviate things?
The query causing the deadlock is always the same. The one above.
I will check on our dev servers. Which version of ROOlympiaServer.exe are you running? I’m also going to trace the execution in dev; production has too much traffic to attempt isolation.
We are getting the lock every 4 hours or so. I’ll verify that what you posted is what’s being executed.