We were running internal tests + some DB structure modifications.
As for the Delphi side of things: please make sure that the SuperTCP channel used to connect to the Olympia Server instance has its AutoReconnect property set to True. Our Delphi Chat sample was able to reconnect to Olympia after a restart (given that AutoReconnect was set to True).
Also please note that there is one additional index in the db that needs to be created (for MS SQL database):
CREATE INDEX [IX_EVENT_ID] ON [EVENTUSERDATA] (
[EVENTID] ASC
);
This index, plus some queries now running with WITH (NOLOCK), should resolve the deadlock issue.
The PK error in event subscription management should also be gone. It now uses a client-tied lock to ensure that one and only one subscription update operation can run for any given client at any given moment.
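For reference, a client-tied lock of this kind can be sketched in Delphi roughly as below. This is a minimal illustration under assumed names (a TCriticalSection per client, kept in a dictionary), not Olympia's actual implementation:

```delphi
uses
  System.SysUtils, System.SyncObjs, System.Generics.Collections;

var
  Locks: TObjectDictionary<string, TCriticalSection>; // create with doOwnsValues
  LocksGuard: TCriticalSection; // protects the dictionary itself

// Returns the lock dedicated to one client, creating it on first use.
function GetClientLock(const aClientId: string): TCriticalSection;
begin
  LocksGuard.Acquire;
  try
    if not Locks.TryGetValue(aClientId, Result) then begin
      Result := TCriticalSection.Create;
      Locks.Add(aClientId, Result);
    end;
  finally
    LocksGuard.Release;
  end;
end;

procedure UpdateSubscriptions(const aClientId: string);
begin
  GetClientLock(aClientId).Acquire;
  try
    // run the subscription insert/update here; at most one such
    // update can be in flight for this client at any moment
  finally
    GetClientLock(aClientId).Release;
  end;
end;
```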
They all do, BUT I set the Retry option to False in the OnException event for the message broker (Olympia).
It usually ends in a never-ending loop. Does this affect the channel internals? I'm assuming Retry means retrying the operation in progress, not the internal reconnect attempts, which should be managed separately by the timeout, retry-interval, etc. settings.
Is this assumption correct?
If it is not, then on the next request that a service connected to Olympia receives, it will attempt to send a message to Olympia and re-activate things, won't it?
Can you specify which exact build you are using - a custom one or stable/preview?
I’ve tested the HTTP chat sample and it works without any issues with Olympia and the OnException event:
procedure THTTPChatServerMainForm.OlympiaChannelException(
Sender: TROTransportChannel; anException: Exception; var aRetry: Boolean);
begin
aRetry := True;
end;
I’m using version 10.0.0.1463 in general, but I am starting to test the new Olympia (Anton passed me the new version) - just Olympia, while still using the .1463 build to connect to it.
The question is whether the recommended value is aRetry := True. If there is an issue with Olympia, it will keep rejecting calls and the service using it will just keep retrying the same call over and over. I turned it off because, if Olympia fails, the call is rejected and an exception is thrown right away; I know for certain that every call passed to the service using Olympia triggers an Olympia check (to update the latest activity), so a new attempt will be made anyway.
Basically, I’m expecting Olympia to be unstable and never recover. Should I leave aRetry := True?
You can count failures and set aRetry to False once it reaches some specific number (10/15/20/etc.) of consecutive calls. If a connect is successful, you can reset that counter to zero.
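The suggested failure counter could look roughly like this in Delphi. The host class, the field name FFailureCount, and the threshold are illustrative assumptions; the handler signature follows the HTTP chat sample shown earlier:

```delphi
type
  // Hypothetical host class; in practice this would be your existing
  // form or data module that owns the Olympia channel.
  TOlympiaClient = class(TDataModule)
  private
    FFailureCount: Integer;
  public
    procedure OlympiaChannelException(Sender: TROTransportChannel;
      anException: Exception; var aRetry: Boolean);
    procedure OlympiaChannelConnected(Sender: TObject);
  end;

const
  MAX_CONSECUTIVE_FAILURES = 15; // 10/15/20 - pick what fits your setup

procedure TOlympiaClient.OlympiaChannelException(Sender: TROTransportChannel;
  anException: Exception; var aRetry: Boolean);
begin
  Inc(FFailureCount);
  // keep retrying until too many consecutive failures have been seen
  aRetry := FFailureCount < MAX_CONSECUTIVE_FAILURES;
end;

procedure TOlympiaClient.OlympiaChannelConnected(Sender: TObject);
begin
  FFailureCount := 0; // a successful connect resets the counter
end;
```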
Originally something like that was done, but because we have 30+ client services connecting to the service that holds the Olympia connection, it was not doing much.
I’ll put it back to see if it makes any difference with the new version, and implement an exponential back-off retry, which will probably serve us better.
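One way to sketch an exponential back-off here without ever sleeping inside the event is to track a "next allowed attempt" timestamp and simply fail fast until it has passed. Field and constant names are illustrative assumptions, not Olympia API; requires the Math and DateUtils units:

```delphi
procedure TOlympiaClient.OlympiaChannelException(Sender: TROTransportChannel;
  anException: Exception; var aRetry: Boolean);
var
  lDelaySecs: Double;
begin
  if Now < FNextAttempt then begin
    aRetry := False; // still inside the back-off window; fail fast, no Sleep
    Exit;
  end;
  Inc(FFailureCount);
  // double the delay on each consecutive failure, capped at 60 seconds
  lDelaySecs := Min(0.5 * Power(2, FFailureCount), 60);
  FNextAttempt := IncSecond(Now, Round(lDelaySecs));
  aRetry := True;
end;
```

A successful connect (as in the counter example) would reset both FFailureCount and FNextAttempt.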
That’s our instrumentation of every time the OnDisconnected event from Olympia fires in a client service connecting to it; notice the time difference. That’s ongoing: so far it just keeps doing so, and our services start failing until we restart Olympia. We use AutoReconnect = True on the Indy-based channels.
Olympia memory usage is low and CPU usage is low, so that’s good news.
What will happen if we are sending hundreds or thousands of requests while Olympia is failing? Are we channeling/sending the same amount back to the main thread through the OnException event of the Olympia Server component? Any possibility of locks? How would a retry policy work there?
I mean, if it is just a retry counter it will mean nothing, and if it is a counter with an exponential back-off delay it will block the main thread and things will go bad quite soon.
Back to this issue: it is simply disconnecting without any error being sent back; it just starts connecting/disconnecting. The services are running on Azure and talk to each other inside Azure, so downtime of parts of the infrastructure is less of an issue and I’m discarding it.
I wish, but production conditions will be hard to reproduce. In the meantime, is it safe to assume that the OnException event on the Olympia component runs on the main thread?
If it is, what happens if multiple exceptions happen simultaneously or there is a queue of errors? How will the retry work? Does it wait for a response to the first one, or does it just queue up? Is it a thread-safe variable somewhere?
It depends on the code inside the event. For example, if there is GUI interaction like MessageBox, ShowMessage, etc., then no. If there is no GUI interaction, it can be considered safe.
By default, all requests are put into a queue, so they are sent step by step.
If an error is detected, OnException is raised and, if Retry is set, the broken package is sent again.
Quick question - I noticed this in one of our deadlocks with Olympia:
(@OLD_EVENTUSER nvarchar(36), @OLD_EVENTID nvarchar(36))
DELETE FROM [EVENTUSERDATA]
WHERE ([EVENTUSER] = @OLD_EVENTUSER)
  AND ([EVENTID] = @OLD_EVENTID)
The query declares the parameters as (@OLD_EVENTUSER nvarchar(36), @OLD_EVENTID nvarchar(36)), but in the table both [EVENTUSER] and [EVENTID] are defined as char(36). Using nvarchar in the WHERE clause forces an implicit conversion, which causes a full index scan and may be a cause of the deadlocks.
Is it possible to run the query with (@OLD_EVENTUSER varchar(36), @OLD_EVENTID varchar(36)) instead?
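In other words, the same DELETE but with the parameter declarations matching the column types. A sketch of what the captured query would then look like (assuming char(36)/varchar(36) is indeed the column type, so no implicit conversion is needed and the index can be used with a seek):

```sql
-- Parameters declared with the column's own (non-Unicode) type:
(@OLD_EVENTUSER varchar(36), @OLD_EVENTID varchar(36))
DELETE FROM [EVENTUSERDATA]
WHERE ([EVENTUSER] = @OLD_EVENTUSER)
  AND ([EVENTID] = @OLD_EVENTID)
```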
I decided to update to the latest available Olympia with the suggested DB updates, and our production environment is giving us deadlock alerts 3 to 4 times a day. Considering that these are very low-traffic days, this is only going to get worse. Could we check whether the suggestion presented above can be applied to alleviate things?
The query causing the deadlock is always the same. The one above.
I will check on our dev servers. Which version of ROOlympiaServer.exe are you running? I’m also going to trace the execution in dev; production has too much traffic to attempt isolation.
We are getting the lock every 4 hours or so. I’ll verify that what you posted is what’s being executed.