Olympia recovery process after stop/crash

net
delphi
(estebanp) #1

While trying to implement some redundancy for Olympia, I wondered: does Olympia automatically reconnect its clients after it is restarted (stop/start) or after it crashes?

I’m aware that the Olympia session manager (TROOlympiaSessionManager), and probably the Olympia event manager, has an event sink registered with the Olympia service. In the case of a disconnection, does it somehow re-register that sink after Olympia goes down?

Is any other action required after an Olympia service goes down in order to restore its former state?

I did some testing in our dev environment by turning Olympia off and starting it again after 30 seconds or so, and most of our services couldn’t recover properly, mainly losing their event sinks. I don’t know whether something else is going on or whether that is the expected behavior. Any insights would be appreciated.

Our services already re-register their clients’ event sinks when they suffer a disconnection, but we have no control over the link from the Olympia service to the Olympia clients (the Olympia session/event components).

(EvgenyK) #2

Hi,

We will add a special table to the Olympia database that stores information about the event sinks registered for each client; after that, it should work properly.

(estebanp) #3

Hi Evgeny,

Thank you for replying; that is very welcome news. It is extremely important if we want to build and promote RODA as a way to create robust, scalable solutions.

In the meantime, what steps do I need to follow to re-register the TROOlympiaSessionManager and its events with a new Olympia service after losing contact with it?

I have already put a “service template” in place for many of our services, and the thought of having to restart 14 services whenever Olympia needs a restart will definitely raise some eyebrows here.

(EvgenyK) #4

Hi,

For now, you can store the subscribed events inside the session and re-register them later, like this:

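  // Re-attach a reconnected client's event subscriptions from its session.
  // 'RegisteredEvents' is a session value the application itself stores
  // when the client subscribes (requires the Variants unit for VarToStrDef).
  if ROSessionManager.IsSessionPresent(aGuid) then begin
    s := ROSessionManager.FindSession(aGuid,False);
    try
      s1 := VarToStrDef(s.Values['RegisteredEvents'],'');
      if s1 <> '' then
        ROEventRepository.AddSession(aGuid, aChannel as IROActiveEventServer, s1);
    finally
      ROSessionManager.ReleaseSession(s, False);
    end;
    Log(GUIDToString(aGuid) + ' is registered to events');
  end;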
(estebanp) #5

Hello Evgeny,

Thank you for replying. The solution you mentioned does work for the other services that use Olympia; I did something very similar, based on your suggestions and on your code. However, when I try to re-establish Olympia’s own event sinks, it fails.

What I did was put some code in the OnConnected/OnDisconnected events of the channel that is used to connect to Olympia and that is held by each TROOlympiaSessionManager instance.

In OnDisconnected, I record whether or not this is the first activation (to avoid loops). In OnConnected, if it is not the first activation, we are recovering from some issue with Olympia, so I attempt to re-register the Olympia events so that this Olympia client can continue to receive messages.

  // Runs in the channel's OnConnected handler; fIsDisconnected is set in OnDisconnected.
  if fIsDisconnected then
  begin
    fLogger.Info('Reconnecting to Olympia.');
    if fSessionManager.IsSessionPresent(fRemoteServiceConnection.ClientId) then
    begin
      // lActive: IROActiveEventServer
      if Supports(fRemoteServiceConnection.Channel, IROActiveEventServer, lActive) then
      begin
        // Channel supports active (push) events: register with it
        fEventRepository.AddSession(fRemoteServiceConnection.ClientId, lActive, 'OlympiaEvents');
        fEventRepository.AddSession(fRemoteServiceConnection.ClientId, lActive, 'OlympiaSessionManagerEvents');
      end
      else
      begin
        // Channel does not support active (push) events
        fEventRepository.AddSession(fRemoteServiceConnection.ClientId, 'OlympiaEvents');
        fEventRepository.AddSession(fRemoteServiceConnection.ClientId, 'OlympiaSessionManagerEvents');
      end;
    end
    else
      fLogger.Error('Session not found. Could not reconnect.');
  end;

Any attempt to contact Olympia then times out. I also tried simply activating and deactivating the session manager and event repository components, which hold all the registration code, but that also fails with timeouts. It seems that something on the client needs to be completely recreated to re-establish the connection and register properly with an Olympia instance, which is strange because other services recover just fine without going to those lengths.

Any tips? The issue is quite simple to reproduce: start Olympia, connect a client that uses it, shut Olympia down, start it again, and then try to re-establish Olympia’s connectivity to that client (its previous state). If you can tell me how to achieve that while I wait for the future improvements, that would be great.

Note: fRemoteServiceConnection is just an interfaced wrapper that holds the ClientId and other properties in a single place; underneath it is a simple channel/message pair. In this case, fRemoteServiceConnection.Channel is the same channel that is assigned to the session manager. For the purpose of understanding the code, fRemoteServiceConnection.ClientId can simply be read as SessionManager.ClientID.

(estebanp) #6

I also noticed that if I simply set Active := True / False on the Olympia components in the connect/disconnect events of the channel used to connect to Olympia, it fails in the Login method (in this case in the GetTimeOut method, because I don’t use a username/password). So something goes badly wrong in the connection between the service client and the Olympia service.

(estebanp) #7

Any insights, hints, or suggestions? I’m now trying crazy things (recreating the Olympia components, connections, etc.), all with negative results. I’m afraid that if Olympia ever needs to be restarted (forcefully or voluntarily), our entire production environment will go down, and that is definitely a scenario I cannot accept.

(EvgenyK) #8

I’ll investigate this case tomorrow. At first sight, if the client does a logoff/login, it works.

(estebanp) #9

Thank you.

The issue is easy to reproduce. Start Olympia, then start a client and a server that use Olympia. Close Olympia, wait a few seconds, and start Olympia again. The expected behaviour is that everything continues as before; the current behaviour is the loss of all pub/sub between clients, and between an Olympia client (server service) and the Olympia service. Exception messages keep appearing until all server services using Olympia are restarted.

That is understandable, since Olympia clients do not store or keep track of their own internal event sinks (OlympiaEvents, OlympiaSessionManagerEvents). So the question is: how do I re-register for those events so that the Olympia client component keeps working? I already tried several things based on my posts above, and they didn’t work. Hopefully your testing will help me out.

(RemObjects) #10

Thanks, logged as bugs://82591

(EvgenyK) #11

It looks like the issue is in Olympia itself.

(antonk) #12

Just a small update.

We (actually, I) are working on Olympia right now to resolve this without sacrificing the stability or performance of the Olympia server.

(estebanp) #13

Thank you very much for the update Anton.

I’m constantly checking the forum for updates, because I have put further deployments and use of our “service template” (the service base for all our “microservices”, which relies heavily on RO) on hold until we have some sort of solution.

But as with everything, good things take time, so I’m sure you will come up with a good solution.

(RemObjects) #14

bugs://82591 got closed with status fixed.

(estebanp) #15

Great news.

How can I try the solution? Beta download?

(EvgenyK) #16

Hi,

It isn’t ready yet; not all tests are passing…

(antonk) #17

There are still some rough spots to fix and polish.

Still, we already have event subscription recovery working after an Olympia and/or server app crash or restart.

(estebanp) #18

Thank you for replying. Understood. I’ll buy more time over here. Please let me know when you feel comfortable releasing it.

Thank you.