Possible issue with local channel

Hi guys,

Delphi clients, Rio 10.3.2, Win64. Using experimental Olympia (10.0.0.1474).

I’m testing local channels separately because I’m having some issues whose origins I cannot pinpoint.

I’m sending the test project to the support email.

Case 1:

Step 1.
Open the server app. Set up the connections to Olympia. Start the server.
Send a message from the server.

Step 2. Open two clients with different usernames.
Connect to server and send a message.

Step 3. Send messages from each client to make sure everyone is receiving them.

Step 4. Close the server app. Wait 11 seconds (to exceed the timeout).

Step 5. Open server app, start server. Send a message.
No client will receive the message. (WRONG)

Step 6.
Send a message from the first client that was opened. The server will get it and the first client will get it. The second client will not. (WRONG)

Step 7.
Send a message from second client. Everyone will get it.

Step 8.
Send a message from Server. Everyone will get it.

Step 9.
From this point on, everything works as expected.

I’m wondering if the local channel has anything to do with it, because it is the one that is not able to send messages after reconnection.

Some extra details.

I thought the cause might be the application ID, which I hadn’t set to a constant on the server side, so I added it and tested again. To my surprise, I got the same result.

That makes me think there is a need to check whether the application ID is being respected for event sinks and reconnections in the RO SDK.

Esteban,

My apologies for the delayed reply, but the best person to help with this is off today due to a local holiday; I’ll make sure someone gets back to you tomorrow!

yours,
marc

Thank you Marc. Stay safe.

Thanx, you too!

Hi,

Add a new event handler to the client app: Channel.OnDisconnected.
After step 4, both clients are disconnected. The event above reports this, and the clients can’t receive any new messages from the server.
Step 5 is CORRECT because both clients are disconnected.
In step 6, the 1st client reconnects to the server and is able to receive events. The 2nd client is still disconnected, so this case is also CORRECT.
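
The behaviour described above can be illustrated with a toy model. This is a sketch in Python rather than Delphi, and every name in it (`ToyServer`, `ToyClient`, `restart`) is hypothetical; none of it is Remoting SDK API. It assumes that a client only becomes reachable again once it reconnects, which here happens when it sends a message.

```python
class ToyServer:
    """Hypothetical stand-in for a server that pushes events to connected clients."""

    def __init__(self):
        self.connected = {}  # client name -> that client's inbox

    def connect(self, name, inbox):
        self.connected[name] = inbox

    def restart(self):
        # Assumption: after a restart that outlasts the timeout,
        # no client connection survives.
        self.connected.clear()

    def broadcast(self, message):
        # Only currently connected clients receive the event.
        for inbox in self.connected.values():
            inbox.append(message)


class ToyClient:
    def __init__(self, name, server):
        self.name, self.server, self.inbox = name, server, []
        server.connect(name, self.inbox)

    def send(self, message):
        # Assumption: sending re-establishes a lost connection first.
        if self.name not in self.server.connected:
            self.server.connect(self.name, self.inbox)
        self.server.broadcast(message)


server = ToyServer()
a = ToyClient("A", server)
b = ToyClient("B", server)

server.restart()                 # step 4: server closed past the timeout
server.broadcast("from server")  # step 5: nobody is connected
a.send("from A")                 # step 6: A reconnects, B is still disconnected
b.send("from B")                 # step 7: B reconnects, both receive
```

In this model the client inboxes match steps 5 to 7 of the report: the server's first message reaches no one, the first client's message reaches only the first client, and the second client's message reaches both.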

Thank you Evgeny,

I updated the sample to set all channels to AutoReconnect = True. That was my fault; in production it is already set up to reconnect automatically.

After doing that, I also corrected the code to include the application ID and made the console logging thread-safe, so I can set all the events to avoid synchronization and get closer to the real-life cases we use. I also added log entries for connections and disconnections.

In this test I’m now seeing an interesting situation, and perhaps you can spot what the issue is. Steps to reproduce it:

  1. Open Server app. Start the server. Send message. All good.
  2. Open client app. Set username, connect. Send message. All good.
  3. Open another client app. Set username, connect. Send message. All good.
  4. Close the Server. All clients will show disconnection on their consoles.
  5. Open the server. Start the server. Check the console messages on the clients: they will show that they are connected, but they will also show duplicate event sink messages arriving at the clients announcing the server logging in (at least two).
  6. Close the server. Start the server. Check again. Same thing.

If it doesn’t show on the first attempt, a few more attempts will start producing duplicate event sink messages. Any ideas on what I am doing wrong?

Thank you.

The updated code was sent to support.

Hi there,

Another possible issue to test using the latest example app.

  1. Open Server app. Start the server. Send message. All good.
  2. Open client app. Set username, connect. Send message. All good.
  3. Open another client app. Set username, connect. Send message. All good.
  4. Stop Olympia service.
  5. Wait for the server app’s internal timer to send an automatic message within a minute. It will hang the app for a moment (it runs synchronously), but then a “No connection available” message will show. Click OK on the window. Alternatively, simply try to send a message from the server; it will hang for a bit and then show the same message.
  6. Start Olympia service.
  7. Try to send a message from anywhere (server, clients, etc.). No messages will be received anywhere. All event sinks will fail from that point on.
  8. If you close and open the server app and connect it, it will receive a lot of the missed messages.

Hi,

Olympia removes an event only once it has received confirmation that the event was delivered. That explains the duplicated events: when connecting, a client asks for missed events, so it can receive events that are being sent at that moment but haven’t been confirmed yet.
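
When events can be redelivered until they are confirmed, a common way to cope on the receiving side is to make the handler idempotent. Below is a minimal sketch in Python, not Delphi and not Remoting SDK code. It assumes each event carries a unique ID assigned by the sender; that ID, `EventDeduplicator` and `handle_events` are all hypothetical names, and the thread does not say whether Olympia events expose such an ID.

```python
from collections import OrderedDict


class EventDeduplicator:
    """Remembers recently seen event IDs so a redelivered event is handled once."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._seen = OrderedDict()  # event ID -> True, oldest first

    def first_time(self, event_id):
        """Return True the first time an ID is seen, False for a repeat."""
        if event_id in self._seen:
            self._seen.move_to_end(event_id)
            return False
        self._seen[event_id] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True


def handle_events(events, dedup):
    """Process (event_id, payload) pairs, skipping IDs already handled."""
    handled = []
    for event_id, payload in events:
        if dedup.first_time(event_id):
            handled.append(payload)
    return handled
```

The bounded `capacity` keeps memory use fixed, at the cost that a duplicate arriving after its ID has been evicted would be handled again, so the window needs to be sized for the longest expected redelivery delay.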

This works correctly for me:

As you can see, it recovered correctly after Olympia was stopped and started.

Thank you Evgeny,

Regarding messages no longer being sent after the error, I’ll test whether it has something to do with the environment. We have Olympia on a different server, a Windows Server 2016 machine, and the apps on Windows 10 machines.

I thought the duplicates could be explained that way, but why two login messages? Login messages only happen once, when the server starts, and there was no queue of failed logins because none was sent while the server app was down. That part is confusing for me. Could you explain a bit more why a single message would create two queue items? The timestamps show multiple event sinks at the same time. We are seeing this in production as well.

Additionally, if Olympia runs on the same machine as the client and server apps, it recovers correctly. I tested that.

But if Olympia is running on a different computer, it will not recover. Firewalls were turned off on both machines.

I haven’t tried Olympia running on a different Windows 10 machine, just for fun, because in reality we will never run Olympia on anything other than a Windows Server version.

Were you able to reproduce the issues above based on the feedback I provided? Do you need any additional data? Now that I have a clean (non-restricted) code sample, I can try anything: deployment, network settings, etc.

Also, I just confirmed that if Olympia is running on a Windows 10 machine, different from the computer where the client and server apps are running, the server and clients will recover after Olympia is turned off and started again.

Under Windows Server 2016 Standard, they will not.

I’ll start checking TCP/IP settings. Firewalls are off, and if they were on, Olympia would not work in any case.

The event sink duplication happens regardless of which OS Olympia is running on.

I just tried with Windows Server 2012 R2 machines in the cloud and they do reconnect. Duplicate login event sinks also happened there.

I’m reviewing network settings to find out what it could be.

Interestingly, in all of these cases no error message is received when attempting to send a message from the server after Olympia has been started again. In the Windows Server 2016 test the code actually executes, but the event sink never reaches the local channel or the client channels.

Found it. Group Policy settings on the dev domain forced a domain-level firewall rule that could not be turned off by simply turning off the firewall on Windows 10.

Servers do not have that rule, only clients on the domain. I turned it off, and now it is recovering.
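
For anyone chasing a similar problem, a quick TCP reachability probe can help separate network policy from application behaviour. This is a minimal Python sketch using only the standard library; `can_connect` is a hypothetical helper, and the host and port you pass in are placeholders for wherever your own Olympia instance or app server listens.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

The probe only tests one direction, from the machine it runs on to the target, so it is worth running it from each side. A rule that blocks traffic in only one direction, or only for some machines in the domain, will not show up otherwise.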

We are still getting the duplicates though.

We are investigating this case.

Hello there,

Any luck? Do you need any additional testing or anything else on my part?

Hi,

We decided to review the .NET part of Olympia to see why extra events were sent.
antonk will do it soon.

Thank you Evgeny,

I’ll run some more tests on my side and see if I can figure something out.

A quick update.

We are still having issues with reconnection after an interruption of Olympia, and I think I was able to reproduce it.

  1. Open two servers, both connecting to the same Olympia but listening on different ports (e.g. 9014 and 9015).
  2. Open two clients, connecting one to 9014 and the other to 9015.
  3. Send messages from any of the clients and servers; all of them will receive each other’s messages. So far so good.
  4. Stop the Olympia service.
  5. Wait 15 seconds.
  6. Start the Olympia service.
  7. Some duplicate messages may appear (you guys are working on it, so skip that one).
  8. Try to send messages between them.
  9. Notice how some of the participants, especially one of the servers connected to Olympia directly, are not receiving messages.
  10. Notice also that the automatic “Healthy” notifications are received by some clients and not by others.

So far these issues are killing our production environment whenever the Olympia service suffers any sort of issue and recovers. We get duplicated messages in our services, which trigger all kinds of actions (deletions, alarms, customer notifications) twice, and many of the services do not recover: because they connect directly to Olympia, this maybe-yes, maybe-no behaviour forces us to basically restart all services. So everything goes down.

Hopefully I’m missing something, but at the same time, I hope not, because we continue to have these issues and by reproducing it you guys might be able to fix them.

Note: I tested this on a single machine running Windows 10. I also tested it with Windows 10 client machines and a Windows Server 2012 server. Same results.

Hi,

I can’t reproduce this with these steps. Both clients and both servers receive all messages after Olympia is restarted.
I tested on a single machine.