Olympia EventDataCount keeps growing

estebanp · July 29, 2020, 11:07pm

I’m close to giving up, and probably I should, but I’m using event sinks across many services and can’t simply abandon them right now.

Olympia simply starts connecting and disconnecting, or timing out, taking every service down with it.

date="2020-07-29 17:33:00.492 ", app=CER, version=3.0, host=D-EVENTS-W1, user=cerberusservice, pid=4968, threadid=1280, level=ERROR, msg="Connection=OLY.RO(supertcp://olympia:9000/bin). Status=Connected."
date="2020-07-29 17:33:00.994 ", app=CER, version=3.0, host=D-EVENTS-W1, user=cerberusservice, pid=4968, threadid=1280, level=ERROR, msg="Connection=OLY.RO(supertcp://olympia:9000/bin). Status=Connected."
date="2020-07-29 17:33:00.995 ", app=CER, version=3.0, host=D-EVENTS-W1, user=cerberusservice, pid=4968, threadid=2068, level=ERROR, msg="Connection=OLY.RO(supertcp://olympia:9000/bin). Status=Disconnected."
date="2020-07-29 17:33:01.461 ", app=CER, version=3.0, host=D-EVENTS-W1, user=cerberusservice, pid=4968, threadid=3732, level=ERROR, msg="Connection=OLY.RO(supertcp://olympia:9000/bin). Timeout"
date="2020-07-29 17:33:01.461 ", app=CER, version=3.0, host=D-EVENTS-W1, user=cerberusservice, pid=4968, threadid=3732, level=ERROR, msg="Connection=OLY.RO(supertcp://olympia:9000/bin). Timeout"
date="2020-07-29 17:33:01.461 ", app=CER, version=3.0, host=D-EVENTS-W1, user=cerberusservice, pid=4968, threadid=3100, level=ERROR, msg="Connection=OLY.RO(supertcp://olympia:9000/bin). Timeout"
date="2020-07-29 17:33:01.461 ", app=CER, version=3.0, host=D-EVENTS-W1, user=cerberusservice, pid=4968, threadid=3100, level=ERROR, msg="Connection=OLY.RO(supertcp://olympia:9000/bin). Timeout"

Restarting Olympia doesn’t help. It is probably battling with leftover requests in the db (sessions, eventuserdata, etc.) and can’t stand on its own feet. CPU usage is low, memory usage is low.

I’m forced to delete all rows from the db to get Olympia to start accepting requests after a restart.

Services are set to automatically restart on failure, so they stay in a loop attempting to authenticate, and they probably add more salt to the wound.

If I use the sample chat I submitted to RO for testing, I notice that EventData keeps adding more rows on every message that is sent, even if it is successfully delivered, and keeps going.

I guess there is some moment when it cleans up, but why does it keep the eventdata after a message is successfully delivered to its peer? I have instances where it doesn’t clean it up and it just keeps growing. We do hundreds of event sinks, and thousands of calls per minute interacting with each RO Service (and as such “touching” Olympia).

Does eventdata hold the connection information (e.g. ip, protocol, etc.) for where the message needs to be sent back (an eventsink)? If not, where is that being stored or sent?

thank you.

antonk · July 30, 2020, 7:39pm

Hello

Could you please show your Olympia configuration file?

So, this is how event processing in Olympia works. When an event is sent:

1. A new row is created in the EVENTDATA table containing the event payload.
2. New rows are created in the EVENTUSERDATA table, each containing a link between an event receiver’s session Id and the event payload.
3. Once the event is delivered, these link rows are deleted.
4. Once there are no more link rows referencing a given event payload, it is deleted during the cleanup cycle that happens every 120 seconds.

And here’s the catch. If there is a stale EVENTUSER row that has not yet expired, then on each event broadcast there will be a new EVENTUSERDATA entry for it, and the corresponding EVENTDATA entry will survive the cleanup cycle.

That’s why I need to take a look at your config file and, if possible, at the Olympia db (see PM).
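To make the lifecycle and the stale-row catch easier to see, here is a toy in-memory model in Python. It is only a sketch of the behaviour described above, not Olympia's actual code: the class, the method names and the "live"/"stale" session ids are made up for illustration, and the real tables have columns that are not modelled here.

```python
class EventStore:
    """Toy model of the EVENTDATA / EVENTUSERDATA / EVENTUSER bookkeeping.

    All names here are hypothetical; only the table roles come from the thread.
    """

    def __init__(self):
        self.event_data = {}          # event id -> payload      (EVENTDATA)
        self.event_user_data = set()  # (session id, event id)   (EVENTUSERDATA)
        self.event_users = set()      # subscribed session ids   (EVENTUSER)
        self._next_id = 1

    def subscribe(self, session_id):
        self.event_users.add(session_id)

    def broadcast(self, payload):
        """Store the payload once and add one link row per subscriber."""
        event_id = self._next_id
        self._next_id += 1
        self.event_data[event_id] = payload
        for session_id in self.event_users:
            self.event_user_data.add((session_id, event_id))
        return event_id

    def deliver(self, session_id):
        """Hand a session its pending payloads and delete its link rows."""
        pending = sorted(e for s, e in self.event_user_data if s == session_id)
        self.event_user_data -= {(session_id, e) for e in pending}
        return [self.event_data[e] for e in pending]

    def cleanup(self):
        """Periodic cycle: drop payloads that no link row references."""
        referenced = {e for _, e in self.event_user_data}
        for event_id in list(self.event_data):
            if event_id not in referenced:
                del self.event_data[event_id]

    def expire(self, session_id):
        """Subscriber timed out: remove it together with its link rows."""
        self.event_users.discard(session_id)
        self.event_user_data = {
            (s, e) for s, e in self.event_user_data if s != session_id
        }


def simulate(messages, stale_present):
    """Return how many payload rows remain after `messages` broadcasts."""
    store = EventStore()
    store.subscribe("live")
    if stale_present:
        store.subscribe("stale")  # subscribed, but never polls for events
    for i in range(messages):
        store.broadcast(f"msg {i}")
        store.deliver("live")
        store.cleanup()
    return len(store.event_data)
```

With only the live receiver, every payload is gone after each cleanup. With one stale subscriber that never polls, every broadcast leaves a payload behind until that subscriber expires.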

The only Olympia flaw I see atm is that SESSION and EVENTUSER timeouts are handled separately, and that an explicit session logout doesn’t destroy the corresponding event entry immediately (I’ll log this).

RemObjectsSoftware · July 30, 2020, 7:39pm

Thanks, logged as bugs://84762

estebanp · July 31, 2020, 6:02am

{
  "ServiceName": "GX.Olympia",
  "Port": 9000,
  "EventTimeout": 1800,
  "SessionTimeout": 1800,
  "PoolSize": 10,
  "InMemoryMode": false
}

If this metric is in seconds: I didn’t know what Event Timeout referred to, but now that I know, 1800 is overkill. I’ll change it to 5 minutes (300).

In production it is set to 10 minutes (to survive a full machine restart) and a pool size of 20. Session timeout is an hour, to survive a server maintenance window.

antonk · July 31, 2020, 9:47am

Yes, sitting there for 30 minutes and gathering all broadcast events can result in excessive db load.
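As a rough back-of-the-envelope sketch of that load (Python; the function name and the rate of 1000 broadcasts per minute are assumptions for illustration, not measurements), assuming a stale receiver collects one link row per broadcast for the whole event timeout:

```python
def pending_link_rows(events_per_minute, stale_receivers, event_timeout_s):
    """Worst-case link rows held by stale receivers before they expire.

    Assumes every broadcast adds one row per stale receiver and that the
    receiver stays registered for the full timeout (given in seconds).
    """
    return events_per_minute * event_timeout_s * stale_receivers // 60


# Hypothetical rate of 1000 broadcasts per minute, one stale receiver.
rows_at_1800 = pending_link_rows(1000, 1, 1800)  # 30-minute timeout
rows_at_300 = pending_link_rows(1000, 1, 300)    # 5-minute timeout
```

Under these assumptions the 1800-second timeout lets a single stale receiver pin 30000 rows, versus 5000 with a 300-second timeout, and the figure scales linearly with the number of stale receivers.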

Could you provide step-by-step reproduction steps? I’ll double-check internal processes of the client and server - Olympia interaction. Delphi might have its own specifics, so it is better to double-check them.

RemObjectsSoftware · August 5, 2020, 11:21am

bugs://84762 got closed with status nochangereq.