[69749 Open] Serious stability issues with Relativity Server

XerXes · October 4, 2016, 3:11pm

Using default 7099 port

antonk · October 4, 2016, 5:44pm

Thanks for the report. I see the use pattern and will investigate it.

Also let me describe some Relativity Server concepts regarding the ports it exposes.

Relativity Server exposes 3 channels:

1. Default unprotected channel
2. Default protected channel
3. Custom channel

(1) is a SuperHttp channel that always listens on port 7099.
(2) is a SuperHttp channel that always listens on port 7100. If an SSL certificate is configured for the server instance, the channel uses that certificate to perform SSL/TLS encryption.
(3) is a customizable server channel. It cannot be configured to use ports 7099 or 7100, as these are reserved for (1) and (2).
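
The port rules above can be sketched as a small check. This is only an illustration, not Relativity Server code: `RESERVED_PORTS` and `validate_custom_port` are hypothetical names, and the sketch assumes nothing about how the custom channel is actually configured.

```python
# Hypothetical helper: not part of Relativity Server or its API.
# Ports 7099 and 7100 are taken from the channel description above.
RESERVED_PORTS = {
    7099: "default unprotected channel",
    7100: "default protected channel",
}


def validate_custom_port(port: int) -> int:
    """Return the port if it may be used for the custom channel, else raise."""
    if not 1 <= port <= 65535:
        raise ValueError(f"{port} is not a valid TCP port")
    if port in RESERVED_PORTS:
        raise ValueError(f"port {port} is reserved for the {RESERVED_PORTS[port]}")
    return port
```

For example, `validate_custom_port(8099)` returns the port unchanged (8099 is an arbitrary illustrative value), while `validate_custom_port(7099)` raises `ValueError`.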

It is strongly recommended to use channel (3) in production, as it provides better customization options. The channels listening on ports 7099 and 7100 should be used only as a fallback and shouldn’t be exposed to clients.

You are using channel (1) in the so-called ‘fallback’ mode (when a SuperHttp channel serves simple plain HTTP requests). Running out of sockets is clearly a bug and might be related to the ‘fallback’ mode (e.g. it might be leaking a socket somewhere).
Still, I would suggest that you consider moving to channel (3) over time. This should also improve response times. There is no need to update all client apps at once, as the same Relativity Server instance can process queries on all 3 channels simultaneously.

Regards

XerXes · October 11, 2016, 1:40pm

In case this helps you troubleshoot the issue…

We did as you suggested and moved to an alternate port and that is working fine so far.

We noticed that port 7099 will still stop serving non-local requests, but the new port keeps working. We thought at first this had to do with socket exhaustion, but that doesn’t seem to be the case. The Relativity Server simply stops serving remote requests on port 7099. We haven’t been able to pinpoint the exact cause. Sometimes it happens multiple times in a day, sometimes it runs fine for a week.
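
One way to record when this happens might be a small probe run from a remote machine against both ports. The sketch below is an assumption-laden illustration: the function names are hypothetical, and it only tests whether a TCP connection can be opened. A successful connect does not prove the channel still answers requests, so it may miss a failure at the request level.

```python
import socket
from datetime import datetime, timezone


def port_accepts_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False


def probe_ports(host: str, ports: list[int], timeout: float = 3.0) -> str:
    """Probe each port once and return a single timestamped log line."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    results = ", ".join(
        f"{port}={'up' if port_accepts_connections(host, port, timeout) else 'DOWN'}"
        for port in ports
    )
    return f"{stamp} {host} {results}"
```

Calling `probe_ports(host, [7099, custom_port])` on a schedule, where `host` and `custom_port` stand for the actual server address and the configured custom channel port, would give a timeline of when 7099 becomes unreachable while the custom port stays up.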

We’ll be closing client access to port 7099 as soon as we move all clients to the new port as you suggested and that should solve our problem.

Thanks for your time. I’ll update this post if anything changes.

antonk · October 11, 2016, 2:38pm

Hello

Thanks for the info, it helps a lot. It means that something gets broken internally in the server channel, not on the system level. We’ll try to reproduce it now in a controlled environment to pinpoint the issue.