Making my setup "glitch proof"

obones · May 19, 2017, 1:19pm

Hello,

My current setup is as follows:

A central server using Synapse SuperTCPServer receives requests from Clients to perform tasks via ClientService.
It does not process those tasks itself but rather sends them to “Nodes” that will perform them. Those nodes connect to the server via NodeService and receive the order to perform their various actions via events.

So the usual course of action is as follows:
Node connects to Server and waits for tasks

Client connects to Server
Client sends a task to Server via ClientService.PerformTask, which is a blocking call
Server sends an event to Node with the Task to perform, which is a non-blocking call
Server starts waiting on a signal from the Node
Node performs the Task and calls NodeService.TaskEnded to signal the Server that it has processed the task
Server ends its wait and finishes the PerformTask call
Client receives result for its task.
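
To make the Server side of these steps concrete, here is a rough sketch in Python. It is not the actual code: PendingTasks, perform_task, task_ended and the timeout parameter are hypothetical stand-ins for the real ClientService/NodeService plumbing. It only illustrates the "block until the Node signals" step, and what an unbounded wait implies if that signal never arrives.

```python
import threading
from typing import Callable, Dict, Optional


class PendingTasks:
    """Hypothetical stand-in for the Server's bookkeeping, not the real services."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._done: Dict[str, threading.Event] = {}
        self._results: Dict[str, str] = {}

    def perform_task(self, task_id: str, send_event: Callable[[str], None],
                     timeout: Optional[float] = None) -> Optional[str]:
        # Blocking call made on behalf of the Client.
        done = threading.Event()
        with self._lock:
            self._done[task_id] = done
        send_event(task_id)      # non-blocking notification to the Node
        done.wait(timeout)       # timeout=None blocks forever if no signal comes
        with self._lock:
            self._done.pop(task_id, None)
            # None means the Node never reported back within the timeout.
            return self._results.pop(task_id, None)

    def task_ended(self, task_id: str, result: str) -> None:
        # Called by the Node once it has finished the task.
        with self._lock:
            done = self._done.get(task_id)
            if done is None:
                return           # nobody is waiting any more; drop it
            self._results[task_id] = result
            done.set()


server = PendingTasks()


def node(task_id: str) -> None:
    # A Node that finishes the task on another thread.
    threading.Thread(target=server.task_ended, args=(task_id, "done")).start()


ok = server.perform_task("task-1", node, timeout=5.0)
# Simulated network loss: the event never reaches any Node.
lost = server.perform_task("task-2", lambda task_id: None, timeout=0.1)
```

With timeout=None the second call would never return, which is the kind of indefinite wait that a lost event or a lost TaskEnded notification produces.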

This setup with events between Server and Node is made so that there is only the need to open a firewall port on the server, not on any of the potentially numerous nodes.

The tasks themselves can be quite lengthy but this works just fine on reliable networks.

However, this all goes down the drain when the network disappears between any parts of this setup.
For instance, if the network is down when the Node notifies the Server it has finished working, the Server will never receive its notification.
Conversely, when the network goes down just before sending an event to the Node, I don’t get a notification and the Server waits indefinitely.
And finally, when the network goes down in the middle of a ClientService.PerformTask call, the Client gets an EROTimeoutexception and loses all its work, even if the task itself is still running on a node and would succeed a few seconds later.

So, what I’d like to do is to have a way to make this whole setup more reliable when faced with erratic network behavior.
For the Node to Server replies, I could use the Retry parameter in the OnException handler, but that requires being able to detect the situation.
For the Server to Node requests, I don’t see how I can have the event being sent again if it never reached its destination.
For the Client to Server connections, I believe I could use the Async interfaces, but the documentation does not tell me how it behaves when the network goes down between Invoke_ and Receive_ calls.

I would appreciate any suggestions that would allow me to make this whole setup more robust.

EvgenyK · May 22, 2017, 7:32am

I think this scenario should work:

client sends the task request asynchronously
server also executes tasks asynchronously
server stores the result or status of each finished task
server sends a notification via events when a task finishes
client receives the notification and sends a confirmation that it was notified
server removes the status of the task after the confirmation
client periodically asks for the statuses of all finished tasks and sends confirmations so that the server can remove them

events + confirmation from the client that it received the result/status of a finished task should solve the case of unreliable networks
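
as a rough illustration of the store-and-confirm part of the steps above (plain Python, not SDK code; ResultStore and its methods are made-up names), the server keeps every finished result until the client confirms it, so a lost event only delays delivery until the next poll:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ResultStore:
    """Hypothetical server-side store of finished but unconfirmed task results."""

    _pending: Dict[str, str] = field(default_factory=dict)

    def task_finished(self, task_id: str, result: str) -> None:
        # Keep the result until the client explicitly confirms it.
        self._pending[task_id] = result

    def unconfirmed(self) -> List[Tuple[str, str]]:
        # What a polling client gets back (an event could carry the same data).
        return sorted(self._pending.items())

    def confirm(self, task_id: str) -> bool:
        # Safe to call twice: a repeated confirmation just returns False.
        if task_id in self._pending:
            del self._pending[task_id]
            return True
        return False


store = ResultStore()
store.task_finished("task-1", "ok")
store.task_finished("task-2", "failed")

# Suppose the event for task-1 was lost; a later poll still returns both results.
polled = store.unconfirmed()
for task_id, _ in polled:
    store.confirm(task_id)
remaining = store.unconfirmed()
```

making confirm harmless when repeated matters here, because the client may retry a confirmation whose reply it never saw.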

in your case, it will be a bit more complex because you work via a “proxy” server

obones · May 23, 2017, 3:12pm

Thanks for your reply, it confirms what I had in mind.
However, I’m not sure of what happens with events in this scenario:

Client connects
Client sends request for task via Async interface
Connection goes down
Task finishes, Server stores result/status
Server sends event
Connection goes back up

Would the Client receive all the missed events? Are they put in a queue waiting to be sent? If yes, is it a queue (FIFO), a stack (LIFO) or some other structure?

EvgenyK · May 23, 2017, 3:50pm

server can store an array of unconfirmed finished tasks inside the session and send it regularly to the client until the client confirms receiving it.
server can send it by timer, or the client can call a special server method; it depends on your implementation.
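
a minimal sketch of that resend loop, again in plain Python with made-up names (Session, Client, resend, on_results) rather than real SDK classes. each call to resend stands for one timer tick or one client poll. because the same batch can arrive more than once, the client in this sketch remembers which task ids it has already handled:

```python
from typing import Callable, Dict, List, Set, Tuple

Batch = List[Tuple[str, str]]


class Session:
    """Hypothetical per-client session holding unconfirmed finished tasks."""

    def __init__(self) -> None:
        self.unconfirmed: Dict[str, str] = {}

    def resend(self, deliver: Callable[[Batch], None]) -> None:
        # Run on each timer tick, or when the client calls the polling method.
        if self.unconfirmed:
            deliver(list(self.unconfirmed.items()))

    def confirm(self, task_ids: List[str]) -> None:
        for task_id in task_ids:
            self.unconfirmed.pop(task_id, None)


class Client:
    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.results: Batch = []

    def on_results(self, batch: Batch) -> List[str]:
        for task_id, result in batch:
            if task_id not in self.seen:   # ignore redelivered results
                self.seen.add(task_id)
                self.results.append((task_id, result))
        return [task_id for task_id, _ in batch]   # ids to confirm


session = Session()
session.unconfirmed["task-1"] = "ok"
client = Client()

# Tick 1: network is down, the batch never arrives.
session.resend(lambda batch: None)
# Tick 2: the batch arrives, but the confirmation is lost on the way back.
session.resend(client.on_results)
# Tick 3: the batch arrives again; the duplicate is ignored and confirmed.
acks: List[str] = []
session.resend(lambda batch: acks.extend(client.on_results(batch)))
session.confirm(acks)
```

the order in which results reach the client is then whatever the server chooses to put in the batch, since nothing here depends on a transport-level event queue.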

obones · May 24, 2017, 8:46am

Ok thanks, I see how this could be done.
Back to the drawing board then, but it should be doable.