Older Remobjects release does much better multi-threading

bobokonijn · February 27, 2012, 12:10pm

Hello all,

I have a multi-threaded client application that replicates data from remote databases. Some 250 threads run concurrently and are fully independent. Each thread runs a cycle that calls a Remobjects service on a remote server and then sleeps for several minutes (WaitForMultipleObjects), so at any moment there is a mix of threads sleeping, connecting and working. The CPU load of this application is extremely low (5% or so). The application runs as an NT service and has an Intraweb user interface so that it can be managed from a web browser.

The last Remobjects version that works for me is the “winter 2010” release. With this version, the web user interface of my application is smooth and fast, and it reacts to mouse clicks instantly. With ALL later versions of Remobjects, the user interface is extremely sluggish: whenever I click a hyperlink, the program pauses for 30 seconds or more before responding. The problem gets worse as the number of worker threads increases. I use TROSynapseSuperTCPChannel for my connections, but the Indy variety of the superchannel behaves exactly the same. This is utterly frustrating.

Remobjects “spring 2011” saw some huge changes in the “super” channels, lots of code was refactored and put in a base class. I have the impression that there’s a new bottleneck because of this, maybe some kind of thread contention / deadlock in the code.

What can I do? My application is much too complex to reduce it to a simple test case. Am I condemned to stick with the older Remobjects version forever?

mh · February 27, 2012, 11:34pm

Bob,

i’m sorry to hear you are having problems like this. of course it’s difficult to comment without knowing more details, but i’ve flagged this issue to bring it up with our RO team for discussion tomorrow; this does sound like a very critical issue that we need to get to the bottom of (whether it is a bug that we can fix, or “just” a breaking change that you can adapt to). we’ll let you know once we know more…

thanx,
marc

sergeyl · February 29, 2012, 12:12pm

Hello Bob,

I have to say that in the roughly 2 years since Spring’10 we have received no complaints about superchannel performance. It looks like we will have to try to reproduce your case with a model application. To start clarifying things, please answer:

1. Did you try (or do you have any chance to try) a different superchannel, say, SuperHttp?
2. How much time does each request from a thread take, and how much data is transferred? Approximate values will do, like ‘almost instantly’, ‘seconds’, ‘tens of seconds’.
3. What kind of activity inside the application is caused by clicking the web form?
4. If I understand it right, your threads are doing remote data fetching and are not directly related to user actions via the web interface. Do the threads suffer the same performance degradation, or does it affect the web UI only?

Best regards - Sergey.

bobokonijn · March 1, 2012, 1:22pm

Hi Sergey,

thanks for your patience with me. (BTW, my name is Arthur. My nickname “Bobo Konijn” is a Dutch cartoon figure).

I managed to isolate what triggers the apparent deadlock.

The web interface of my application only becomes sluggish/unresponsive if a larger number of my Remobjects worker threads are unable to connect to their servers. This interference wasn’t present in the “Winter 2010” release of RO SDK.

These are my settings for the Synapse super channel:

object SynapseChannel: TROSynapseSuperTCPChannel
  StoreActive = False
  DispatchOptions = []
  OnLoginNeeded = ChannelLoginNeeded
  OnReceiveStream = ChannelReceiveStream
  OnSendStream = ChannelSendStream
  ServerLocators = <>
  SynchronizedProbing = False
  Host = 'localhost'
  AckWaitTimeout = 15000
  AutoReconnect = True
  Left = 56
  Top = 8
end

Kind regards,

Arthur

bobokonijn · March 2, 2012, 10:11am

Hi Sergey, to answer your last questions more in-depth:

Indy Superchannel makes no difference whatsoever in comparison with Synapse.
The amount of data transferred per thread is low (a few kb per iteration) and the threads sleep most of the time.
The connections have very high latency (more than one second) and very low bandwidth (64 kbps) due to the medium used (satellite internet).
The web form is only a status display that hardly interacts with the threads themselves. It lets me create new connections and stop existing ones. Apart from that, it only shows which threads are connected and which are not.
The performance degradation that plagues me is only in the Web UI. The threads themselves run absolutely smoothly. It only occurs if a large number of threads are unable to reach their hosts (so they are in a “connecting” state that fails after roughly 10 seconds). Having a few dozen threads in a “connecting” state makes the Web UI become unresponsive.

sergeyl · March 2, 2012, 3:02pm

Hello Arthur,

Sorry for guessing your name wrong.
Thanks, any information is very valuable in this case. So I’m going to try to write a test case application soon and see what happens.

Best regards - Sergey.

bobokonijn · March 12, 2012, 8:53am

I was called out of bed this week-end by my manager who wanted me to re-boot our server. My RO application was completely frozen. I now need to revert to version 6.0.49.861 again for stability’s sake. PLEASE solve this!

sergeyl · March 12, 2012, 3:27pm

Hello,

We are doing our best, but this is not an easy case. Please bear with us for a while.

Best regards - Sergey.

mh · March 12, 2012, 4:03pm

Arthur,

we’re in a bit of a difficult position here, as we don’t have a test case from you; we need to build one on our own (and even then, who knows if it will let us reproduce the problem, but if not, at least we have a sample we can give you to compare to your project), and that will take some time, and we’re in a bit of a crunch mode this month.

but rest assured that we are on this and this has not been forgotten.

in the meantime, i would indeed suggest going back to the version that works for you, for the production server(s), to avoid any complications.

thank you for understanding,
marc

bobokonijn · March 14, 2012, 8:58am

Short synopsis: My client threads run in a cyclic manner. Each thread has its own Synapse TCP superchannel. If a RO method fails due to a broken comms connection, the thread destroys its RO interfaces and channel, then waits for a while, re-creates the RO channel and starts from scratch. I prefer this method over an implicit re-connect inside the Remobjects layer. This method works smoothly up to the RO “winter 2010” release, but it creates some kind of deadlock on later versions if many threads have their communication interrupted simultaneously. This situation involves many superchannels in many threads (hundreds) being destroyed simultaneously.
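
The cycle described above might be sketched roughly as follows. This is Python rather than Delphi, and `FakeChannel`, `worker_cycle` and all of their parameters are hypothetical stand-ins invented for illustration; none of them are RemObjects SDK APIs. It only shows the shape of the loop: call, sleep, and on a broken connection destroy the channel, wait, and re-create it from scratch.

```python
import threading


class FakeChannel:
    """Hypothetical stand-in for a per-thread channel; not a RemObjects API."""

    def __init__(self, fail_on_call):
        self.fail_on_call = fail_on_call  # call number at which the link "breaks"
        self.calls = 0
        self.closed = False

    def call_service(self):
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise ConnectionError("link down")
        return "ok"

    def close(self):
        self.closed = True


def worker_cycle(make_channel, stop_event, retry_delay, max_cycles, log):
    """Run the call/sleep cycle; on failure destroy the channel and start over."""
    channel = make_channel()
    for _ in range(max_cycles):
        if stop_event.is_set():
            break
        try:
            log.append(channel.call_service())
        except ConnectionError:
            channel.close()               # explicit teardown instead of an implicit reconnect
            log.append("recreate")
            stop_event.wait(retry_delay)  # interruptible wait before trying again
            channel = make_channel()      # brand-new channel, start from scratch
            continue
        stop_event.wait(retry_delay)      # stands in for the several-minute sleep
    channel.close()
    return log
```

With `worker_cycle(lambda: FakeChannel(2), threading.Event(), 0, 5, [])`, every second call on a channel fails, so the log alternates between successful calls and channel re-creations.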

bobokonijn · April 5, 2012, 9:30am

OK, the root of the issue seems to be that ROSDK doesn’t like it if hundreds of threads destroy their channels simultaneously in case of a communications breakdown - something just “hangs” under those conditions. I avoid this now by relying on the channel’s auto-reconnect feature and haven’t had any lockups since then.

sergeyl · April 20, 2012, 2:25pm

Bravo Arthur!

I have just started constructing the test case and decided to reread the thread. Destroying channel instances en masse is something I would not do (not because it is considered bad practice, I just would not do it). Thanks to your report I can analyze closely what happens during destruction, and either fix it if possible or document it.

Best regards - Sergey.

sergeyl · April 23, 2012, 4:53pm

Hello Arthur,

I have created a test application that seems to do the same as yours, except that it uses a regular VCL form instead of the web interface. It creates 300 threads that execute simple remote requests periodically. The request interval is randomized. Each thread is programmed to die if a request fails, destroying all the RO objects it uses to connect to the server (each thread has its own set of instances). After a while I kill the server process to simulate a data fetch failure. The threads then destroy themselves quickly enough. The main form of the client application has a timer and a progress bar that is advanced by this timer. Smooth movement of the progress bar shows that the main thread is not blocked.
So I could not reproduce the sluggish behavior. You could grab the attached project and play with it yourself if you are still interested in resolving this problem.
In addition, I have analyzed the superchannel code. It tries to disconnect gracefully when it is destroyed; this can take some time and block the calling thread, but I found nothing that could affect other threads of the application.

Best regards - Sergey.
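
The kind of responsiveness check described in this post could be approximated along these lines. This is only a sketch in Python, not the attached Delphi project: `run_probe` and its parameters are hypothetical, the "teardown" is just a short sleep standing in for a graceful channel disconnect, and the heartbeat loop plays the role of the timer-driven progress bar.

```python
import threading
import time


def run_probe(n_threads=300, teardown_seconds=0.01, tick=0.005):
    """Start workers that all lose the server at once and tear down; meanwhile
    tick a heartbeat on the calling thread and record the longest gap."""
    server_down = threading.Event()
    finished = []
    lock = threading.Lock()

    def worker(idx):
        server_down.wait()            # idle until the simulated server is killed
        time.sleep(teardown_seconds)  # stands in for a graceful channel disconnect
        with lock:
            finished.append(idx)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()

    server_down.set()                 # "kill the server": every worker tears down at once
    max_gap = 0.0
    last = time.monotonic()
    while any(t.is_alive() for t in threads):
        time.sleep(tick)              # heartbeat on the calling thread
        now = time.monotonic()
        max_gap = max(max_gap, now - last)
        last = now
    for t in threads:
        t.join()
    return len(finished), max_gap
```

A longest heartbeat gap far above `tick` would suggest that the calling thread was starved while the workers were shutting down; a gap close to `tick` matches the smooth progress bar reported above.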