TROSocketOpenSSL slow

Hi,

TROWinHTTPServer requires admin rights in case it isn’t configured with netsh http add urlacl.

You can register your certificate with netsh http add sslcert.

OK, well I’ll try that later, but initial checks with the Indy Http server are promising. So far, everything looks to be working as expected, which means it must be the RO Socket which was causing the issue.

Hi,

I wonder why self-signed certificates work as expected and Let’s Encrypt certificates don’t.
We have plans to implement native support for Let’s Encrypt certificates in RO so this issue will be reviewed during implementation and testing of this feature

That would be awesome, let me know when you want someone to test that :wink:
For now, I’ll be copying the files from our webserver to this server’s folder every 3 months, so I need to put something in place to automate that.

So far, all testing has shown it is working perfectly now, with Indy.

When testing this, don’t forget it worked fine with the mega demo, and my issue only came about when people were testing with Chrome and Edge on Windows. All worked fine for me on Apple devices and on localhost.

Hello,

I’ve found out that the behavior that JeremyK describes in the first post happens when SSL_read returns -1 and SSL_get_error returns 2 (SSL_ERROR_WANT_READ).
That means that TROSocketOpenSSL.is_retryable returns true, and in this state the loop in TROSocketOpenSSL.Read can repeat for minutes at least (we have observed 6 minutes, but we think it could become an endless loop). In those six minutes it ran millions of iterations, keeping CPU usage high and making the server unavailable.

I am not sure how to handle this situation, but maybe some iteration counter or time-in-loop check could solve this.

Regards, Jaroslav

Hi,

Is it reproducible with the latest build (.1561)?
If yes, can you create a simple testcase that reproduces it, please?

You can send it to support@ to keep it private.

Hello,

Unfortunately, we are not able to reproduce this behavior. It happens on our production server, sometimes once in three weeks and sometimes the day after a restart.

The problem occurs in version 1555 with the fix for TROSocketOpenSSL.Shutdown that you posted on Feb 23 applied. I’ve checked the sources of 1561 and nothing in them seems likely to resolve this problem.

On Feb 23 JeremyK replied to your hotfix that it resolved the problem with Shutdown, but that he still had the original problem. I am not sure you noticed that part of his answer.

We’ve slightly changed the implementation of TROSocketOpenSSL.Read so that if the loop in it lasts longer than a minute, the application logs a line about that state, and it adds a new line every following minute. This is what we have seen:

23.06.2023 10:28:00: IterationCount = 65488344, Threadid: 16388, SSLReadResult = -1, SSLError = 2
23.06.2023 10:29:01: IterationCount = 131906974, Threadid: 16388, SSLReadResult = -1, SSLError = 2
23.06.2023 10:30:02: IterationCount = 197630734, Threadid: 16388, SSLReadResult = -1, SSLError = 2
23.06.2023 10:31:03: IterationCount = 262422917, Threadid: 16388, SSLReadResult = -1, SSLError = 2
23.06.2023 10:32:04: IterationCount = 329313844, Threadid: 16388, SSLReadResult = -1, SSLError = 2
29.06.2023 5:33:56: IterationCount = 50017305, Threadid: 16388, SSLReadResult = -1, SSLError = 2
29.06.2023 5:34:57: IterationCount = 100374515, Threadid: 16388, SSLReadResult = -1, SSLError = 2

Regards, Jaroslav

Hi,

according to /docs/man3.1/man3/SSL_read.html ,

this is SSL_ERROR_WANT_READ:

# define SSL_ERROR_WANT_READ             2

from /docs/man3.1/man3/SSL_get_error.html :

As I see it, something went wrong and the SSL engine decided to keep retrying.

We can’t ignore SSL_ERROR_WANT_READ because it is a standard error, and SSL usually continues to work after it.

I totally agree, that’s what I wrote in my first reaction. The combination of SSLReadResult -1 and SSL_ERROR_WANT_READ is a common and usual state and you can’t ignore it. The problem is that the loop can last for a very long time, maybe forever, consuming CPU and making the server unavailable until someone kills the process.

This is a relatively common issue. I’ve found this thread, for example:

https://github.com/openssl/openssl/issues/10279

For information, we use OpenSSL 3.0.7. I agree this is some kind of bug in OpenSSL, but maybe you could add a maximum-time check to the loop.

Regards, Jaroslav

Hi,

I think we can add some delay to this method (like sleep(10) or similar). That would reduce CPU consumption.
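A minimal sketch of the delay idea, runnable without OpenSSL. `FakeSslRead`, `ReadWithSleep` and `g_pendingRetries` are hypothetical names invented for this illustration and are not part of Remoting SDK; the stub only imitates an `SSL_read` call that reports `SSL_ERROR_WANT_READ` a few times before data arrives.

```pascal
program SleepThrottleSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  SSL_ERROR_WANT_READ = 2; // value quoted from the OpenSSL headers in this thread

var
  g_pendingRetries: Integer = 5; // hypothetical: how often the stub says "retry"

// Hypothetical stand-in for SSL_read + SSL_get_error.
function FakeSslRead(out AError: Integer): Integer;
begin
  if g_pendingRetries > 0 then
  begin
    Dec(g_pendingRetries);
    AError := SSL_ERROR_WANT_READ;
    Result := -1;
  end
  else
  begin
    AError := 0;
    Result := 42; // pretend 42 bytes arrived
  end;
end;

// Retries while the stub reports WANT_READ, sleeping between attempts.
function ReadWithSleep(ADelayMs: Integer; out AAttempts: Integer): Integer;
var
  l_err: Integer;
begin
  AAttempts := 0;
  repeat
    Inc(AAttempts);
    Result := FakeSslRead(l_err);
    if (Result > 0) or (l_err <> SSL_ERROR_WANT_READ) then
      Break;
    Sleep(ADelayMs); // yield the CPU instead of retrying immediately
  until False;
end;

var
  l_attempts, l_bytes: Integer;
begin
  l_bytes := ReadWithSleep(10, l_attempts);
  WriteLn('bytes = ', l_bytes, ', attempts = ', l_attempts);
end.
```

With a 10 ms delay the loop makes at most about 100 attempts per second instead of the roughly one million per second seen in the logs. It does not end a loop that never receives data, so it addresses the CPU load only.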

Another issue: if we “kill” such a connection, SSL may become unstable and reject incoming connections in the future :frowning: I have seen this when a connection wasn’t properly closed: the server stopped accepting clients.

Hello,

Just for information: we experienced the issue again this morning. The loop on our production server started on Thursday last week (6th of July) and continued until now (i.e. for 4 days). We had to kill the process via the process manager and restart it. So I think that the loop is practically infinite, making the whole process unusable.

Please consider a mechanism like a Max-In-The-Loop-Time, for example using a global variable (max time in msec) that we could set during process initialization without changing your code. It wouldn’t harm other users’ solutions.

Regards, Jaroslav

Hi,

try to use this code:

  • uROSocket.pas
var g_maxSSLReadRetries: Integer = -1; //<---------- added
implementation
...
function TROSocketOpenSSL.Read(Buf: Pointer; Size: Integer): Integer;
var  l_loopcnt: Integer;  //<---------- added
..
    l_loopcnt := 0; //<---------- added
    repeat
..
      Inc(l_loopcnt); //<---------- added
      Result := SSL_read(fSSL, Buf, Size);
..
      if (g_maxSSLReadRetries <> -1) and (l_loopcnt > g_maxSSLReadRetries) then Break; //<---------- added
    until False;

Note: according to your log, you have ~1*10^6 retries per second, so you can choose a retry limit that corresponds to the desired time.
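The arithmetic behind that estimate can be checked against the first two log lines posted earlier (10:28:00 and 10:29:01, i.e. 61 seconds apart). The helper names `RetriesPerSecond` and `RetriesForBudget` are made up for this sketch, and the 30-second budget is only an example value.

```pascal
program RetryRateSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Iteration rate between two consecutive log lines.
function RetriesPerSecond(ACountFrom, ACountTo, ASeconds: Int64): Int64;
begin
  Result := (ACountTo - ACountFrom) div ASeconds;
end;

// Retry limit that corresponds to a time budget at the given rate.
function RetriesForBudget(ARate, ABudgetSeconds: Int64): Int64;
begin
  Result := ARate * ABudgetSeconds;
end;

var
  l_rate: Int64;
begin
  // IterationCount values from the 10:28:00 and 10:29:01 log lines.
  l_rate := RetriesPerSecond(65488344, 131906974, 61);
  WriteLn('retries per second: ', l_rate);                          // 1088830
  WriteLn('limit for 30 seconds: ', RetriesForBudget(l_rate, 30));  // 32664900
end.
```

At this rate a 32-bit Integer limit covers a bit under 33 minutes, and the rate itself depends on the machine the server runs on.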

Please let me know if this code improves the situation on your server.

Hello,

But the counter depends on hardware, so if we change the machine, we would have to re-think the number. Would it be OK for you to base the solution on time instead? Something like this?

// g_maxSSLReadTime is in msec; -1 disables the check.
// MillisecondsBetween needs DateUtils in the uses clause.
var g_maxSSLReadTime: Integer = -1; //<---------- added

function TROSocketOpenSSL.Read(Buf: Pointer; Size: Integer): Integer;
var  l_begtime: TDateTime;  //<---------- added
..
    l_begtime := Now; //<---------- added
    repeat
..
      Result := SSL_read(fSSL, Buf, Size);
..
      if (g_maxSSLReadTime <> -1) and (MillisecondsBetween(l_begtime, Now) > g_maxSSLReadTime) then Break; //<---------- added
    until False;
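To try the time-limit logic without OpenSSL, here is a self-contained sketch. `StuckSslRead`, `ReadWithDeadline` and the `ATimedOut` flag are hypothetical and not part of Remoting SDK; the stub imitates the stuck connection from the logs by reporting `SSL_ERROR_WANT_READ` on every call.

```pascal
program ReadDeadlineSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, DateUtils;

const
  SSL_ERROR_WANT_READ = 2; // value quoted from the OpenSSL headers in this thread

var
  g_maxSSLReadTime: Integer = -1; // msec; -1 disables the limit

// Hypothetical stub: always reports "retry", like the stuck connection.
function StuckSslRead(out AError: Integer): Integer;
begin
  AError := SSL_ERROR_WANT_READ;
  Result := -1;
end;

// Retries until data arrives, a non-retryable error occurs,
// or the configured time limit is exceeded.
function ReadWithDeadline(out ATimedOut: Boolean): Integer;
var
  l_begtime: TDateTime;
  l_err: Integer;
begin
  ATimedOut := False;
  l_begtime := Now;
  repeat
    Result := StuckSslRead(l_err);
    if (Result > 0) or (l_err <> SSL_ERROR_WANT_READ) then
      Break;
    if (g_maxSSLReadTime <> -1) and
       (MilliSecondsBetween(Now, l_begtime) > g_maxSSLReadTime) then
    begin
      ATimedOut := True; // lets the caller tell a timeout from other errors
      Break;
    end;
  until False;
end;

var
  l_result: Integer;
  l_timedOut: Boolean;
begin
  g_maxSSLReadTime := 200; // with -1 this stub would never return
  l_result := ReadWithDeadline(l_timedOut);
  WriteLn('result = ', l_result, ', timed out = ', l_timedOut);
end.
```

Because `Now` is wall-clock time, a system clock change during the loop can shorten or lengthen the effective limit. A monotonic tick counter would avoid that, at the cost of platform-specific code.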

Logged as bugs://D19376.

bugs://D19376 was closed as fixed.