[CDRIVER-3965] Version 1.17.5 adds a 4 seconds delay on each query Created: 26/Apr/21  Updated: 27/Oct/23  Resolved: 07/May/21

Status: Closed
Project: C Driver
Component/s: libmongoc
Affects Version/s: 1.17.5
Fix Version/s: None

Type: Bug Priority: Unknown
Reporter: Olivier Bertrand Assignee: Kevin Albertson
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 10 64 bits


Issue Links:
Related
is related to CDRIVER-3639 Connection failure may take 1-2s when... Backlog

 Description   

Since I switch to version the MongoDB C driver 1.17.5 (I was using 1.9) all use of this driver, that used to run in a few milliseconds, now takes more than 4 seconds to achieve.



 Comments   
Comment by Kevin Albertson [ 07/May/21 ]

Great, I am glad to hear bertrandop@free.fr!

Comment by Olivier Bertrand [ 07/May/21 ]

Yes, I did it and there is no more delay, thanks.

Comment by Kevin Albertson [ 07/May/21 ]

Hi bertrandop@free.fr,

Thank you for the additional information.

BTW, adding the option "tlsDisableOCSPEndpointCheck=true" to the URI make all the queries fail:

My apologies, I intended to ask for "tlsDisableCertificateRevocationCheck=true". If it is easy to check, trying with this option will help rule out whether OCSP is involved in the additional delay. Though given that you are observing a .5 second delay now, I suspect OCSP is unrelated. For reference, the URI options related to TLS and OCSP are documented in Configuring TLS.

Starting the server by "mongod --ipv6" does not remove the delay but reduces it from 4.50 sec to 0.50 sec. So it seems that there are 2 delays, a 4 sec one for ipv6 and a 0.50 sec one for another cause.

That is good to know. I agree the 4 second delay seems caused by CDRIVER-3639.

I have another hypothesis for the observed .5 second delay. Is your application creating and destroying a client pool (mongoc_client_pool_t) on each request that is handling a translation from SQL? Based on the snippet that you provided, that may be the case.

If so, this may be related to changes in multi-threaded monitoring (CDRIVER-3535). A call to mongoc_client_pool_destroy may experience a delay of 500ms when connected to one or more 4.4+ MongoDB servers during the shutdown process of long polling.

The Connection Pooling has additional details describing the behavior differences of pooled and single-threaded modes. But typical use of a mongoc_client_pool_t is to keep the pool alive for the duration of a long lived process. The discovered state of MongoDB servers can be reused on each pooled client and monitoring can continue in background threads without blocking operations.

However, if your application is handling short lived requests and creating and destroying a mongoc_client_pool_t on each request, it may be preferable to use a single-threaded mongoc_client_t. That would result in less resources used (no background monitoring threads) and may avoid the .5 second delay on shutdown.

The .5 second delay in pooled monitoring has been an issue for drivers wrapping the C driver testing: CDRIVER-3801, but if this is the cause, and your application requires creating and destroying a client pool on each request, we can investigate improving this behavior.

As a workaround, would it be possible to try replacing the mongoc_client_pool_t with a single-threaded mongoc_client_t?

Comment by Olivier Bertrand [ 29/Apr/21 ]

Starting the server by "mongod --ipv6" does not remove the delay but reduces it from 4.50 sec to 0.50 sec. So it seems that there are 2 delays, a 4 sec one for ipv6 and a 0.50 sec one for another cause. Of course, none of them for older C drivers (last used 1.9.2)

BTW, adding the option "tlsDisableOCSPEndpointCheck=true" to the URI make all the queries fail:

 2021/04/29 12:49:59.0939: [16032]: ERROR: stream-tls-secure-channel: Setting tlsDisableOCSPEndpointCheck has no effect when built against Secure Channel
 2021/04/29 12:49:59.0965: [16032]: ERROR: stream-secure-channel: failed to receive handshake, SSL/TLS connection failed
 2021/04/29 12:49:59.0965: [16032]: DEBUG: monitor: [localhost:27017] command or network error occurred: TLS handshake failed

 

Comment by Olivier Bertrand [ 29/Apr/21 ]

I am doing my testing using a local MongoDB server started by executing "mongod" with no parameters. The URI used by my program is simply "mongodb://localhost:27017".

It was Ok with older releases of the C driver. Could you elaborate and show what I should try?

Comment by Olivier Bertrand [ 28/Apr/21 ]

Indeed, only the first one has a delay. Subsequent operations have no delay.

Comment by Kevin Albertson [ 28/Apr/21 ]

Thank you for the additional information bertrandop@free.fr. I have two theories:

1. This could be related to failed connection attempts on hostnames resolving to multiple IPs. CDRIVER-3639 describes a 1-2 second delay observed on Windows when the hostname resolved to both an IPv6 and IPv4 address, but the server was only listening on IPv4. To determine if that is the cause, can you try specifying an IP literal in the URI to connect to and see if that resolves the delay?

2. This could be related to failed OCSP validation of server certificates if the client is taking a long time to reach an OCSP server. CDRIVER-3788 describes timeouts causing delays on each first connection. Can you try including tlsDisableOCSPEndpointCheck=true to bypass OCSP checks to see if that resolves the delay?

and the delay occurs on the first execution of mongoc_cursor_next.

To check, do operations after the first one still have a delay? I would imagine if CDRIVER-3639 or CDRIVER-3788 is the cause, subsequent operations after a connection is established would no longer have a delay. In practice, this would be the first operation on a server per mongoc_client_t.

Comment by Olivier Bertrand [ 26/Apr/21 ]

On another query where I do not execute the mongo collection count function, I continue by:

Cursor = mongoc_collection_find_with_opts(Collection, Query, NULL, NULL);
if (mongoc_cursor_next(Cursor, &Document))
...... 

and the delay occurs on the first execution of  mongoc_cursor_next.

Comment by Olivier Bertrand [ 26/Apr/21 ]

By the way, I had in the past installed versions 1.6.2, 1.6.3, 1.7.0 and 1.9.2 from tarball with no problem.

But when downloading version 1.17.5, after extracting and trying to follow what it described in the site explaining how to install, all try to generate the visual studio with cmake failed. Unfortunately I didn't keep a log of all these.

Finally, I cloned it from the git site and then the cmake process and compilation by Visual Studio 2019 succeeded. The 4 second delay I am describing occurs as well with the Debug, Release or Relwithdebinfo versions.

Comment by Olivier Bertrand [ 26/Apr/21 ]

I'll try to do that but in fact I'm using this driver in a rather big application, the MariaDB Connect storage engine, in which I translate SQL queries in call to MongoDB through the C driver.
When I say "all use of the driver" his means whether they are Select or CURL queries.

For instance, reading the inventory collection, I execute in that order:

  mongo_init(true);
 
  Uri = mongoc_uri_new_with_error(Pcg->Uristr, &Error);
 
  // Create a new client pool instance
  Pool = mongoc_client_pool_new(Uri);
  mongoc_client_pool_set_error_api(Pool, 2);
 
  // Register the application name so we can track it in the profile logs
  // on the server. This can also be done from the URI.
  mongoc_client_pool_set_appname(Pool, "Connect");
 
  // Create a new client instance
  Client = mongoc_client_pool_pop(Pool);
 
  Collection = mongoc_client_get_collection(Client, Pcg->Db_name, Pcg->Coll_name);
 
  query = bson_new();
 
#if defined(NDEBUG)
  cnt = (int)mongoc_collection_count(Collection,
    MONGOC_QUERY_NONE, query, 0, 0, NULL, &Error);
#else
  if (jf)
    cnt = (int)mongoc_collection_count_documents(Collection,
      query, NULL, NULL, NULL, &Error);
  else
    cnt = (int)mongoc_collection_estimated_document_count(
           Collection, NULL, NULL, NULL, &Error);
#endif
 
.....

(The #if, #else, #endif are whether I use and old MongoDB version or the latest one)

All is normal until I execute one of mongoc_collection_count function that takes more to 4/5 seconds to execute with the latest C Driver version when it takes almost no time with the old version.

If I don't execute it, this 4 second delay occurs later but does occur anyway.

Comment by Jeremy Mikola [ 26/Apr/21 ]

Can you clarify exactly what you mean by "all use of this driver"?

  • The first call to mongoc_init
  • Creating a client or client pool from a URI
  • The first operation that interacts with the database (e.g. executing a query), which would be the point when server discovery occurs
  • Each and every operation that interacts with the database

Some additional questions:

  • Can you share your connection string (redacting sensitive info) and URI options?
  • What server version and topology (e.g. sharded cluster, replica set) are you connecting to?
  • Can you share a minimal application that reproduces the issue?
  • Can you share trace logs (see below) covering the delay?

Capturing trace logs for a small application that reproduces the delay would also be helpful, as that will provide timestamps for various internal functions and should narrow down where this 4-second delay occurs. Trace logs may include sensitive information (e.g. byte dumps of socket IO), so you should be sure to redact anything sensitive within before sharing.

Generated at Wed Feb 07 21:19:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.