[CXX-1209] Client from pool throws timeout exception Created: 29/Jan/17  Updated: 27/Oct/23  Resolved: 31/Jan/17

Status: Closed
Project: C++ Driver
Component/s: None
Affects Version/s: 3.1.1
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Denis Bip Assignee: Unassigned
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux, Debian 8, C Driver 1.5.3


Issue Links:
Related
Case:

 Description   

Hi again! A client from mongocxx::pool sometimes throws "Failed to send "insert" command with database "...": socket error or timeout: generic server error". The execution time of this whole code block is 0 ms, i.e. the exception is thrown immediately... It seems that mongocxx::pool returns a timed-out connection?



 Comments   
Comment by David Golden [ 31/Jan/17 ]

TCP keepalive is below the application layer in the network stack, so the problem could be in a router/firewall/load-balancer. You'll need to talk to your system/network admins to diagnose further.

I'm going to close this, as I think you've identified the problem.

Comment by Denis Bip [ 30/Jan/17 ]

So, yes it is. The problem was in the keepAliveTime TCP socket settings. Does this mean the client/driver side does not respond to ACK packets sent by the server? Or might the server not send an ACK packet at all and simply close the idle connection?

Comment by Denis Bip [ 30/Jan/17 ]

As an idea, I think the problem may be the short keepAlive time tuned on the server machine. It is set to 30 seconds of idle; after that, a control packet is sent to the client every 10 seconds, 5 times.

Comment by Denis Bip [ 30/Jan/17 ]

I don't think it's a network problem, because all other databases and services work very well (500-2000 requests per second processed). I logged successful and failed inserts (bulk_write with 4-10 rows each); failures begin after 5 minutes of running:
http://5.101.118.226:8884/test.html

Comment by David Golden [ 30/Jan/17 ]

How often is "sometimes"? There is an inherent race condition: just because the last operation with a client succeeded doesn't mean that the next one will, so failures of this sort are always possible – but without failovers or a flaky network, you shouldn't be seeing them regularly.

Comment by Denis Bip [ 29/Jan/17 ]

Server versions are: primary – 3.2.9; secondaries (3 servers) and arbiters (3 servers) – 3.4.1. I'm updating the servers to version 3.4.1 but haven't finished yet. The primary server's log contains only connection/disconnection and slow-query information, no errors.
The code looks like this.
Initialization in the main thread (these objects live for the program's entire run time):

    mongocxx::instance inst{};
    mongocxx::pool pool(mongocxx::uri { CONNECTION_STRING } );

multithread access (~30 threads):

    timer tm_; // execution timer
    mongocxx::stdx::optional<mongocxx::pool::entry> entry_opt_ = pool.try_acquire(); // try to get a connection from the pool
    if (entry_opt_ != mongocxx::stdx::nullopt)
    {
        mongocxx::client* client = entry_opt_->get();
        try
        {
            // do work with the client, an insert for example
        }
        catch (std::exception& ex)
        {
            // here I catch the timeout error; tm_ shows 0 ms, i.e. the current
            // connection did not wait out any timeout but had already timed out (I think)
        }
    }
    // here the entry is destroyed and the connection is released back to the pool

Replicaset settings:

     "settings" : {
                "chainingAllowed" : true,
                "heartbeatIntervalMillis" : 2000,
                "heartbeatTimeoutSecs" : 10,
                "electionTimeoutMillis" : 10000,
                "getLastErrorModes" : {
 
                },
                "getLastErrorDefaults" : {
                        "w" : 1,
                        "wtimeout" : 0
                },
                "replicaSetId" : ObjectId("58185ff0e475ce694998d49c")
        }

Comment by David Golden [ 29/Jan/17 ]

Hello! For this, we really do need an SSCCE that lets us reproduce it. Please also note what server versions you have and give us a sanitized version of your replica set configuration.

Generated at Wed Feb 07 22:01:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.