[CDRIVER-3808] SRV polling thread uses 100% CPU when connected to a replica set Created: 23/Oct/20  Updated: 28/Oct/23  Resolved: 02/Nov/20

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: 1.17.0
Fix Version/s: 1.17.2

Type: Bug Priority: Critical - P2
Reporter: Ryan Landvater Assignee: Kevin Albertson
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 18 / 20 and macOS


Attachments: PNG File image-2020-10-22-23-43-06-010.png    

 Description   

Using the client pool results in a runaway thread that consumes 100% of CPU (from tracing, it appears to be srv_polling_run within the libmongoc-1.0.0 dynamic library). It is reproducible on my macOS development machine and in my Ubuntu 18 and 20 Docker containers.

My use of the mongocxx::pool class follows the documentation.



 Comments   
Comment by Githook User [ 02/Nov/20 ]

Author:

{'name': 'Kevin Albertson', 'email': 'kevin.albertson@mongodb.com', 'username': 'kevinAlbs'}

Message: CDRIVER-3808 fix srv polling thread from spinning (#692)

If an SRV URI is used to connect to a deployment other than a sharded cluster
the SRV polling thread spins since it bypasses the poll due to the topology
type being ineligible. This change terminates the thread when the topology
type is discovered to be ineligible.
Branch: r1.17
https://github.com/mongodb/mongo-c-driver/commit/2946e344e8c67c31b2b632259198eaa61941f844

Comment by Githook User [ 02/Nov/20 ]

Author:

{'name': 'Kevin Albertson', 'email': 'kevin.albertson@mongodb.com', 'username': 'kevinAlbs'}

Message: CDRIVER-3808 fix srv polling thread from spinning (#692)

If an SRV URI is used to connect to a deployment other than a sharded cluster
the SRV polling thread spins since it bypasses the poll due to the topology
type being ineligible. This change terminates the thread when the topology
type is discovered to be ineligible.
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/ae50db1422567058902d18ef3e12af1b6bf89d67

Comment by Kevin Albertson [ 23/Oct/20 ]

PR: https://github.com/mongodb/mongo-c-driver/pull/692

Comment by Kevin Albertson [ 23/Oct/20 ]

Hi rylandva@med.umich.edu, thank you for the report!

SRV polling is handled by libmongoc. I was able to reproduce this with a small application in the C driver.

I did not observe this when connected to a sharded cluster, but I did observe high CPU usage in the SRV polling thread when connected to a replica set. Periodic SRV polling is skipped for replica sets, but the check that bypasses the SRV poll returns early to the SRV polling thread, which then repeatedly attempts to rescan.
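The spin described above, and the kind of fix described in the commit message (terminating the thread once the topology type is found to be ineligible), can be illustrated with a minimal sketch. This is not the libmongoc implementation: `TopologyType`, `PollStats`, `try_rescan`, `run_spinning`, and `run_terminating` are hypothetical names invented for illustration, and the interval wait is modeled as a counter rather than a real sleep.

```cpp
// Hypothetical stand-ins; none of these names come from libmongoc.
enum class TopologyType { Sharded, ReplicaSet };

struct PollStats {
    int scan_attempts = 0;  // how many times the loop tried to rescan
    int waits = 0;          // how many times the loop waited for the interval
};

// Attempts one rescan. Returns false when the topology is ineligible, in
// which case the function returns before reaching the interval wait.
bool try_rescan(TopologyType type, PollStats& stats) {
    ++stats.scan_attempts;
    if (type != TopologyType::Sharded) {
        return false;  // early return: the wait below is bypassed
    }
    ++stats.waits;  // stands in for sleeping until the next rescan is due
    return true;
}

// Spinning shape: the result is ignored, so an ineligible topology makes
// the loop call try_rescan back to back with no wait in between.
PollStats run_spinning(TopologyType type, int max_iterations) {
    PollStats stats;
    for (int i = 0; i < max_iterations; ++i) {
        try_rescan(type, stats);
    }
    return stats;
}

// Terminating shape: the loop exits as soon as the topology is ineligible.
PollStats run_terminating(TopologyType type, int max_iterations) {
    PollStats stats;
    for (int i = 0; i < max_iterations; ++i) {
        if (!try_rescan(type, stats)) {
            break;
        }
    }
    return stats;
}
```

Under these assumptions, calling `run_spinning(TopologyType::ReplicaSet, 1000)` makes 1000 rescan attempts and never waits, while `run_terminating` with the same arguments makes a single attempt and stops. For a sharded topology both loops wait once per iteration.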

I am moving this to the CDRIVER project. We will fix this very soon.

Comment by Ryan Landvater [ 23/Oct/20 ]

I have upgraded to 3.6.0 and the problem still persists. For reference:

My base class that wraps the driver is QBMongoDriver. It uses a "run" function to iterate through a list of transactions (mostly JSON objects) that have been deposited by threads controlled by boost::asio. When a transaction is added, the condition variable is signaled for one thread (notify_one). Within QBMongoDriver::transaction, a client is acquired from the pool.

class QBMongoDriver : public QBObject
{
    friend class QBRootServer;
    friend class QBClientSession;

    // Wrapper around mongo-cxx basic client objects
    mongocxx::uri       URI;
    mongocxx::instance  inst;
    mongocxx::pool      pool;

    // Queue and mutex for mongo_transactions
    boost::mutex                    _queueMTX;
    boost::condition_variable       _queueCV;
    std::vector<boost::shared_ptr<QBMongoTransaction>> _queue;
    std::vector<boost::thread>      _queueThreads;
    std::default_random_engine      _generator;
    uint16_t                        _threadCount;

    bool _accepting = true;

public:
    explicit QBMongoDriver(std::string& URL,
                           uint16_t thread_count = 2,
                           QBObject* parent = nullptr);

private:
    void run();
};

void QBMongoDriver::run() {
    while (_accepting) {
        boost::shared_ptr<QBMongoTransaction> transaction_;
        boost::unique_lock<boost::mutex> lock (_queueMTX);

        // Iterate through items in the queue until it is empty
        while (!_queue.empty()) {
            // Pull the first transaction from the queue
            transaction_ = _queue.front();
            _queue.erase(_queue.begin());
            lock.unlock();

            // PROCESS THE TRANSACTION
            transaction(transaction_);

            // Reinstate the lock
            lock.lock();
        }

        // if there are no outstanding requests, unlock and wait...
        _queueCV.wait_for(lock, boost::chrono::seconds(1));
    }
}

 
