[CDRIVER-3808] SRV polling thread uses 100% CPU when connected to a replica set Created: 23/Oct/20 Updated: 28/Oct/23 Resolved: 02/Nov/20 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | None |
| Affects Version/s: | 1.17.0 |
| Fix Version/s: | 1.17.2 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Ryan Landvater | Assignee: | Kevin Albertson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Ubuntu 18 / 20 and macOS |
| Attachments: |
|
| Description |
|
Use of the client pool results in a runaway thread consuming 100% of a CPU core (from what I can gather by tracing, the thread is srv_polling_run within the libmongoc-1.0.0 dynamic library). It is reproducible on my macOS development machine and in my Docker containers running Ubuntu 18 and 20. My use of the mongocxx::pool class follows the documentation. |
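For reference, a minimal sketch of that usage, assuming a hypothetical SRV connection string (mongodb+srv://example.mongodb.net/) and the standard pool pattern from the mongocxx documentation; the point at which libmongoc starts its SRV polling thread in pooled mode is an assumption here. With an affected build, simply keeping the pool alive while connected to a replica set is enough to see the polling thread spin:

#include <chrono>
#include <thread>

#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/pool.hpp>
#include <mongocxx/uri.hpp>

int main() {
    mongocxx::instance inst{};  // one instance per process
    mongocxx::pool pool{mongocxx::uri{"mongodb+srv://example.mongodb.net/"}};

    // Acquiring a client from the pool is assumed to start libmongoc's
    // background monitoring (including the SRV polling thread).
    auto client = pool.acquire();

    // The process is otherwise idle here, yet one thread (srv_polling_run)
    // consumes 100% of a CPU core on an affected build.
    std::this_thread::sleep_for(std::chrono::minutes(5));
}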
| Comments |
| Comment by Githook User [ 02/Nov/20 ] |
|
Author: Kevin Albertson <kevin.albertson@mongodb.com> (kevinAlbs)
Message: If an SRV URI is used to connect to a deployment other than a sharded cluster |
| Comment by Kevin Albertson [ 23/Oct/20 ] |
|
Hi rylandva@med.umich.edu, thank you for the report! SRV polling is handled by libmongoc. I was able to reproduce this with a small application using the C driver. I did not observe the problem when connected to a sharded cluster, but I did observe high CPU usage in the SRV polling thread when connected to a replica set. Periodic SRV polling is skipped for replica sets, but the check that bypasses the SRV poll returns control to the SRV polling thread early, so the thread repeatedly attempts to rescan without waiting. I am moving this to the CDRIVER project. We will fix this very soon. |
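To illustrate the failure mode, here is a self-contained sketch of the loop shape described above; it is not the libmongoc source, and topology_is_sharded, the 60-second interval, and the function names are stand-ins. When the deployment is a replica set, the sharded-cluster check fails and the buggy loop restarts immediately without waiting, which is a busy-spin:

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> shutting_down{false};
const bool topology_is_sharded = false;  // stand-in: replica-set case

// Buggy shape: the "skip SRV polling" path returns to the top of the loop
// without waiting, so the thread spins at 100% CPU.
void srv_polling_run_buggy() {
    while (!shutting_down) {
        if (!topology_is_sharded) {
            continue;  // BUG: no wait before re-checking
        }
        std::this_thread::sleep_for(std::chrono::seconds(60));
        // ...rescan SRV records (sharded clusters only)...
    }
}

// One possible corrected shape: the skip path still waits before re-checking,
// so the thread stays idle between iterations.
void srv_polling_run_fixed() {
    while (!shutting_down) {
        std::this_thread::sleep_for(std::chrono::seconds(60));
        if (!topology_is_sharded) {
            continue;  // skip the rescan, but only after waiting
        }
        // ...rescan SRV records...
    }
}

int main() {
    std::thread poller(srv_polling_run_buggy);
    std::this_thread::sleep_for(std::chrono::seconds(5));  // watch CPU usage here
    shutting_down = true;
    poller.join();
}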
| Comment by Ryan Landvater [ 23/Oct/20 ] |
|
I have upgraded to 3.6.0 and the problem still persists. For reference, my base class that wraps the driver is QBMongoDriver. It uses a "run" function to iterate through a list of transactions (mostly JSON objects) that have been deposited by threads controlled by boost::asio. When a transaction is added, the condition variable is released (for one thread; notify_one). Within QBMongoDriver::transaction, a client is acquired from the pool.

class QBMongoDriver : public QBObject {
    friend class QBRootServer;
    friend class QBClientSession;

    // Wrapper around mongo-cxx basic client objects
    mongocxx::uri URI;
    mongocxx::instance inst;
    mongocxx::pool pool;

    // Queue and mutex for mongo_transactions
    boost::mutex _queueMTX;
    boost::condition_variable _queueCV;
    std::vector<boost::shared_ptr<QBMongoTransaction>> _queue;
    std::vector<boost::thread> _queueThreads;
    std::default_random_engine _generator;
    uint16_t _threadCount;
    bool _accepting = true;

public:
    explicit QBMongoDriver(std::string& URL, uint16_t thread_count = 2, QBObject* parent = nullptr);

private:
    void run();
};

void QBMongoDriver::run() {
    while (_accepting) {
        boost::shared_ptr<QBMongoTransaction> transaction_;
        boost::unique_lock<boost::mutex> lock(_queueMTX);

        // Iterate through items in the queue until it is empty
        while (!_queue.empty()) {
            // Pull the first transaction from the queue
            transaction_ = _queue.front();
            _queue.erase(_queue.begin());
            lock.unlock();

            // Process the transaction (acquires a client from the pool)
            transaction(transaction_);

            // Reinstate the lock
            lock.lock();
        }

        // If there are no outstanding requests, release the lock and wait...
        _queueCV.wait_for(lock, boost::chrono::seconds(1));
    }
}
|