[SERVER-31440] Connpool HostTimeout races with callback lock acquire/release Created: 06/Oct/17  Updated: 30/Oct/23  Resolved: 07/Nov/17

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: None
Fix Version/s: 3.2.18, 3.4.11, 3.6.0-rc4

Type: Bug Priority: Major - P3
Reporter: Mira Carey Assignee: Mira Carey
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v3.4, v3.2
Steps To Reproduce:

Add a small delay (for example, a short sleep) before any call to lock() in connection_pool.cpp

Sprint: Platforms 2017-10-23, Platforms 2017-11-13
Participants:
Linked BF Score: 0

 Description   

The executor connection pool's host timeout races with other code that has unlocked the parent mutex to allow callbacks to execute.

Effort was spent to protect against background threads with active requests and those participating in refresh. Once those tasks have executed, however, the host timeout races with the callbacks over how quickly they can return. When a callback does not return before the host timeout fires, the specific pool is destroyed out from under it.
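The interleaving described above can be sketched in a few lines. This is a minimal, single-threaded model rather than the actual code in connection_pool.cpp: `SpecificPool`, `ParentPool`, `onHostTimeout`, and `poolDestroyedDuringCallback` are hypothetical stand-ins, and the timeout is fired by hand inside the unlocked window to make the race deterministic. A `std::weak_ptr` is used only to observe the destruction safely; in code holding a raw pointer or reference, the same interleaving would be a use-after-free.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a per-host pool (not MongoDB's real class).
struct SpecificPool {
    int inFlightRequests = 0;
    bool refreshing = false;
};

// Hypothetical stand-in for the parent pool that owns the per-host pools.
struct ParentPool {
    std::mutex mutex;
    std::map<std::string, std::shared_ptr<SpecificPool>> pools;

    // Host timeout: drops the per-host pool when it looks idle. It checks
    // active requests and refresh, but knows nothing about callbacks that
    // are still running with the parent mutex released.
    void onHostTimeout(const std::string& host) {
        std::lock_guard<std::mutex> lk(mutex);
        auto it = pools.find(host);
        if (it != pools.end() && it->second->inFlightRequests == 0 &&
            !it->second->refreshing) {
            pools.erase(it);
        }
    }
};

// Returns true if the pool was destroyed while the "callback" was running.
bool poolDestroyedDuringCallback() {
    ParentPool parent;
    parent.pools["host:27017"] = std::make_shared<SpecificPool>();
    std::weak_ptr<SpecificPool> observer = parent.pools["host:27017"];

    std::unique_lock<std::mutex> lk(parent.mutex);
    // The request has completed, so the pool now looks idle.
    lk.unlock();  // mutex released so the user callback can run

    // Race window: the host timeout fires before the callback returns.
    parent.onHostTimeout("host:27017");

    lk.lock();  // callback returns and the caller re-acquires the mutex
    return observer.expired();  // the pool is already gone at this point
}
```

Adding a delay before the re-lock, as suggested in the reproduction steps, widens this window so that the timeout is more likely to land inside it.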



 Comments   
Comment by Githook User [ 16/Nov/17 ]

Author: Jason Carey (hanumantmk) <jcarey@argv.me>

Message: SERVER-31440 Fix Connpool HostTimeout race

The executor connection pool host timeout is racy with respect to other
code that's unlocked the parent mutex to allow for callback execution.

While effort was spent to protect against background threads with active
requests and those participating in refresh, after those tasks have been
executed we race with callbacks in how quickly they can return. When we
lose that race, we destroy the specific pool out from under those
callbacks.

Fix that by adding an ActiveClient wrapper that ensures a refcount on
the specific pool is increased for the lifetime of those calls.

(cherry picked from commit c3e174cab7b8e4a19772746942c7e68daa53bc5e)
Branch: v3.2
https://github.com/mongodb/mongo/commit/e8438a65d647004e94021e9b25087c7f1aac59f8

Comment by Githook User [ 16/Nov/17 ]

Author: Jason Carey (hanumantmk) <jcarey@argv.me>

Message: SERVER-31440 Fix Connpool HostTimeout race

The executor connection pool host timeout is racy with respect to other
code that's unlocked the parent mutex to allow for callback execution.

While effort was spent to protect against background threads with active
requests and those participating in refresh, after those tasks have been
executed we race with callbacks in how quickly they can return. When we
lose that race, we destroy the specific pool out from under those
callbacks.

Fix that by adding an ActiveClient wrapper that ensures a refcount on
the specific pool is increased for the lifetime of those calls.

(cherry picked from commit c3e174cab7b8e4a19772746942c7e68daa53bc5e)
Branch: v3.4
https://github.com/mongodb/mongo/commit/e1d6d870c3b2436d4ef4e661e1bbf5500fa17ce8

Comment by Githook User [ 07/Nov/17 ]

Author: Jason Carey (hanumantmk) <jcarey@argv.me>

Message: SERVER-31440 Fix Connpool HostTimeout race

The executor connection pool host timeout is racy with respect to other
code that's unlocked the parent mutex to allow for callback execution.

While effort was spent to protect against background threads with active
requests and those participating in refresh, after those tasks have been
executed we race with callbacks in how quickly they can return. When we
lose that race, we destroy the specific pool out from under those
callbacks.

Fix that by adding an ActiveClient wrapper that ensures a refcount on
the specific pool is increased for the lifetime of those calls.
Branch: master
https://github.com/mongodb/mongo/commit/c3e174cab7b8e4a19772746942c7e68daa53bc5e
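
The approach named in the commit message, a wrapper that holds a refcount on the specific pool for the lifetime of a call, can be sketched as an RAII guard. This is an illustration under assumptions, not the code from the commit: only the name `ActiveClient` comes from the message, while `SpecificPool`, `ParentPool`, `activeClients`, `onHostTimeout`, and `poolSurvivesCallback` are hypothetical, and the timeout is again fired by hand inside the unlocked window.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical per-host pool with a count of callers currently using it.
struct SpecificPool {
    int activeClients = 0;  // guarded by the parent mutex
};

// Hypothetical parent pool that owns the per-host pools.
struct ParentPool {
    std::mutex mutex;
    std::map<std::string, std::unique_ptr<SpecificPool>> pools;

    // Host timeout: removes the pool only if no client is active.
    // Returns true if the pool was removed.
    bool onHostTimeout(const std::string& host) {
        std::lock_guard<std::mutex> lk(mutex);
        auto it = pools.find(host);
        if (it == pools.end() || it->second->activeClients > 0) {
            return false;
        }
        pools.erase(it);
        return true;
    }
};

// RAII guard: bumps the refcount on construction and drops it on
// destruction. Both must happen with the parent mutex held.
class ActiveClient {
public:
    explicit ActiveClient(SpecificPool& pool) : _pool(pool) { ++_pool.activeClients; }
    ~ActiveClient() { --_pool.activeClients; }
    ActiveClient(const ActiveClient&) = delete;
    ActiveClient& operator=(const ActiveClient&) = delete;

private:
    SpecificPool& _pool;
};

// Returns true if the pool survives a timeout that fires mid-callback
// and is still reclaimed by a later timeout once the caller is done.
bool poolSurvivesCallback() {
    ParentPool parent;
    parent.pools["host:27017"] = std::unique_ptr<SpecificPool>(new SpecificPool());

    bool removedDuringCallback = false;
    {
        std::unique_lock<std::mutex> lk(parent.mutex);
        ActiveClient guard(*parent.pools["host:27017"]);
        lk.unlock();  // mutex released so the user callback can run

        // The timeout fires mid-callback but sees activeClients > 0.
        removedDuringCallback = parent.onHostTimeout("host:27017");

        lk.lock();  // callback returned; re-acquire before leaving scope
    }  // guard is destroyed first (mutex still held), then lk unlocks

    bool removedAfterwards = parent.onHostTimeout("host:27017");
    return !removedDuringCallback && removedAfterwards;
}
```

Declaring the guard after the lock means it is destroyed before the lock is released, so the refcount is only ever modified under the parent mutex.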
