Core Server / SERVER-56217

PoolForHost::_maxInUse is init to <int>::max and cannot be changed, and when connections are maxed DBConnectionPool::get() always throws

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical - P2
    • Fix Version/s: 4.0.25
    • Affects Version/s: 4.0.24
    • Component/s: None
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Sharding 2021-05-03

      PoolForHost::_maxInUse is initialized to std::numeric_limits<int>::max() and cannot be changed, so once the connection count reaches the DBConnectionPool::_maxInUse limit, DBConnectionPool::get() always throws immediately.

      To understand this bug, first note that there are two separate fields: PoolForHost::_maxInUse and DBConnectionPool::_maxInUse. They are not synchronized in any way.

      DBConnectionPool::_maxInUse can be set with the public method setMaxInUse(). PoolForHost::_maxInUse is initialized in the constructor to std::numeric_limits<int>::max(), and no code path ever changes it; it remains INT_MAX for every host for the lifetime of the pool.

      PoolForHost::_maxInUse is used in only one place, waitForFreeConnection(). That method is invoked from DBConnectionPool::Detail::get() when the DBConnectionPool::_maxInUse limit is exhausted.
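      For orientation, a minimal sketch of the two unrelated limits (illustrative declarations only, not the actual headers):

          #include <limits>

          // Simplified sketch: two independent limits that are never synchronized.
          class DBConnectionPool {
          public:
              void setMaxInUse(int n) { _maxInUse = n; }  // public setter; this limit can actually change
          private:
              int _maxInUse;  // checked by get() before handing out a connection
          };

          class PoolForHost {
          public:
              PoolForHost() : _maxInUse(std::numeric_limits<int>::max()) {}  // never changed afterwards
          private:
              int _maxInUse;  // read only by waitForFreeConnection()
          };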

      However, this code fragment in waitForFreeConnection() always throws:

          auto condition = [&] { return (numInUse() < _maxInUse || _inShutdown.load()); };
      
          if (timeout > 0) {
              stdx::chrono::seconds timeoutSeconds{timeout};
      
              // If we timed out waiting without getting a new connection, throw.
              uassert(ErrorCodes::ExceededTimeLimit,
                      str::stream() << "too many connections to " << _hostName << ":" << timeout,
                      !_cv.wait_for(lk, timeoutSeconds, condition));
      

      The condition "numInUse() < _maxInUse" is always true because _maxInUse is INT_MAX, so _cv.wait_for() never blocks: the predicate is already satisfied when the wait is entered, and wait_for() returns true immediately. The uassert then always throws, because it asserts the negation "!_cv.wait_for(...)", which is always false.
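      For illustration, a standalone sketch of the same pattern using plain std:: primitives (rather than the MongoDB stdx/Latch wrappers): a wait_for() whose predicate is already true returns true without blocking, so asserting its negation always fails.

          #include <chrono>
          #include <condition_variable>
          #include <iostream>
          #include <mutex>

          int main() {
              std::mutex m;
              std::condition_variable cv;
              std::unique_lock<std::mutex> lk(m);

              // Stands in for "numInUse() < _maxInUse": always true when _maxInUse is INT_MAX.
              auto condition = [] { return true; };

              // wait_for() checks the predicate first; it is already satisfied, so this
              // returns true immediately and never blocks for the 5 seconds.
              bool freed = cv.wait_for(lk, std::chrono::seconds(5), condition);

              std::cout << std::boolalpha
                        << "wait_for returned: " << freed << '\n'          // true
                        << "value the uassert checks: " << !freed << '\n'; // false, so uassert throws
              return 0;
          }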

      As a result of this bug, once the connection count reaches the limit, DBConnectionPool::get() always throws immediately. The code is expected to block for the specified timeout; throwing without blocking can create very tight retry loops somewhere up the stack if the caller retries whatever it needs the connection for.

      The proposed fix actually contains 4 bugfixes in one change, CR pending; illustrative sketches for items 2 through 4 follow the list:

      1. PoolForHost::_maxInUse is removed: DBConnectionPool has a similar field, and the PoolForHost one was never set properly

      2. PoolKey::timeout is changed from double to int64. Using a double as part of a container key is dangerous: lookups may become non-deterministic because of floating-point precision errors (see the example after this list)

      3. In PoolForHost::waitForFreeConnection(), "!_cv.wait_for(...)" is changed to "_cv.wait_for(...)". Inverting the wait return value in the assertion is a bug: on timeout wait_for() returns false, and that is the value that should be passed to the assertion without inversion, so that it throws only on timeout

      4. DBConnectionPool::decrementEgress() must notify waiters. This is SERVER-56147: the condition variable is never notified when a connection is released through this method.
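      A simplified sketch of the corrected fragment in waitForFreeConnection() (illustrative, not the actual patch; maxConnectionsInUse is a stand-in for the single limit kept on DBConnectionPool once PoolForHost::_maxInUse is removed):

          auto condition = [&] { return (numInUse() < maxConnectionsInUse || _inShutdown.load()); };

          if (timeout > 0) {
              stdx::chrono::seconds timeoutSeconds{timeout};

              // Fix 3: assert the wait result directly. wait_for() returns false only
              // when the timeout expires with the predicate still false, which is
              // exactly the case where we want to throw.
              uassert(ErrorCodes::ExceededTimeLimit,
                      str::stream() << "too many connections to " << _hostName << ":" << timeout,
                      _cv.wait_for(lk, timeoutSeconds, condition));
          } else {
              _cv.wait(lk, condition);
          }

          // Fix 4 (SERVER-56147): every path that releases a connection, including
          // DBConnectionPool::decrementEgress(), must also notify _cv so a blocked
          // waiter can re-check the predicate.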
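      And a small self-contained example of the double-key hazard behind item 2 (hypothetical code, not from the server source): two ways of arriving at the "same" timeout produce bitwise-different doubles, so a map lookup silently misses.

          #include <iostream>
          #include <map>
          #include <string>

          int main() {
              std::map<double, std::string> poolsByTimeout;

              double storedKey = 0.1 + 0.2;   // 0.30000000000000004..., entry stored under this key
              poolsByTimeout[storedKey] = "pool for host X";

              double lookupKey = 0.3;         // bitwise different from storedKey
              std::cout << std::boolalpha
                        << (poolsByTimeout.find(lookupKey) != poolsByTimeout.end()) << '\n';  // false
              return 0;
          }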

            Assignee: Andrew Shuvalov (Inactive), andrew.shuvalov@mongodb.com
            Reporter: Andrew Shuvalov (Inactive), andrew.shuvalov@mongodb.com
            Votes: 0
            Watchers: 10
