Loading...

Type: Bug
Resolution: Unresolved
Priority: Minor - P4
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Server Programmability
Operating System:
ALL
Steps To Reproduce:
Hide

Run a mongo instance (single, replica, or sharded – does not seem to matter) on RHEL 8.

Configure a blockTimeMS failpoint to block for X milliseconds.

Let start be the current time from a monotonic clock.

Run a command.

Let end be the current time from a monotonic clock.

Check that end - start is greater than or equal to the requested block time.

Repeat for several iterations until the above check fails.
Show
Run a mongo instance (single, replica, or sharded – does not seem to matter) on RHEL 8. Configure a blockTimeMS failpoint to block for X milliseconds. Let start be the current time from a monotonic clock. Run a command. Let end be the current time from a monotonic clock. Check that end - start is greater than or equal to the requested block time. Repeat for several iterations until the above check fails.
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The blockTimeMS failpoint does not seem to reliably block for the full duration. I suspect this is due to the fact that blockTimeMS is implemented using a non-monotonic clock.

Take the following C++ driver example:

static constexpr auto block_time = std::chrono::milliseconds(50);

db.run_command(make_document(
    kvp("configureFailPoint", "failCommand"),
    kvp("mode", make_document(kvp("times", 1))),
    kvp("data",
        make_document(
            kvp("failCommands", bsoncxx::builder::basic::make_array("ping")),
            kvp("blockTimeMS", std::chrono::milliseconds(block_time).count()),
            kvp("blockConnection", true)))));

auto const start = std::chrono::steady_clock::now();

db.run_command(make_document(kvp("ping", 1)));

auto const end = std::chrono::steady_clock::now();

auto const elapsed = end - start;

REQUIRE(elapsed >= block_time); // Fails!

On RHEL 8, the elapsed duration is consistently just shy (<1ms) of the block time.

Using the full example here, I was able to reproduce this on Evergreen with the following output:

[2026/02/11 10:21:57.682] ../src/mongocxx/test/v_noabi/block_time_ms_bug_repro.cpp:52: failed: to_raw_us(elapsed) >= to_raw_us(block_time) for: 49365.36299999999755528 >= 50000.0 with 1 message: 'iteration := 0'
[2026/02/11 10:21:57.732] ../src/mongocxx/test/v_noabi/block_time_ms_bug_repro.cpp:52: failed: to_raw_us(elapsed) >= to_raw_us(block_time) for: 49272.5 >= 50000.0 with 1 message: 'iteration := 1'
[2026/02/11 10:21:57.782] ../src/mongocxx/test/v_noabi/block_time_ms_bug_repro.cpp:52: failed: to_raw_us(elapsed) >= to_raw_us(block_time) for: 49332.54299999999784632 >= 50000.0 with 1 message: 'iteration := 2'
[2026/02/11 10:21:57.832] ../src/mongocxx/test/v_noabi/block_time_ms_bug_repro.cpp:52: failed: to_raw_us(elapsed) >= to_raw_us(block_time) for: 49312.42100000000209548 >= 50000.0 with 1 message: 'iteration := 3'

This pattern continues the majority of the 100 iterations. To be clear, the above duration values are in microseconds. Since the client-side code is measuring the duration using std::chrono::steady_clock, I feel confident that these duration measurements are accurate.

I suspected this issue was caused by the server basing the sleep duration off the system clock, which is subject to time jumps. After investigating, this appears to be the case:

blockTimeMS is implemented using OperationContext::sleepFor(): https://github.com/10gen/mongo/blob/9c8a5d399b1ebf13a6fe86ef62e18f02bcf1cf6f/src/mongo/db/commands.cpp#L767
sleepFor() calls waitForConditionOrInterruptFor(): https://github.com/10gen/mongo/blob/9c8a5d399b1ebf13a6fe86ef62e18f02bcf1cf6f/src/mongo/util/interruptible.h#L254
waitForConditionOrInterruptFor() calls getExpirationDateForWaitForValue(): https://github.com/10gen/mongo/blob/9c8a5d399b1ebf13a6fe86ef62e18f02bcf1cf6f/src/mongo/util/interruptible.h#L233
getExpirationDateForWaitForValue() declares the wake up time as getServiceContext()>getPreciseClockSource()>now() + waitFor: https://github.com/10gen/mongo/blob/9c8a5d399b1ebf13a6fe86ef62e18f02bcf1cf6f/src/mongo/db/operation_context.cpp#L575
Unless a non-null ClockSource is passed to the ServiceContext constructor, _preciseClockSource is initialized using SystemClockSource: https://github.com/10gen/mongo/blob/9c8a5d399b1ebf13a6fe86ef62e18f02bcf1cf6f/src/mongo/db/service_context.cpp#L560
SystemClockSource::now() returns Date_t::now(): https://github.com/10gen/mongo/blob/77e0b1ec702cc2cd0ba1381ae793387149ecffae/src/mongo/util/system_clock_source.cpp#L39
Descending into the call stack, eventually we reach curTimeDuration(), which computes the current time using gettimeofday(): https://github.com/10gen/mongo/blob/77e0b1ec702cc2cd0ba1381ae793387149ecffae/src/mongo/util/time_support.cpp#L599

From the gettimeofday documentation:

The time returned by gettimeofday() is affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the system time). If you need a monotonically increasing clock, see clock_gettime(2).

To correctly implement blockTimeMS, the server should be measuring the duration using a steady, monotonic clock. Otherwise, there are technically no guarantees that the block duration will be honored.

In practice, blockTimeMS seems to get close enough to the correct block duration most of the time, which is perhaps why this issue has gone undetected. Additionally, I could only ever reproduce this issue on RHEL 8 – I am not sure why.

is related to

SERVER-103039 Allow to use failCommand failpoint to hang comamnds before their execution.

Closed

Details

Description

Attachments

Issue Links

Activity

People

Dates