[SERVER-40165] generate wtimeout deadlines with the precise clock Created: 15/Mar/19  Updated: 29/Oct/23  Resolved: 19/Apr/19

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 3.4.21, 3.6.13

Type: Bug
Priority: Major - P3
Reporter: Mira Carey
Assignee: Mira Carey
Resolution: Fixed
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-40166 Force BG clock now == Date_t::lastNow Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4
Sprint: Service Arch 2019-03-25, Service Arch 2019-04-08, Service Arch 2019-04-22
Participants:
Linked BF Score: 8

 Description   

In ReplicationCoordinatorImpl::_awaitReplication_inlock, we generate the deadline for waiting on wtimeout with the fast clock, but we actually wait on that deadline with the precise clock. If there is significant drift between the precise and fast clocks, the wait can therefore hit wtimeout early.

This is unlikely to happen very often in practice, but could happen if the thread responsible for pushing ahead the fast clock gets descheduled for a substantial period of time. In that case, it is possible for the fast clock to drift multiple seconds behind the precise clock, so a deadline computed from it may already be in the past by the precise clock, and waitForConditionOrInterruptNoAssertUntil times out immediately.
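
The mismatch described above can be illustrated with a minimal, self-contained sketch. This is not the server code: FakeClocks, deadlineFromFastClock, deadlineFromPreciseClock and expiresImmediately are hypothetical names invented for illustration, and the clocks are modelled as plain millisecond counters rather than the server's real clock sources.

```cpp
#include <cassert>
#include <chrono>

using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for the two clock sources (not the server's types).
// Both values are milliseconds since an arbitrary epoch.
struct FakeClocks {
    Millis precise;  // accurate "now"
    Millis fast;     // cached "now" advanced by a background thread; may lag
};

// Deadline computed the way the bug describes: fast clock + wtimeout.
Millis deadlineFromFastClock(const FakeClocks& c, Millis wtimeout) {
    return c.fast + wtimeout;
}

// Deadline computed from the same clock that the wait is evaluated against.
Millis deadlineFromPreciseClock(const FakeClocks& c, Millis wtimeout) {
    return c.precise + wtimeout;
}

// A wait evaluated against the precise clock expires at once if the
// deadline is not in the future by that clock.
bool expiresImmediately(Millis deadline, Millis preciseNow) {
    return deadline <= preciseNow;
}

// Assumed scenario: the fast clock lags 3 seconds and wtimeout is 2 seconds.
void checkScenario() {
    FakeClocks clocks{Millis(10000), Millis(7000)};
    Millis wtimeout(2000);

    // Fast-clock deadline is 9000 ms, already behind the precise 10000 ms.
    assert(expiresImmediately(deadlineFromFastClock(clocks, wtimeout), clocks.precise));

    // Precise-clock deadline is 12000 ms, so the full wtimeout is honoured.
    assert(!expiresImmediately(deadlineFromPreciseClock(clocks, wtimeout), clocks.precise));
}
```

Calling checkScenario() from a main function exercises both cases. In this sketch the early timeout occurs whenever the fast clock's lag is at least as large as wtimeout; a smaller lag shortens the effective wait by the amount of the lag.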



 Comments   
Comment by Githook User [ 19/Apr/19 ]

Author:

{'name': 'Jason Carey', 'username': 'hanumantmk', 'email': 'jcarey@argv.me'}

Message: SERVER-40165 set wtimeout deadlines with the precise clock

In ReplicationCoordinatorImpl::_awaitReplication_inlock, we generate our
deadline for waiting on wtimeout with the fast clock. Because we
generate the deadline with the fast clock, but actually wait on it with
the precise clock, it is possible for us to wtimeout early if there is
significant drift between the precise and fast clocks.

This is unlikely to happen very often in practice, but could happen if
the thread responsible for pushing ahead the fast clock gets
descheduled for a substantial period of time. In that case, it is
possible for the fast clock to drift multiple seconds behind the precise
clock, and to immediately timeout in
waitForConditionOrInterruptNoAssertUntil.

(cherry picked from commit 46086585d49f2da53d46c3f121e9f8dc23b699a1)
Branch: v3.4
https://github.com/mongodb/mongo/commit/c2da4180540e81ba591b858e14688d40f9455554

Comment by Githook User [ 19/Apr/19 ]

Author:

{'name': 'Jason Carey', 'username': 'hanumantmk', 'email': 'jcarey@argv.me'}

Message: SERVER-40165 set wtimeout deadlines with the precise clock

In ReplicationCoordinatorImpl::_awaitReplication_inlock, we generate our
deadline for waiting on wtimeout with the fast clock. Because we
generate the deadline with the fast clock, but actually wait on it with
the precise clock, it is possible for us to wtimeout early if there is
significant drift between the precise and fast clocks.

This is unlikely to happen very often in practice, but could happen if
the thread responsible for pushing ahead the fast clock gets
descheduled for a substantial period of time. In that case, it is
possible for the fast clock to drift multiple seconds behind the precise
clock, and to immediately timeout in
waitForConditionOrInterruptNoAssertUntil.
Branch: v3.6
https://github.com/mongodb/mongo/commit/46086585d49f2da53d46c3f121e9f8dc23b699a1

Comment by Mira Carey [ 21/Mar/19 ]

On reflection, after the changes in SERVER-40166, I'm going to avoid committing this on >3.6, preferring the more general change.

But in the interest of resolving bugs on 3.4 and 3.6, I'm still going to push forward with this change for those branches.
