[SERVER-32284] awaitReplication can hang when the optime to wait for does not match the minSnapshot. Created: 12/Dec/17  Updated: 30/Oct/23  Resolved: 18/Jan/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.7.2

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Benety Goh
Resolution: Fixed Votes: 0
Labels: rollback-functional
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-30911 Apply timestamps for index build writes Closed
depends on SERVER-32206 Catalog change to declare an index as... Closed
depends on SERVER-32251 dropCollection/dropDatabase must be t... Closed
Related
related to SERVER-32624 dropDatabase() should wait for collec... Closed
is related to SERVER-30638 Change setReadFromMajorityCommittedSn... Closed
is related to SERVER-30793 merge setFeatureCompatibilityVersion ... Closed
is related to SERVER-19212 New indexes shouldn't be usable until... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2018-01-01, Repl 2018-01-15, Repl 2018-01-29
Participants:
Linked BF Score: 0

 Description   

ReplicationCoordinatorImpl::_awaitReplication_inlock accepts an opTime and a minSnapshot to wait for. The method registers itself on a waiter list for a condition variable notification and returns successfully once _doneWaitingForReplication_inlock returns true.

In order for the predicate to return true, a valid snapshot must exist at the minSnapshot time.

However, the condition variable is notified when _doneWaitingForReplication_inlock succeeds with a trivially true minSnapshot value. Note also that notifying a waiter removes it from the list of waiters that are notified when optimes advance.

In this case, the predicate for _awaitReplication_inlock is stronger than the condition under which the waiter is notified, and because notification happens at most once, a client can hang waiting for a follow-up notification that will never come.
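
The hang described above can be modeled with a small single-threaded simulation. This is only an illustrative sketch in Python, not the server's C++ code: ToyCoordinator, Waiter, run_scenario, and the notify_ignores_snapshot flag are hypothetical names, and treating "a valid snapshot exists at minSnapshot" as "current snapshot >= minSnapshot" is an assumption made for the model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Waiter:
    op_time: int
    min_snapshot: Optional[int]
    done: bool = False  # True once the waiter's own predicate has passed


class ToyCoordinator:
    """Hypothetical model of the waiter list; not the real ReplicationCoordinatorImpl."""

    def __init__(self, notify_ignores_snapshot: bool) -> None:
        self.last_op_time = 0
        self.current_snapshot: Optional[int] = None
        self.waiters: List[Waiter] = []
        # True models the reported behavior: the notifier checks the
        # predicate with a trivially true (null) minSnapshot.
        self.notify_ignores_snapshot = notify_ignores_snapshot

    def _done_waiting(self, op_time: int, min_snapshot: Optional[int]) -> bool:
        if self.last_op_time < op_time:
            return False
        if min_snapshot is None:
            return True  # trivially true snapshot condition
        # Assumption: a snapshot at or after min_snapshot counts as valid.
        return self.current_snapshot is not None and self.current_snapshot >= min_snapshot

    def await_replication(self, op_time: int, min_snapshot: Optional[int]) -> Waiter:
        waiter = Waiter(op_time, min_snapshot)
        waiter.done = self._done_waiting(op_time, min_snapshot)
        if not waiter.done:
            self.waiters.append(waiter)  # register for a notification
        return waiter

    def _wake_waiters(self) -> None:
        for waiter in list(self.waiters):
            snapshot = None if self.notify_ignores_snapshot else waiter.min_snapshot
            if self._done_waiting(waiter.op_time, snapshot):
                # Notifying removes the waiter, so it is notified at most once.
                self.waiters.remove(waiter)
                # The woken waiter re-checks its full (stronger) predicate.
                waiter.done = self._done_waiting(waiter.op_time, waiter.min_snapshot)

    def advance_op_time(self, op_time: int) -> None:
        self.last_op_time = max(self.last_op_time, op_time)
        self._wake_waiters()

    def advance_snapshot(self, snapshot: int) -> None:
        self.current_snapshot = snapshot
        self._wake_waiters()


def run_scenario(notify_ignores_snapshot: bool) -> bool:
    """Return True if the waiter completes, False if it is left hanging."""
    coord = ToyCoordinator(notify_ignores_snapshot)
    waiter = coord.await_replication(op_time=5, min_snapshot=7)
    coord.advance_op_time(5)   # optime reached, snapshot not yet available
    coord.advance_snapshot(7)  # snapshot becomes available later
    return waiter.done


if __name__ == "__main__":
    print("weaker notify condition completes:", run_scenario(True))
    print("matching notify condition completes:", run_scenario(False))
```

In the first scenario the waiter is woken and dropped from the list when the optime advances, fails its own stronger check, and is never woken again when the snapshot arrives. The second scenario, where the notifier uses the waiter's own predicate, is included only for contrast; per the comments below, the actual fix removed the minSnapshot logic from awaitReplication.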



 Comments   
Comment by Benety Goh [ 18/Jan/18 ]

This bug is fixed by removing the minSnapshot logic, which is no longer used, from awaitReplication.

Comment by Githook User [ 18/Jan/18 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-32284 rename ReplicationCoordinator::reserveSnapshotName() to getMinimumVisibleSnapshot()
Branch: master
https://github.com/mongodb/mongo/commit/7713d5531c663603d17fff1267d013e0b6867e5b

Comment by Githook User [ 18/Jan/18 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-32284 remove unused last snapshot support from ReplClientInfo
Branch: master
https://github.com/mongodb/mongo/commit/25b7af8b7367de11f0d4d864bd6a51983227c494

Comment by Benety Goh [ 17/Jan/18 ]

References to awaitReplicationOfLastOpForClient() were removed in SERVER-30638 and SERVER-30793.

Comment by Githook User [ 17/Jan/18 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-32284 ReplicationCoordinatorImpl::_doneWaitingForReplication_inlock() always assumes null minSnapshot
Branch: master
https://github.com/mongodb/mongo/commit/13a33d961f6936dc8290b8bb80f5c5b9e599f0a9

Comment by Githook User [ 17/Jan/18 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-32284 ReplicationCoordinatorImpl::_awaitReplication_inlock() always assumes null minSnapshot
Branch: master
https://github.com/mongodb/mongo/commit/d071ff8278abcd05d63c0367c49284645e844bcc

Comment by Githook User [ 17/Jan/18 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-32284 remove ReplicationCoordinator::awaitReplicationOfLastOpForClient()
Branch: master
https://github.com/mongodb/mongo/commit/0d97768115d093ed0041fff8c0ef39ba30c07e3f

Comment by Githook User [ 17/Jan/18 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-32284 collMod waits for UUID schema changes using ReplicationCoordinator::awaitReplication() instead of awaitReplicationOfLastOpForClient()
Branch: master
https://github.com/mongodb/mongo/commit/7ed79c16f619cab2195edf9cad37a3c4765c8a23

Comment by Githook User [ 09/Jan/18 ]

Author:

{'name': 'Benety Goh', 'username': 'benety', 'email': 'benety@mongodb.com'}

Message: SERVER-32284 add minSnapshot to failed WC message in ReplicationCoordinator::awaitReplication()
Branch: master
https://github.com/mongodb/mongo/commit/63f957d5a91d47bf42d4a9f2e5d89d38599ec1da

Comment by Gregory McKeon (Inactive) [ 09/Jan/18 ]

benety.goh should this be assigned to you?
