[SERVER-62716] Handle spurious finishWaitingForOneOpTime in WaitForMajorityServiceTest Created: 18/Jan/22  Updated: 29/Oct/23  Resolved: 23/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc0, 5.0.10

Type: Bug
Priority: Major - P3
Reporter: Vojislav Stojkovic
Assignee: Matt Diener (Inactive)
Resolution: Fixed
Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.3, v5.0
Sprint: Service Arch 2022-2-21, Service Arch 2022-03-07
Participants:
Linked BF Score: 18

 Description   

As determined in BF-22420, WaitForMajorityService can call waitForWriteConcern on a request that has been cancelled but has not yet been removed from its request collection.

The service uses two background threads: one for processing requests (by waiting for replication) and the other for removing requests whose futures have been cancelled.

This behavior can manifest if the cancelled request is at the front of the queue and the processing thread wakes up and acquires the mutex after the cancellation thread has marked the request as processed but before it has removed it from the collection.
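The interleaving described above can be replayed deterministically in a single thread. The following is only a minimal sketch and not the real WaitForMajorityService code: `Request`, `RequestQueue`, `markCancelled`, `removeCancelled`, and `frontOpTime` are hypothetical names, opTimes are modelled as plain integers, and the sketch assumes the processing thread simply takes whatever entry is at the front of the collection.

```cpp
#include <cassert>
#include <map>

// Hypothetical model of a queued request; not the real service type.
struct Request {
    bool processed = false;  // set once the request's future is resolved or cancelled
};

// Requests keyed by opTime (modelled as int), lowest opTime first.
using RequestQueue = std::map<int, Request>;

// Cancellation thread, step 1: mark the request as processed.
void markCancelled(RequestQueue& queue, int opTime) {
    queue.at(opTime).processed = true;
}

// Cancellation thread, step 2: remove the request from the collection.
void removeCancelled(RequestQueue& queue, int opTime) {
    queue.erase(opTime);
}

// Processing thread: the opTime it would wait on next (assumed: the front entry).
int frontOpTime(const RequestQueue& queue) {
    assert(!queue.empty());
    return queue.begin()->first;
}
```

If the processing step runs between `markCancelled` and `removeCancelled`, `frontOpTime` still returns the cancelled request's opTime, which corresponds to the spurious wait described in this ticket. Once `removeCancelled` has run, the next request's opTime is at the front.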

While the behavior itself is not a bug, it does cause problems in WaitForMajorityServiceTest where test logic assumes that the request has been handled only once. For example, CancelingEarlierOpTimeRequestDoesNotAffectLaterOpTimeRequests calls finishWaitingOneOpTime twice and assumes that the first request will have been processed after the first call and the second after the second call.

One way to fix this would be to call getLastOpTimeWaited after finishWaitingOneOpTime and retry when necessary.
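A rough sketch of that retry idea follows. It is not the committed fix: `FakeFixture` and `finishWaitingUntil` are hypothetical stand-ins, opTimes are modelled as integers, and the fixture is assumed to expose `finishWaitingOneOpTime` and `getLastOpTimeWaited` with roughly these meanings.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for the test fixture; opTimes are modelled as ints.
struct FakeFixture {
    // OpTimes the service will wait on, in order. May start with a spurious
    // entry for a request that has already been cancelled.
    std::deque<int> pendingWaits;
    int lastOpTimeWaited = 0;

    // Releases one pending wait and records which opTime it was for.
    void finishWaitingOneOpTime() {
        if (pendingWaits.empty()) {
            return;
        }
        lastOpTimeWaited = pendingWaits.front();
        pendingWaits.pop_front();
    }

    int getLastOpTimeWaited() const {
        return lastOpTimeWaited;
    }
};

// Calls finishWaitingOneOpTime until the last opTime waited on is the expected
// one, giving up after maxAttempts. Returns true on success.
bool finishWaitingUntil(FakeFixture& fixture, int expectedOpTime, int maxAttempts = 10) {
    assert(maxAttempts > 0);
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        fixture.finishWaitingOneOpTime();
        if (fixture.getLastOpTimeWaited() == expectedOpTime) {
            return true;
        }
    }
    return false;
}
```

With `pendingWaits` set to `{1, 2}`, where 1 belongs to a cancelled request, `finishWaitingUntil(fixture, 2)` skips the spurious wait and returns only after the wait for opTime 2 has been released. The attempt limit keeps a test from spinning forever if the expected opTime never arrives.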



 Comments   
Comment by Githook User [ 21/Jun/22 ]

Author: Matt Diener <matt.diener@mongodb.com> (mattdiener)

Message: SERVER-62716 Ensure finishWaitingOneOpTime actually progresses
Branch: v5.0
https://github.com/mongodb/mongo/commit/a16e998ad6b2d08485b71a29949909e8c3b44643

Comment by Githook User [ 22/Feb/22 ]

Author: Matt Diener <matt.diener@mongodb.com> (mattdiener)

Message: SERVER-62716 Ensure finishWaitingOneOpTime actually progresses
Branch: master
https://github.com/mongodb/mongo/commit/c1ce0d4518d9122230c0d0cf373cb488f34af021

Comment by Matthew Saltz (Inactive) [ 31/Jan/22 ]

After discussion in triage, it seems like this is a problem with a test rather than the service. The race described can happen in production, but it's not actually a problem since we just end up waiting for an opTime that we'd eventually surpass anyway. So we should just fix the unit test.
