[SERVER-69898] Wait for the critical section catch-up phase before refreshing the DB version Created: 22/Sep/22  Updated: 29/Oct/23  Resolved: 30/Sep/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.1, 5.0.14, 6.0.3, 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Antonio Fuschetto Assignee: Antonio Fuschetto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.1, v6.0, v5.0
Sprint: Sharding EMEA 2022-10-03
Participants:

 Description   

Our framework for critical sections allows a thread to enter a catch-up phase (only write operations are blocked) and a commit phase (both read and write operations are blocked).

The getCriticalSectionSignal function accepts an argument, kWrite or kRead, that selects when the signal is returned: if a thread has entered either the catch-up or the commit phase (kWrite), or only if it has entered the commit phase (kRead).

When the database version needs to be refreshed, the current logic obtains the critical section signal only if another thread has entered the commit phase (which makes sense), BUT there is a problem. Consider the following sequence of events:

  1. Thread A enters the catch-up phase of the movePrimary.
  2. Thread B refreshes the version of the same database: it gets the signal with kRead, the result is boost::none, so it continues.
  3. Thread A enters the commit phase, but the ongoing refresh in Thread B is not cancelled! In this case there is a race: movePrimary and the version refresh run in parallel.

There are two solutions for this problem:

  1. Get the signal with kRead, and cancel ongoing refreshes when the commit phase is entered.
  2. Get the signal with kWrite, and cancel ongoing refreshes when the catch-up phase is entered.

What about secondaries? Currently, secondary nodes are notified only when the primary enters the catch-up phase. Consequently, only solution 2 above is viable, and it is the target of this ticket.



 Comments   
Comment by Githook User [ 07/Oct/22 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-69898 Wait for the critical section catch-up phase before refreshing the DB version
Branch: v6.0
https://github.com/mongodb/mongo/commit/29c8d88c97f9878f0d841adc61fcbf6758fca55f

Comment by Githook User [ 07/Oct/22 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-69898 Wait for the critical section catch-up phase before refreshing the DB version
Branch: v6.1
https://github.com/mongodb/mongo/commit/8c8f7d1271c74da58b90f340436efdd8d4b603f3

Comment by Githook User [ 07/Oct/22 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-69898 Wait for the critical section catch-up phase before refreshing the DB version
Branch: v5.0
https://github.com/mongodb/mongo/commit/83de4afb15a616fb8f40bf329ae17b65ae2c130a

Comment by Githook User [ 30/Sep/22 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-69898 Wait for the critical section catch-up phase before refreshing the DB version
Branch: master
https://github.com/mongodb/mongo/commit/1723215716b31cdbcd9783673933426e1220afd1

Generated at Thu Feb 08 06:14:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.