[SERVER-47883] Newly-elected primaries do not wait for single-phase background index builds to complete before accepting writes
Created: 01/May/20  Updated: 29/Oct/23  Resolved: 02/Jul/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.0, 4.2.0
Fix Version/s: 4.2.9, 4.0.21

Type: Bug
Priority: Major - P3
Reporter: Louis Williams
Assignee: Louis Williams
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-51608 [4.0] backport implicitly_retry_on_ba... Closed
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested:
v4.0
Sprint: Execution Team 2020-06-29, Execution Team 2020-07-13
Participants:
Linked BF Score: 17

 Description   

Definition of "single-phase background index builds": index builds started on versions 4.2 and earlier and, on 4.4, index builds started while the feature compatibility version (FCV) is 4.2.

Consider the scenario:

  • Node 1, Primary, starts and completes a single-phase background index build. It replicates a "createIndexes" oplog entry
  • Node 2, Secondary, starts the index build
  • Node 2 steps up as primary, but the index build is still incomplete

Suppose a client creates an index and waits for the build to complete on the primary before issuing read queries. After the state transition, the client will find that the index is no longer available for queries until the new primary completes its own build of that index.

Proposed solution: on a state transition to primary, wait for all BackgroundOperations to complete like we already do for rollback.
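
The difference the proposal would make can be illustrated with a toy model. This is only a sketch: `ToyNode`, `stepUp`, and `waitForBuilds` are hypothetical names that do not correspond to server internals, and "waiting" is modeled simply by running the pending builds to completion before the state change.

```javascript
// Toy model only: none of these names correspond to server internals.
class ToyNode {
    constructor() {
        this.state = "SECONDARY";
        this.pendingBuilds = new Set();  // index names still being built
        this.readyIndexes = new Set();   // index names usable by queries
    }

    startBackgroundBuild(indexName) {
        this.pendingBuilds.add(indexName);
    }

    finishBackgroundBuild(indexName) {
        if (this.pendingBuilds.delete(indexName)) {
            this.readyIndexes.add(indexName);
        }
    }

    // Step up, optionally draining in-progress builds before accepting writes.
    stepUp({waitForBuilds}) {
        if (waitForBuilds) {
            for (const indexName of [...this.pendingBuilds]) {
                this.finishBackgroundBuild(indexName);
            }
        }
        this.state = "PRIMARY";
    }
}

// Returns whether index "a_1" is queryable right after the node becomes primary.
function indexVisibleAfterStepUp(waitForBuilds) {
    const node = new ToyNode();
    node.startBackgroundBuild("a_1");
    node.stepUp({waitForBuilds});
    return node.readyIndexes.has("a_1");
}

console.log(indexVisibleAfterStepUp(false)); // false: current behavior
console.log(indexVisibleAfterStepUp(true));  // true: proposed behavior
```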



 Comments   
Comment by Githook User [ 10/Sep/20 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-47883 Override stepdown suites to ensure background index builds are complete after stepdown

(cherry picked from commit 045cbbe721087ab7d36c5b0c99096103eb7a7d45)

Backports causally_consistent_index_builds.js
Branch: v4.0
https://github.com/mongodb/mongo/commit/6771000c858db2358a7ef49378fcfd92ea2b177b

Comment by Githook User [ 02/Jul/20 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-47883 Override stepdown suites to ensure background index builds are complete after stepdown
Branch: v4.2
https://github.com/mongodb/mongo/commit/045cbbe721087ab7d36c5b0c99096103eb7a7d45

Comment by Louis Williams [ 22/Jun/20 ]

This behavior has always existed for background index builds. It would be extremely risky to backport to older versions a change that blocks replication. Instead, we have decided to accept this bug and will only address the test failures.

I propose including the causally_consistent_index_builds.js override in the step-down suites. The override follows every "createIndexes" command with a "collMod". The "collMod" acts as a barrier, so any subsequent command will always see the index once the "createIndexes" command has completed.
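
The shape of such an override can be sketched as follows. This is not the contents of causally_consistent_index_builds.js: `withCollModBarrier` and `makeRecordingRunCommand` are hypothetical names, the connection is a fake that only records commands, and targeting the "collMod" at the same collection as the "createIndexes" is an assumption of the sketch.

```javascript
// Hypothetical stand-in for the real override; all names here are illustrative.
// Wraps a runCommand-style function so that every successful "createIndexes"
// is followed by a "collMod" (assumed here to target the same collection).
function withCollModBarrier(runCommand) {
    return function (dbName, cmdObj) {
        const res = runCommand(dbName, cmdObj);
        const isCreateIndexes =
            Object.prototype.hasOwnProperty.call(cmdObj, "createIndexes");
        // Only successful createIndexes commands need the barrier.
        if (!res.ok || !isCreateIndexes) {
            return res;
        }
        const collModRes = runCommand(dbName, {collMod: cmdObj.createIndexes});
        // Surface a failed barrier instead of silently reporting success.
        return collModRes.ok ? res : collModRes;
    };
}

// Fake connection that records every command it receives and always succeeds.
function makeRecordingRunCommand(log) {
    return function (dbName, cmdObj) {
        log.push({dbName, cmdObj});
        return {ok: 1};
    };
}

const commandLog = [];
const runCommand = withCollModBarrier(makeRecordingRunCommand(commandLog));
runCommand("test", {createIndexes: "coll", indexes: [{key: {a: 1}, name: "a_1"}]});
runCommand("test", {find: "coll"});

// The first key of each command object is the command name.
const issuedCommands = commandLog.map((entry) => Object.keys(entry.cmdObj)[0]);
console.log(issuedCommands.join(",")); // createIndexes,collMod,find
```

In the sketch, the "find" issued by the client is only sent after the "collMod" has returned, which is the ordering the barrier is meant to provide.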

Comment by Louis Williams [ 17/Jun/20 ]

I also believe this is more than a test issue. Since background index builds on secondaries relax index constraints, this may allow us to complete an index build on a primary in an inconsistent state.

Comment by Louis Williams [ 17/Jun/20 ]

I believe this bug was exposed by SERVER-39112, which removed a 1-second pause between primary drain mode and accepting writes. In our tests, index builds were always able to complete in the 1-second window before new writes were accepted. Removing that wait allowed our tests to start index builds on secondaries, transition immediately to primary, and observe an unfinished index build.
