[SERVER-46603] disallow empty collection index build optimization on secondaries Created: 04/Mar/20  Updated: 29/Oct/23  Resolved: 11/Mar/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.4.0-rc0, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Benety Goh
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Duplicate
is duplicated by SERVER-46656 Secondary and primary can disagree on... Closed
Related
related to SERVER-45201 Implicit collection creation from cre... Closed
is related to SERVER-46814 Invariant on "Commit quorum is missin... Closed
is related to SERVER-21700 Do not relax constraints during stead... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Execution Team 2020-03-23
Participants:
Linked BF Score: 43

 Description   

Summary: shouldBuildIndexesOnEmptyCollectionSinglePhased() is racy; it can cause index builds to hang forever and can block DDL operations.

The empty collection check is racy.

[Acquires the collection lock in X mode.
Registers the index build.
Performs the collection-empty check by opening a cursor and seeing whether a record exists.
Releases the collection lock.]

======>>>>Documents can get deleted here

[Index build thread pool scheduling
Collection scan phase
Drain phase
Commit phase]
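
The window between the two bracketed phases is a check-then-act race. The following is a minimal Python sketch of that pattern, not the server code: the Collection class, choose_build_path(), and the "single_phase" / "thread_pool" labels are hypothetical names used only for illustration. It shows that a decision made under the lock can be stale by the time the build is scheduled.

```python
import threading


class Collection:
    """Toy stand-in for a collection guarded by an exclusive lock (hypothetical)."""

    def __init__(self, docs):
        self.docs = list(docs)
        self.lock = threading.Lock()  # plays the role of the collection X lock

    def is_empty(self):
        return not self.docs


def choose_build_path(coll):
    """Check emptiness under the lock, then release it before acting on the result."""
    with coll.lock:
        return "single_phase" if coll.is_empty() else "thread_pool"


def run_scenario():
    coll = Collection([{"x": 1}])

    # Decision is taken while the collection still holds a document.
    decision = choose_build_path(coll)

    # The lock is released here, so a delete can run before the build is scheduled.
    with coll.lock:
        coll.docs.clear()

    # The same check at scheduling time now gives a different answer.
    at_schedule_time = choose_build_path(coll)
    return decision, at_schedule_time


if __name__ == "__main__":
    print(run_scenario())
```

Any node that repeats the check later, such as a secondary applying the replicated operations, can therefore reach a different conclusion than the node that made the original decision.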



 Comments   
Comment by Githook User [ 12/Mar/20 ]

Author:

{'username': 'benety', 'name': 'Benety Goh', 'email': 'benety@mongodb.com'}

Message: SERVER-46603 disallow empty collection index build optimization on secondaries

(cherry picked from commit a6bdcd9f7f1264f5161720bb174a0c81396e412c)
Branch: v4.4
https://github.com/mongodb/mongo/commit/fe38b2fde02cc113d0b2de8e7d8b998dbcfce0b8

Comment by Githook User [ 11/Mar/20 ]

Author:

{'username': 'benety', 'name': 'Benety Goh', 'email': 'benety@mongodb.com'}

Message: SERVER-46603 disallow empty collection index build optimization on secondaries
Branch: master
https://github.com/mongodb/mongo/commit/a6bdcd9f7f1264f5161720bb174a0c81396e412c

Comment by Suganthi Mani [ 04/Mar/20 ]

The racy check has manifested in different ways, such as in BF-16446 (which contains a detailed analysis) and BF-16220. It can also lead to the original problem seen in SERVER-45201. Consider the following event sequence for a 2-node replica set, assuming the thread pool size is 1.

1) Primary registers an index build for index 'x_1' on ns foor.bar.
2) Primary sees that the collection is not empty and decides to use the index builds coordinator thread pool.
3) The documents in the collection are then deleted on the primary, and the collection becomes empty.
4) The delete writes are replicated to the secondary.
5) Primary then schedules the index build on the thread pool and initializes it. This step generates the startIndexBuild oplog entry.
6) Secondary receives the startIndexBuild oplog entry and commits the index build immediately, because the collection the index is being created on is empty.
7) The current primary steps down and the secondary steps up.
8) The two-phase index build for 'x_1' on the old primary can survive the state transition and continues running in the background.
9) The new primary tries to create another index, 'y_1', on a non-empty collection, so it uses the thread pool to build the index. This step generates the startIndexBuild oplog entry.
10) When the old primary receives the startIndexBuild oplog entry, the oplog applier gets stuck waiting for the index build coordinator thread to be freed up. But the index build coordinator thread keeps waiting for the commit or abort oplog entry for index 'x_1' from the new primary. And the index build for 'y_1' on the new primary gets stuck waiting for votes from the secondary.
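
The thread-pool starvation in step 10 can be illustrated with a small Python sketch. This is only an analogy under stated assumptions: a ThreadPoolExecutor with one worker stands in for the index builds coordinator thread pool, and build_x1(), build_y1(), and the x1_commit_or_abort event are hypothetical names. In the scenario above the commit or abort entry for 'x_1' never arrives; the sketch sets the event at the end only so that the demonstration terminates.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate():
    # One worker, mirroring the "thread pool size is 1" assumption.
    pool = ThreadPoolExecutor(max_workers=1)
    x1_started = threading.Event()
    x1_commit_or_abort = threading.Event()  # stands in for the awaited oplog entry

    def build_x1():
        x1_started.set()
        x1_commit_or_abort.wait()  # occupies the only worker while waiting
        return "x_1 done"

    def build_y1():
        return "y_1 done"

    fx = pool.submit(build_x1)
    x1_started.wait()            # make sure x_1 holds the worker
    fy = pool.submit(build_y1)   # queued behind x_1, cannot start

    # y_1 is neither running nor finished while x_1 waits for its signal.
    y1_stuck = not fy.running() and not fy.done()

    # Released only so this demo exits; the ticket's scenario has no such signal.
    x1_commit_or_abort.set()
    pool.shutdown(wait=True)
    return y1_stuck, fx.result(), fy.result()


if __name__ == "__main__":
    print(simulate())
```

With a single worker, any task queued behind one that waits on an external signal makes no progress until that signal is delivered, which is the shape of the hang described in the steps above.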

Generated at Thu Feb 08 05:11:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.