[SERVER-60343] Single-phase index builds perform un-timestamped catalog writes on abort Created: 30/Sep/21 Updated: 29/Oct/23 Resolved: 04/Oct/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.4, 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gregory Wlodarek | Assignee: | Gregory Wlodarek |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: |
v5.0
|
||||||||
| Sprint: | Execution Team 2021-10-04, Execution Team 2021-10-18 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 139 | ||||||||
| Description |
|
When applying oplog entries from a tenant migration, single-phase index builds that get aborted due to some external event, like a replication state transition, perform un-timestamped catalog writes. Single-phase index builds timestamp the first catalog write to add the index entry here. This uses the onStartIndexBuildSinglePhase OpObserver to generate a timestamp by writing a no-op oplog entry. However, if the index build gets interrupted, it needs to remove the index entry from the catalog by performing another write. Today, this second catalog write is un-timestamped and triggers an invariant. We'd probably have to do something similar that we do for the first catalog write, by writing a no-op oplog entry to generate a timestamp. |
| Comments |
| Comment by Githook User [ 15/Oct/21 ] | ||||||||||||||
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}Message: (cherry picked from commit e24c4546b90a5af883dc111c0a60c6424ac395d2) | ||||||||||||||
| Comment by Githook User [ 15/Oct/21 ] | ||||||||||||||
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}Message: (cherry picked from commit 88acce3810557c5ca390df4906e10e04b6ff2a1f) | ||||||||||||||
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] | ||||||||||||||
|
Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you! | ||||||||||||||
| Comment by Githook User [ 05/Oct/21 ] | ||||||||||||||
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}Message: | ||||||||||||||
| Comment by Githook User [ 04/Oct/21 ] | ||||||||||||||
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}Message: | ||||||||||||||
| Comment by Githook User [ 04/Oct/21 ] | ||||||||||||||
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}Message: | ||||||||||||||
| Comment by Githook User [ 01/Oct/21 ] | ||||||||||||||
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}Message: | ||||||||||||||
| Comment by Gregory Wlodarek [ 30/Sep/21 ] | ||||||||||||||
|
I've only seen this occur from the TenantMigrationWriter thread so far during a stepdown. So maybe there's some contract it's not following. I think we should just wrap the createIndex() call from the Oplog in a WUOW in that case. | ||||||||||||||
| Comment by Louis Williams [ 30/Sep/21 ] | ||||||||||||||
|
Interesting. This should be able to be done in a single WUOW and replicate just the legacy "createIndexes" oplog entry. | ||||||||||||||
| Comment by Gregory Wlodarek [ 30/Sep/21 ] | ||||||||||||||
|
louis.williams that does not seem to be the case when running createIndexForApplyOps and creating the index here. So perhaps we just have to wrap the linked createIndex() call in a WriteUnitOfWork to make the whole procedure occur in a single WriteUnitOfWork. But yes, this was on an empty collection that was locked exclusively. I added some logging to see when we're opening and closing transactions and setting timestamps:
| ||||||||||||||
| Comment by Louis Williams [ 30/Sep/21 ] | ||||||||||||||
|
gregory.wlodarek single-phase builds should only be used on empty collections, hold exclusive locks, and occur in a single WUOW. Is that not the case? Is that not the case here? |