[SERVER-60343] Single-phase index builds perform un-timestamped catalog writes on abort Created: 30/Sep/21  Updated: 29/Oct/23  Resolved: 04/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.4, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Gregory Wlodarek Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0
Sprint: Execution Team 2021-10-04, Execution Team 2021-10-18
Participants:
Linked BF Score: 139

 Description   

When applying oplog entries from a tenant migration, single-phase index builds that get aborted due to some external event, like a replication state transition, perform un-timestamped catalog writes.

Single-phase index builds timestamp the first catalog write to add the index entry here. This uses the onStartIndexBuildSinglePhase OpObserver to generate a timestamp by writing a no-op oplog entry. However, if the index build gets interrupted, it needs to remove the index entry from the catalog by performing another write. Today, this second catalog write is un-timestamped and triggers an invariant.

We'd probably have to do something similar that we do for the first catalog write, by writing a no-op oplog entry to generate a timestamp.



 Comments   
Comment by Githook User [ 15/Oct/21 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-60343 Single-phase index builds perform timestamped catalog writes on abort

(cherry picked from commit e24c4546b90a5af883dc111c0a60c6424ac395d2)
Branch: v5.0
https://github.com/mongodb/mongo/commit/541a6b6cba82590f2d8fddacc2c7327865f50bbb

Comment by Githook User [ 15/Oct/21 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-60343 Single-phase index builds perform timestamped catalog writes on abort

(cherry picked from commit 88acce3810557c5ca390df4906e10e04b6ff2a1f)
Branch: v5.0
https://github.com/10gen/mongo-enterprise-modules/commit/8e4c704f846b3bee26553afde25f17d293534b99

Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 05/Oct/21 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-60343 Single-phase index builds perform timestamped catalog writes on abort
Branch: fausto.leyva/SERVER-57826
https://github.com/10gen/mongo-enterprise-modules/commit/88acce3810557c5ca390df4906e10e04b6ff2a1f

Comment by Githook User [ 04/Oct/21 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-60343 Single-phase index builds perform timestamped catalog writes on abort
Branch: master
https://github.com/mongodb/mongo/commit/e24c4546b90a5af883dc111c0a60c6424ac395d2

Comment by Githook User [ 04/Oct/21 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-60343 Single-phase index builds perform timestamped catalog writes on abort
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/88acce3810557c5ca390df4906e10e04b6ff2a1f

Comment by Githook User [ 01/Oct/21 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-60343 Single-phase index builds perform timestamped catalog writes on abort
Branch: SERVER-60343
https://github.com/10gen/mongo-enterprise-modules/commit/0f85fb4e9a5e735576cbfd5c7cf9896b39bb2ece

Comment by Gregory Wlodarek [ 30/Sep/21 ]

I've only seen this occur from the TenantMigrationWriter thread so far during a stepdown. So maybe there's some contract it's not following. I think we should just wrap the createIndex() call from the Oplog in a WUOW in that case. 

Comment by Louis Williams [ 30/Sep/21 ]

Interesting. This should be able to be done in a single WUOW and replicate just the legacy "createIndexes" oplog entry.

Comment by Gregory Wlodarek [ 30/Sep/21 ]

louis.williams that does not seem to be the case when running createIndexForApplyOps and creating the index here. So perhaps we just have to wrap the linked createIndex() call in a WriteUnitOfWork to make the whole procedure occur in a single WriteUnitOfWork.

But yes, this was on an empty collection that was locked exclusively.

I added some logging to see when we're opening and closing transactions and setting timestamps:

"conn1","msg":"+++ onStartIndexBuildSinglePhase"
"conn1","msg":"+++ _txnOpen"
"conn1","msg":"+++ setTimestamp","attr":{"ts":{"$timestamp":{"t":1633022069,"i":2}}}
"conn1","msg":"+++ setTimestamp","attr":{"ts":{"$timestamp":{"t":1633022069,"i":2}}}
"conn1","msg":"+++ logOperation"
"conn1","msg":"Index build: starting","attr":{"buildUUID":null,"collectionUUID":{"uuid":{"$uuid":"2422342a-4da1-490b-9591-92378414235a"}},"namespace":"test.repro","properties":{"v":2,"key":{"a":1},"name":"a_1","sparse":true},"specIndex":0,"numSpecs":1,"method":"Hybrid","maxTemporaryMemoryUsageMB":200}
"conn1","msg":"+++ _txnClose","attr":{"commit":true}
------------ We committed the previous timestamped transaction that does the first catalog write ------------
"conn1","msg":"+++ _txnOpen"
"conn1","msg":"Index build: collection scan done","attr":{"buildUUID":null,"collectionUUID":{"uuid":{"$uuid":"2422342a-4da1-490b-9591-92378414235a"}},"namespace":"test.repro","totalRecords":0,"readSource":"kNoTimestamp","durationMillis":0}
"conn1","msg":"+++ _txnClose","attr":{"commit":true}
------------ This is the new transaction which isn't timestamped when aborting ------------
"conn1","msg":"Invariant failure","attr":{"expr":"_isTimestamped","file":"src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp","line":487}
"conn1","msg":"\n\n***aborting after invariant() failure\n\n"

Comment by Louis Williams [ 30/Sep/21 ]

gregory.wlodarek single-phase builds should only be used on empty collections, hold exclusive locks, and occur in a single WUOW. Is that not the case? Is that not the case here?

Generated at Thu Feb 08 05:49:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.