[SERVER-38588] Hybrid index builds do not work when applied concurrently with prepared transactions on secondaries Created: 12/Dec/18  Updated: 29/Oct/23  Resolved: 08/Mar/19

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Storage
Affects Version/s: None
Fix Version/s: 4.1.9

Type: Task Priority: Major - P3
Reporter: Randolph Tan Assignee: Eric Milkie
Resolution: Fixed Votes: 0
Labels: open_todo_in_code
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-12538 Docs for SERVER-38588: Hybrid index b... Closed
Problem/Incident
is caused by SERVER-37199 Yield locks of transactions in second... Closed
Related
related to SERVER-37336 Test that background index build do n... Closed
related to SERVER-39372 Make secondary lock acquisition for D... Closed
related to SERVER-40723 Deadlock between S lock acquisition o... Closed
related to SERVER-38540 Unblacklist multi index tests in mult... Closed
related to SERVER-40041 block prepared transactions behind in... Closed
related to SERVER-43638 Do not block prepared transactions on... Closed
is related to SERVER-38550 Mobile storage engine should support ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Storage NYC 2019-03-11
Linked BF Score: 22

 Description   

Original text:

Test setup: 

  • Multi-shard cluster
  • Collection is sharded with _id: hashed

The Evergreen validateCollection hook fails intermittently after the background_index_multikey.js test runs. Initial investigation suggests that one of the indexes on the secondary is missing a key entry, while the primary passes validation. Note: the test is currently blacklisted.

Because transactions yield their locks on secondaries (SERVER-37199), a concurrent hybrid background index build can interleave with a transaction in a way that loses writes to indexes that are still being built (i.e. index corruption) on secondaries.

In the following example, a background index build on {a: 1} runs concurrently with a transaction that inserts the document {a: 0}, with both being applied on a secondary.

  • The background index build downgrades its X lock to an IX lock while scanning the collection. It creates a temporary side-writes table that receives all index key insertions made during the build.
  • A document {a: 0} is inserted in a transaction, and the transaction is prepared. The key for a: 0 is inserted into the side-writes table as part of the same transaction. Because the transaction is being applied on a secondary, it then yields its IX locks.
  • The background index build takes an X lock, uncontested, and drains the side-writes table. Because the insert into the side-writes table belongs to a prepared but uncommitted transaction, it is invisible to the index builder. The builder then drops the table when the build completes.
  • The transaction finally commits, but its side write is committed to a table that has already been dropped. The inserted key is lost permanently, and the resulting index is corrupt.
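The four steps above can be replayed as a toy, single-threaded model. Everything in it (`Outcome`, `runRace()`, `validateIndex()`) is hypothetical and only mimics the ordering of events; it is a sketch under those assumptions, not server code. The final check plays the role of the validateCollection hook: every document in the collection should have a matching key in the index.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <set>

// Hypothetical toy model of the interleaving; none of these types exist in the server.
struct Outcome {
    std::set<int> collection;  // values of "a" in committed documents
    std::set<int> indexKeys;   // keys in the finished {a: 1} index
};

Outcome runRace(int a) {
    Outcome out;
    std::set<int> sideWritesCommitted;     // side writes visible to the drain
    bool sideTableDropped = false;
    std::optional<int> preparedSideWrite;  // uncommitted side write from the txn

    // Step 1: the build scans the (empty) collection under an IX lock and
    // creates the side-writes table, modeled by the two variables above.

    // Step 2: insert {a: a} in a transaction and prepare it. Its index key is
    // written to the side-writes table but stays uncommitted. On a secondary
    // the transaction then yields its locks.
    preparedSideWrite = a;

    // Step 3: the build takes an uncontested X lock, drains only committed
    // side writes into the index, and drops the side-writes table.
    assert(!sideTableDropped);
    out.indexKeys.insert(sideWritesCommitted.begin(), sideWritesCommitted.end());
    sideWritesCommitted.clear();
    sideTableDropped = true;

    // Step 4: the transaction commits. The document becomes visible, but its
    // side write targets a table that no longer exists, so the key is lost.
    out.collection.insert(a);
    if (preparedSideWrite && !sideTableDropped) {
        sideWritesCommitted.insert(*preparedSideWrite);
    }
    preparedSideWrite.reset();

    return out;
}

// Mimics a validation check: every collection value must have an index key.
bool validateIndex(const Outcome& o) {
    return std::includes(o.indexKeys.begin(), o.indexKeys.end(),
                         o.collection.begin(), o.collection.end());
}
```

Calling `runRace(0)` yields a collection containing {a: 0} and an index with no key for it, so `validateIndex()` returns false, matching the missing key entry reported on the secondary.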

On a primary, the prepared transaction keeps its locks, which prevents this from happening. On a secondary, however, an index build can complete while a prepared transaction is still active, so writes into indexes that are being built can be lost.

edit: louis.williams



 Comments   
Comment by Githook User [ 08/Mar/19 ]

Author: Eric Milkie (milkie@10gen.com, username: milkie)

Message: SERVER-38588 block application of prepare oplog entry on secondaries when a concurrent background index build is running

This will prevent hybrid index builds from corrupting an index on secondary nodes if a transaction becomes prepared during a build but commits after the index build commits.
Branch: master
https://github.com/mongodb/mongo/commit/77742598f84ab1137514ae13824f7afa2c1e9804

Comment by Eric Milkie [ 28/Feb/19 ]

Note that this change will, in effect, stall replication on a secondary whenever it encounters a prepared transaction that conflicts with a background index build, until that build completes.
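The stall can be illustrated with a small registry in which the applier thread waits on a condition variable until the build on the same namespace finishes. `BgOpRegistry` and `prepareWaitsForBuild()` are hypothetical; the method names only mirror `BackgroundOperation::inProgForNs()` and `awaitNoBgOpInProgForNs()` mentioned in this ticket, and the real signatures and implementation may differ.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>
#include <thread>

// Hypothetical per-namespace registry of running background index builds.
// Not the server's BackgroundOperation class.
class BgOpRegistry {
public:
    void begin(const std::string& ns) {
        std::lock_guard<std::mutex> lk(mu_);
        inProg_.insert(ns);
    }

    void end(const std::string& ns) {
        {
            std::lock_guard<std::mutex> lk(mu_);
            auto it = inProg_.find(ns);
            assert(it != inProg_.end());  // end() without a matching begin()
            if (it != inProg_.end()) {
                inProg_.erase(it);        // removes one build, not all builds on ns
            }
        }
        cv_.notify_all();                 // wake any applier waiting on this ns
    }

    bool inProgForNs(const std::string& ns) const {
        std::lock_guard<std::mutex> lk(mu_);
        return inProg_.count(ns) > 0;
    }

    // Blocks the caller until no build is running on ns.
    void awaitNoBgOpInProgForNs(const std::string& ns) {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [&] { return inProg_.count(ns) == 0; });
    }

private:
    mutable std::mutex mu_;
    std::condition_variable cv_;
    std::multiset<std::string> inProg_;  // one entry per running build
};

// Simulates the applier reaching a prepare entry for "test.c" during a build.
// Returns true if the applier stayed blocked until the build finished.
bool prepareWaitsForBuild() {
    BgOpRegistry reg;
    reg.begin("test.c");                        // background build starts
    std::atomic<bool> prepared{false};
    std::thread applier([&] {
        reg.awaitNoBgOpInProgForNs("test.c");   // replication stalls here
        prepared = true;                        // prepare entry applied
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    bool blockedDuringBuild = !prepared.load(); // still waiting on the build
    reg.end("test.c");                          // build commits, applier wakes
    applier.join();
    return blockedDuringBuild && prepared.load();
}
```

Until the build ends, the applier cannot apply the prepare entry or any oplog entry after it, which is the replication stall described above.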

Comment by Eric Milkie [ 28/Feb/19 ]

I removed this ticket from the Simultaneous Index Builds epic, because we came up with a different solution that does not depend on that project being completed.

The work for this ticket is to augment _applyPrepareTransactionOplogEntry() to do the following:
1. Extract all the unique namespace names from the ops in the applyOps command BSON.
2. Call BackgroundOperation::inProgForNs() for each namespace in turn. If any call returns true, call awaitNoBgOpInProgForNs() for that namespace.
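The two steps above can be sketched as follows, assuming each op in the applyOps array carries an `ns` field. `Op`, `uniqueNamespaces()` and `waitForConflictingBuilds()` are hypothetical; the two callbacks stand in for `BackgroundOperation::inProgForNs()` and `awaitNoBgOpInProgForNs()`, whose exact signatures are assumed here rather than taken from the server source.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one entry in the applyOps array of a
// prepare oplog entry. Only the namespace matters for this check.
struct Op {
    char opType;     // e.g. 'i' (insert), 'u' (update), 'd' (delete)
    std::string ns;  // "db.collection"
};

// Step 1: collect each distinct namespace touched by the transaction.
std::set<std::string> uniqueNamespaces(const std::vector<Op>& ops) {
    std::set<std::string> namespaces;
    for (const Op& op : ops) {
        namespaces.insert(op.ns);
    }
    return namespaces;
}

// Step 2: before applying the prepare, wait out any background build on
// those namespaces. Returns the namespaces that had to be waited on.
std::vector<std::string> waitForConflictingBuilds(
    const std::vector<Op>& ops,
    const std::function<bool(const std::string&)>& inProgForNs,
    const std::function<void(const std::string&)>& awaitNoBgOpInProgForNs) {
    std::vector<std::string> waited;
    for (const std::string& ns : uniqueNamespaces(ops)) {
        if (inProgForNs(ns)) {
            awaitNoBgOpInProgForNs(ns);  // blocks until the build on ns finishes
            waited.push_back(ns);
        }
    }
    return waited;
}
```

Deduplicating first means each namespace is checked once, even if the transaction touches it many times. For example, a transaction with two writes to test.a and one to test.b yields two namespaces, and only the one with a running build is waited on.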

Comment by Eric Milkie [ 04/Jan/19 ]

Moved this ticket to the Simultaneous Index Builds epic, as that project will be the work that fixes this problem completely. The hybrid index build project has provided the interface for Simultaneous Index Builds to call at the correct times.

Comment by Randolph Tan [ 13/Dec/18 ]

Note: I blacklisted background_index_multikey.js here and will also add it to the new sharding suites. I will tag the comment with this ticket number when adding the new blacklists.

Comment by Randolph Tan [ 12/Dec/18 ]

example failure:
https://evergreen.mongodb.com/task/mongodb_mongo_master_ubuntu1604_debug_asan_multi_shard_multi_stmt_txn_jscore_passthrough_patch_cee9c4deed8bbf0c612b465be4625d5d0775d204_5c101aab2fbabe50f96caca3_18_12_11_20_15_03##%257B%2522compare%2522%253A%255B%257B%2522hash%2522%253A%2522cee9c4deed8bbf0c612b465be4625d5d0775d204%2522%257D%255D%257D

louis.williams took a first look and suspects that there might be a race between replicating the background index build and replicating the documents.

Generated at Thu Feb 08 04:49:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.