[SERVER-31485] Race between move chunks and dropIndex may lead to IndexNotFound error
Created: 10/Oct/17  Updated: 30/Oct/23  Resolved: 03/Nov/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.5.13
Fix Version/s: 3.6.0-rc3

Type: Bug
Priority: Major - P3
Reporter: Eddie Louie
Assignee: Esha Maharishi (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-31732 Recipient shards of migrations hold D... Closed
is related to SERVER-31715 createIndexes (and dropIndexes) may n... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Use the following patch:

diff --git a/src/mongo/db/s/migration_destination_manager.cpp b/src/mongo/db/s/migration_destination_manager.cpp
index 0dd4967..14944f0 100644
--- a/src/mongo/db/s/migration_destination_manager.cpp
+++ b/src/mongo/db/s/migration_destination_manager.cpp
@@ -207,6 +207,7 @@ MONGO_FP_DECLARE(migrateThreadHangAtStep3);
 MONGO_FP_DECLARE(migrateThreadHangAtStep4);
 MONGO_FP_DECLARE(migrateThreadHangAtStep5);
 MONGO_FP_DECLARE(migrateThreadHangAtStep6);
+MONGO_FP_DECLARE(moveChunkDropIndex);
 
 MONGO_FP_DECLARE(failMigrationReceivedOutOfRangeOperation);
 
@@ -589,6 +590,8 @@ void MigrationDestinationManager::_migrateDriver(OperationContext* opCtx,
         }
     }
 
+    MONGO_FAIL_POINT_PAUSE_WHILE_SET(moveChunkDropIndex);
+
     {
         // 1. copy indexes

Then run the following file using: ./mongo --nodb repro_bf6752.js

repro_bf6752.js

/**
 * Test race condition between moveChunk and dropIndex in a sharded cluster.
 */
(function() {
    "use strict";
 
    load('jstests/libs/parallelTester.js');
 
    // Runs in a separate thread. moveChunk makes the recipient shard create the collection.
    function moveChunk_dropIndex(host) {
        const conn = new Mongo(host);
        const db = conn.getDB("test");

        assert.commandWorked(db.adminCommand({
            moveChunk: "test.mycoll",
            find: {moveChunk_dropIndex_field: 1},
            to: "moveChunkDropIndex-rs1"
        }));
    }
 
    const st = new ShardingTest(
        {name: "moveChunkDropIndex", mongos: 1, config: 1, shards: 2, rs: {nodes: 1}});
    const db = st.s.getDB("test");
    const coll = db.getCollection("mycoll");
    const mongos = st.s0;
    const shard1DB = st.shard1.getDB("test");
 
    assert.commandWorked(mongos.adminCommand({enableSharding: "test"}));
    assert.commandWorked(
        db.adminCommand({shardCollection: "test.mycoll", key: {moveChunk_dropIndex_field: 1}}));
 
    assert.writeOK(coll.insert({moveChunk_dropIndex_field: 1}));
 
    // Enable the fail point on the recipient shard. This pauses moveChunk right after the
    // collection is created with options, but before indexes are created.
    assert.commandWorked(
        shard1DB.adminCommand({configureFailPoint: "moveChunkDropIndex", mode: "alwaysOn"}));

    // Run the moveChunk command in a separate thread.
    const moveChunkThread = new ScopedThread(moveChunk_dropIndex, mongos.host);
    moveChunkThread.start();

    // This is used as a sync point. Wait until the collection is created on the recipient shard.
    assert.soon(function() {
        return shard1DB.getCollectionInfos({name: "mycoll"}).length === 1;
    });

    // On affected builds the recipient shard returns IndexNotFound here, so this assertion fails.
    assert.commandWorked(coll.dropIndex({moveChunk_dropIndex_field: 1}));

    // Disable the fail point and wait for moveChunkThread to exit.
    assert.commandWorked(
        shard1DB.adminCommand({configureFailPoint: "moveChunkDropIndex", mode: "off"}));

    moveChunkThread.join();
    st.stop();
})();

Sprint: Sharding 2017-10-23, Sharding 2017-11-13
Participants:
Linked BF Score: 0

 Description   

There is a race condition between the migration of a chunk to the secondary shard (including index creation there) and dropIndex. With help from Max Hirschhorn, we theorize the following scenario.
1. Client runs build index. Mongos broadcasts it to all shards.
2. On shard 1, the index build completes.
3. On shard 2, no data is present for the collection, so no index is created and the collection is not implicitly created.
4. At a later time, a move chunk is initiated from shard 1 to shard 2.
5. On shard 2, the migration has created the collection, but the indexes have not been created yet.
6. Drop index is broadcast to both shards. It completes successfully on shard 1 but returns IndexNotFound on shard 2.
7. Indexes are created on the collection on shard 2.
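The interleaving above can be sketched as a toy model in plain JavaScript. This is not MongoDB code: shards are plain objects, the index name "x_1" and all function names are made up for illustration, and the replies only imitate the shape of the error named in this ticket. It assumes, as described in the comments on this ticket, that the recipient reads the donor's index list before creating the collection.

```javascript
// Toy model only: each shard maps a namespace to a Set of index names.
function makeShard(name) {
    return {name: name, collections: new Map()};
}

// A shard without the collection builds nothing and does not create it (step 3).
function createIndex(shard, ns, indexName) {
    const indexes = shard.collections.get(ns);
    if (indexes !== undefined) {
        indexes.add(indexName);
    }
    return {ok: 1};
}

function dropIndex(shard, ns, indexName) {
    const indexes = shard.collections.get(ns);
    if (indexes === undefined) {
        return {ok: 0, codeName: "NamespaceNotFound"};
    }
    if (!indexes.delete(indexName)) {
        return {ok: 0, codeName: "IndexNotFound"};
    }
    return {ok: 1};
}

function runScenario() {
    const ns = "test.mycoll";
    const shard1 = makeShard("shard1");
    const shard2 = makeShard("shard2");
    shard1.collections.set(ns, new Set(["_id_"]));

    // Steps 1-3: the index build is broadcast; only shard1 has the collection.
    createIndex(shard1, ns, "x_1");
    createIndex(shard2, ns, "x_1");

    // Steps 4-5: the recipient reads the donor's index list, then creates the
    // collection, but has not yet built the copied indexes.
    const donorIndexes = Array.from(shard1.collections.get(ns));
    shard2.collections.set(ns, new Set(["_id_"]));

    // Step 6: the drop is broadcast while shard2 is in that window.
    const replies = [shard1, shard2].map((shard) => dropIndex(shard, ns, "x_1"));

    // Step 7: the recipient now builds the indexes it copied earlier.
    donorIndexes.forEach((idx) => shard2.collections.get(ns).add(idx));

    return replies;
}

console.log(JSON.stringify(runScenario()));
```

In this model the first reply is a success and the second carries IndexNotFound, which matches step 6.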
Thoughts on possible solutions:
1. Instead of returning IndexNotFound, return another error code that would allow the mongos to trigger a retry in this scenario.
2. Have dropIndexes block until collection "cloning" completes on the secondary shard.
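As a rough illustration of option 1, the retry could look something like the sketch below. Everything here is hypothetical: the ticket does not name the replacement error code, so "IndexBuildPendingOnRecipient" is invented, and sendDropIndex is a stand-in callback rather than a real mongos API.

```javascript
// Hypothetical code name; the ticket does not say what the new error would be.
const RETRYABLE_CODE_NAME = "IndexBuildPendingOnRecipient";

// Calls sendDropIndex until it succeeds, fails with a non-retryable error,
// or maxAttempts is reached. Returns the last reply and the attempt count.
function dropIndexWithRetry(sendDropIndex, maxAttempts) {
    let reply;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        reply = sendDropIndex();
        if (reply.ok === 1 || reply.codeName !== RETRYABLE_CODE_NAME) {
            return {reply: reply, attempts: attempt};
        }
    }
    return {reply: reply, attempts: maxAttempts};
}

// Stub shard for the example: answers with the retryable error a fixed
// number of times (the migration is still in progress), then succeeds.
function makeStubShard(pendingReplies) {
    let remaining = pendingReplies;
    return function sendDropIndex() {
        if (remaining > 0) {
            remaining--;
            return {ok: 0, codeName: RETRYABLE_CODE_NAME};
        }
        return {ok: 1};
    };
}

const outcome = dropIndexWithRetry(makeStubShard(2), 5);
console.log(outcome.attempts, outcome.reply.ok);
```

A real implementation would also need a backoff and a bound on total wait time; the sketch leaves both out.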
Backlog - Sharding Team, I'm going to pass this on to you guys to have a look at the possible solutions.



 Comments   
Comment by Githook User [ 03/Nov/17 ]

Author: Esha Maharishi (EshaMaharishi) <esha.maharishi@mongodb.com>

Message: SERVER-31485 Race between move chunks and dropIndex may lead to IndexNotFound error
Branch: master
https://github.com/mongodb/mongo/commit/d5a458a78f3da0daed6dd1ee2dc39b274a2849e8

Comment by Esha Maharishi (Inactive) [ 25/Oct/17 ]

max.hirschhorn and I identified two bugs across dropIndexes and createIndexes, summarized in SERVER-31715.

However, there's an additional bug within the dropIndexes bug: it's possible for dropIndexes to return ok:0 with IndexNotFound instead of ok:1 with NamespaceNotFound, because a recipient shard drops its exclusive database lock in between creating the collection and creating the indexes.

This ticket will make recipient shards hold the database lock across creating the collection and creating the indexes to fix this smaller issue within dropIndexes only.
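The effect of holding the lock across both steps can be sketched with a toy model. This is plain JavaScript, not server code: a promise-based mutex stands in for the exclusive database lock, the index name "x_1" and the timings are invented, and the replies only imitate the error names used in this ticket.

```javascript
// Toy mutex: callbacks run one at a time, in the order they were submitted.
class Mutex {
    constructor() {
        this._tail = Promise.resolve();
    }
    runExclusive(fn) {
        const result = this._tail.then(fn);
        this._tail = result.catch(() => {});
        return result;
    }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simulates a recipient shard receiving a chunk while dropIndexes arrives.
async function simulate(holdLockAcrossBothSteps) {
    const dbLock = new Mutex();
    const shard = {collectionExists: false, indexes: new Set()};

    const createCollection = async () => {
        shard.collectionExists = true;
        shard.indexes.add("_id_");
    };
    const createIndexes = async () => {
        shard.indexes.add("x_1");
    };

    async function receiveChunk() {
        if (holdLockAcrossBothSteps) {
            // Behavior this ticket aims for: one lock acquisition covers both steps.
            await dbLock.runExclusive(async () => {
                await createCollection();
                await sleep(20);
                await createIndexes();
            });
        } else {
            // Old behavior: the lock is dropped between the two steps.
            await dbLock.runExclusive(createCollection);
            await sleep(20);
            await dbLock.runExclusive(createIndexes);
        }
    }

    function dropIndexes() {
        return dbLock.runExclusive(async () => {
            if (!shard.collectionExists) {
                return {ok: 0, codeName: "NamespaceNotFound"};
            }
            if (!shard.indexes.delete("x_1")) {
                return {ok: 0, codeName: "IndexNotFound"};
            }
            return {ok: 1};
        });
    }

    const migration = receiveChunk();
    await sleep(5);  // dropIndexes arrives while the migration is in progress
    const reply = await dropIndexes();
    await migration;
    return reply;
}

simulate(false).then((reply) => console.log("lock dropped between steps:", reply));
simulate(true).then((reply) => console.log("lock held across steps:", reply));
```

With the lock dropped between the steps, the model's dropIndexes sees the collection without the index and reports IndexNotFound. With the lock held, dropIndexes waits and then finds the index. If dropIndexes had arrived before the collection was created, it would see NamespaceNotFound instead, which is the other outcome mentioned above.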

Comment by Esha Maharishi (Inactive) [ 23/Oct/17 ]

Update: This can manifest in a more general way that cannot be fixed with local synchronization between createIndexes and receiving a chunk on a recipient shard (and/or local synchronization between createIndexes and donating a chunk on a donor shard):

  • recipient starts and completes createIndexes, does not create the index
  • recipient starts and completes a migration
  • donor starts and completes createIndexes

One solution could be to make createIndexes (and collMod) take a distlock, since migrations occur under a distlock. The drawback is that the Balancer schedules migrations frequently (and holds a collection distlock for the duration of each), possibly frequently enough to starve createIndexes from ever grabbing the distlock.

Comment by Esha Maharishi (Inactive) [ 10/Oct/17 ]

This is the relevant code on the recipient shard, where it copies indexes from the donor shard and then creates the collection entry on itself: https://github.com/mongodb/mongo/blob/r3.5.13/src/mongo/db/s/migration_destination_manager.cpp#L485-L646

Comment by Esha Maharishi (Inactive) [ 10/Oct/17 ]

Hmm, I see the race now. I think it's slightly different than the one described above - it seems like it's:

1) migration initiated
2) recipient shard asks donor shard for indexes, finds none
3) recipient shard receives createIndexes, but hasn't created the collection entry on itself yet, so doesn't create the index
4) recipient shard creates the collection entry on itself
5) recipient shard receives dropIndex but doesn't have the index, so returns IndexNotFound

A fix could be to make a recipient shard take a collection lock between asking the donor shard for indexes and creating the collection entry on itself.

Thanks very much to eddie.louie and max.hirschhorn for their work on this hard-to-diagnose issue.
