[SERVER-46184] sharded_moveChunk_drop_shard_key_index fails due to testing unsupported behavior Created: 14/Feb/20  Updated: 29/Oct/23  Resolved: 12/Mar/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.4.0-rc0, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Matthew Saltz (Inactive) Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: sharding-4.4-stabilization, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Sharding 2020-03-23
Participants:
Linked BF Score: 30

 Description   

sharded_moveChunk_drop_shard_key_index drops the shard key index while performing moveChunk operations. Dropping the shard key index prevents range deletion from running, so incoming migrations hang waiting for orphaned documents that can never be deleted. We should narrow this test so that it targets only the behavior it is supposed to cover.



 Comments   
Comment by Githook User [ 13/Mar/20 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-46184 Replace sharded_moveChunk_drop_shard_key_index.js with cpp test

(cherry picked from commit aeae7b5345b7c75b9e46a17d7eefaff59fb05de1)
Branch: v4.4
https://github.com/mongodb/mongo/commit/3d34131fb2a7e48ce9407c860a9da56cf637f139

Comment by Githook User [ 11/Mar/20 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-46184 Replace sharded_moveChunk_drop_shard_key_index.js with cpp test
Branch: master
https://github.com/mongodb/mongo/commit/aeae7b5345b7c75b9e46a17d7eefaff59fb05de1

Comment by Randolph Tan [ 10/Mar/20 ]

More context:
The test actually attempts to recreate the index after dropping it. However, this does not fully restore the cluster to a working state.

One example where this doesn't work is when dropIndex occurs before any migration starts. When shardB receives its first chunk, it creates the collection and copies the data without the shard key index. When the migration fails to commit, this data is effectively orphaned. And since there's no shard key index, the range deleter can't delete it. Calling createIndex on mongos doesn't work because this shard doesn't own any chunk yet, so it won't receive the command. Attempting to "fix it" by sending createIndex straight to the shard is also racy, because the index can get dropped again during migration. And if this happens before the range deleter is able to clean up the orphans, subsequent migrations will fail with CannotCreateCollection.
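
To make the sequence above easier to follow, here is a rough toy model of it in plain JavaScript. It is only a sketch and not server code: ToyShard, cloneFrom, runRangeDeleter and createIndexViaRouter are hypothetical names invented for illustration. It assumes that a recipient creates the collection with whatever indexes the donor has at clone time, and that a createIndex sent through the router only reaches shards that own at least one chunk.

```javascript
// Toy in-memory model; none of these names are real MongoDB APIs.
class ToyShard {
  constructor(name) {
    this.name = name;
    this.hasShardKeyIndex = false;
    this.docs = [];        // documents physically stored on this shard
    this.ownedChunks = 0;  // chunks whose migration committed to this shard
  }

  // Recipient side of a migration: copy the donor's current indexes and
  // the chunk's documents. Ownership changes only if the commit succeeds.
  cloneFrom(donor, docs, commitSucceeds) {
    this.hasShardKeyIndex = donor.hasShardKeyIndex;
    this.docs.push(...docs);
    if (commitSucceeds) {
      this.ownedChunks += 1;
    }
  }

  // Documents on a shard that owns no chunk are all orphans.
  orphanCount() {
    return this.ownedChunks === 0 ? this.docs.length : 0;
  }

  // Assumption: the range deleter needs the shard key index to run.
  // Returns true if it was able to run, false otherwise.
  runRangeDeleter() {
    if (!this.hasShardKeyIndex) {
      return false;
    }
    if (this.ownedChunks === 0) {
      this.docs = [];
    }
    return true;
  }
}

// Assumption: the router targets only shards that own at least one chunk.
function createIndexViaRouter(shards) {
  const targeted = shards.filter((s) => s.ownedChunks > 0);
  targeted.forEach((s) => { s.hasShardKeyIndex = true; });
  return targeted.map((s) => s.name);
}

const shardA = new ToyShard("shardA");
shardA.hasShardKeyIndex = true;
shardA.ownedChunks = 2;
shardA.docs = [{ x: 1 }, { x: 2 }];
const shardB = new ToyShard("shardB");

shardA.hasShardKeyIndex = false;            // dropIndex before any migration
shardB.cloneFrom(shardA, [{ x: 2 }], false); // migration fails to commit
const targeted = createIndexViaRouter([shardA, shardB]); // test recreates the index
const deleterRan = shardB.runRangeDeleter();

console.log(targeted, deleterRan, shardB.orphanCount());
```

In this model the recreated index lands only on shardA, so shardB is left holding an orphan that its range deleter cannot remove, which is the stuck state described above.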

Comment by Matthew Saltz (Inactive) [ 14/Feb/20 ]

max.hirschhorn FYI. esha.maharishi mentioned you were looking at something related to this.

Generated at Thu Feb 08 05:10:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.