[SERVER-47025] moveChunk after refine shard key can hang indefinitely due to missing shard key index Created: 20/Mar/20  Updated: 29/Oct/23  Resolved: 04/Jul/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Matthew Saltz (Inactive) Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Fixed Votes: 0
Labels: PM-2144-Milestone-0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: range_deleter_refine_missing_index_repro.js
Issue Links:
Backports
Depends
depends on SERVER-69768 Include key pattern in range deletion... Closed
Related
related to SERVER-79632 Stop range deletion when hashed shard... Closed
is related to SERVER-52906 moveChunk after failed migration that... Closed
Assigned Teams:
Sharding EMEA
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Sharding 2020-04-06, Sharding 2020-04-20, Sharding 2020-05-04, Sharding 2020-05-18, Sharding 2020-07-13, Sharding 2020-06-01, Sharding 2020-06-15, Sharding 2020-06-29, Sharding 2020-07-27, Sharding 2020-08-24
Participants:

 Description   

When the resumable range deleter is disabled, the recipient of a chunk migration starts by removing potentially orphaned documents in the incoming range. Only after that does it clone any missing indexes from the donor.

However, the range deleter relies on the shard key index to perform these deletions.

This can lead to the following scenario:
1. A moveChunk begins.
2. The shard key is refined.
3. The moveChunk fails on the recipient for some reason, causing the entire migration to fail.
4. The moveChunk is restarted, now with the refined shard key.
5. The recipient of the moveChunk attempts to delete the incoming range using the range deleter with the refined shard key.
6. The range deleter loops infinitely because it cannot find an index that covers the refined shard key on the recipient.

There may be less convoluted scenarios that could cause this as well, but I'm having trouble thinking of one.

Repro attached.
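For illustration only (this is not the attached repro), a rough jstest-style sketch of the sequence, assuming a two-shard ShardingTest, an initial shard key of {a: 1} refined to {a: 1, b: 1}, and some mechanism (e.g. a failpoint, omitted here) that makes the first migration fail on the recipient:

    // Hand-written illustration of the scenario above; not the attached repro.
    // Assumes the resumable range deleter is disabled, as described above.
    const st = new ShardingTest({shards: 2});
    const ns = "testDb.testColl";

    assert.commandWorked(st.s.adminCommand({enableSharding: "testDb"}));
    assert.commandWorked(st.s.adminCommand({shardCollection: ns, key: {a: 1}}));

    // Start a moveChunk to the recipient and make it fail there (the exact
    // failure mechanism is omitted; in the scenario above the refine actually
    // happens while this migration is still in flight).
    assert.commandFailed(st.s.adminCommand(
        {moveChunk: ns, find: {a: 0}, to: st.shard1.shardName}));

    // Refine the shard key. createIndex via mongos only reaches shards that
    // own chunks, so the recipient never gets an {a: 1, b: 1} index.
    assert.commandWorked(st.s.getCollection(ns).createIndex({a: 1, b: 1}));
    assert.commandWorked(st.s.adminCommand(
        {refineCollectionShardKey: ns, key: {a: 1, b: 1}}));

    // Retry the moveChunk. Before cloning, the recipient's range deleter looks
    // for an index covering the refined key {a: 1, b: 1}, cannot find one, and
    // loops indefinitely.
    st.s.adminCommand({moveChunk: ns, find: {a: 0, b: 0}, to: st.shard1.shardName});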



 Comments   
Comment by Jordi Serra Torrens [ 04/Jul/23 ]

SERVER-69768 fixed this bug by persisting the shardKeyPattern on the range deletion task document and later using it to find a suitable index for executing the range deletion. This way, the range deletion task for the migration that started before refineCollectionShardKey would have the pre-refine shardKeyPattern. After refine, the range deleter is still able to find the pre-refine index on the recipient shard.
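For context, an illustrative shape of a shard-local config.rangeDeletions task document after that change; the keyPattern field name and the surrounding fields are assumptions based on this comment, not the exact schema:

    // Hypothetical example of a range deletion task document persisted on the
    // recipient shard (field names are illustrative, not the exact IDL schema).
    {
        _id: UUID("..."),
        nss: "testDb.testColl",
        collectionUuid: UUID("..."),
        donorShardId: "shard0",
        // Range bounds are expressed in terms of the pre-refine shard key.
        range: { min: { a: MinKey }, max: { a: 0 } },
        // Added by SERVER-69768: the shard key pattern at the time the
        // migration started. The range deleter uses it to pick a suitable
        // (pre-refine) index even after refineCollectionShardKey has run.
        keyPattern: { a: 1 }
    }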

Comment by Esha Maharishi (Inactive) [ 19/Nov/20 ]

Bringing this back into Needs Scheduling - it had been on my todo list but never ended up getting finished.

I had discussed with Andy that the range deleter should fall back to a collection scan if there is no shard key index, and with Charlie that the range deleter should use a higher-level interface into the query system than deleteWithIndexScan.

I tested making the range deleter use getExecutorDelete with a range query, allowing the query system to choose an index if available, but it didn't work if the shard key was hashed.

The issue was that the range to delete is stored in terms of the hashed shard key values, and I was trying to create a query with $gte and $lt on those hashed values. The query therefore compared the stored hashed bounds against the actual (unhashed) field values and returned nonsense.

Andy mentioned the query language should have a $hash operator that applies a hash to the actual values, so that a later pipeline stage can compare two hashed values.

A $toHashedIndexKey operator was actually implemented this past summer; see the syntax doc and SERVER-49214, which added it.

This may help in falling back to a collection scan, though since pipeline-style removes are not currently supported, it would require using an aggregation to find the _ids of the documents to delete, followed by a delete on those _ids.
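A rough shell sketch of that agg-then-delete fallback, assuming a collection sharded on {a: "hashed"} and hashed range bounds minHash/maxHash taken from the range deletion task; the collection and variable names are illustrative:

    // Hash each document's shard key value with $toHashedIndexKey and compare
    // it against the persisted hashed bounds to find the _ids in the range.
    // The computed $expr match cannot use an index, so this is effectively the
    // collection-scan fallback described above.
    const ids = db.coll.aggregate([
        {$match: {$expr: {$and: [
            {$gte: [{$toHashedIndexKey: "$a"}, minHash]},
            {$lt: [{$toHashedIndexKey: "$a"}, maxHash]}
        ]}}},
        {$project: {_id: 1}}
    ]).toArray().map(d => d._id);

    // Pipeline-style removes aren't supported, so delete by _id afterwards.
    db.coll.deleteMany({_id: {$in: ids}});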

Comment by Esha Maharishi (Inactive) [ 17/Nov/20 ]

I filed a separate ticket (SERVER-52906) for the bug Blake mentioned and re-linked BF-17537 to that ticket.

Comment by Blake Oler [ 11/Jun/20 ]

Linking BF-17537 to this ticket – a similar scenario lands us in the same infinite loop.

  1. A recovery doc is persisted for a migration.
  2. A migration aborts after cloning indexes, but before majority committing that index creation.
  3. On step up, the index creation gets rolled back.
  4. The stepped-up node attempts to delete the ranges from the aborted migration.
  5. The deletion fails indefinitely because the shard key index doesn't exist on the stepped-up node.

Comment by Esha Maharishi (Inactive) [ 12/May/20 ]

schwerin probably not; I'm moving it to 4.4 Required.
