[SERVER-30183] a moveChunk that joins the active moveChunk on a shard may not respect its waitForDelete
Created: 17/Jul/17  Updated: 30/Oct/23  Resolved: 15/Aug/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.4.6, 3.5.10
Fix Version/s: 3.4.9, 3.5.12

Type: Bug
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: Esha Maharishi (Inactive)
Resolution: Fixed
Votes: 0
Labels: todo_in_code
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-43441 Remove TODO listed in SERVER-30183 Closed
is related to SERVER-29834 only the active moveChunk on a shard ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v3.4
Sprint: Sharding 2017-08-21
Participants:

 Description   

moveChunk requests are considered equal even if their 'waitForDelete' options do not match:

https://github.com/mongodb/mongo/blob/r3.5.10/src/mongo/s/move_chunk_request.cpp#L159-L170

bool MoveChunkRequest::operator==(const MoveChunkRequest& other) const {
    if (_nss != other._nss)
        return false;
    if (_fromShardId != other._fromShardId)
        return false;
    if (_toShardId != other._toShardId)
        return false;
    if (_range != other._range)
        return false;
 
    return true;
}

However, only the active moveChunk acts on its 'waitForDelete' option.

So, if the active moveChunk has waitForDelete=false, a moveChunk that joins it will exhibit waitForDelete=false behavior even if it has waitForDelete=true.

A quick fix for this is to include the 'waitForDelete' option when comparing moveChunk requests. This way, if a later moveChunk's waitForDelete does not match the active moveChunk's waitForDelete, the later moveChunk will fail with ConflictingOperationInProgress rather than succeeding silently.
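The quick fix can be sketched roughly as follows. This is a simplified, self-contained illustration, not the actual patch: SimpleMoveChunkRequest, JoinResult, and tryJoin are hypothetical stand-ins for the real MoveChunkRequest and the code that joins an active migration, and the range is reduced to two strings.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical, simplified stand-in for MoveChunkRequest (not the real class).
struct SimpleMoveChunkRequest {
    std::string nss;
    std::string fromShardId;
    std::string toShardId;
    std::string rangeMin;
    std::string rangeMax;
    bool waitForDelete;

    // Unlike the comparison quoted above, waitForDelete takes part in equality.
    bool operator==(const SimpleMoveChunkRequest& other) const {
        return std::tie(nss, fromShardId, toShardId, rangeMin, rangeMax, waitForDelete) ==
            std::tie(other.nss, other.fromShardId, other.toShardId,
                     other.rangeMin, other.rangeMax, other.waitForDelete);
    }
};

// Hypothetical outcome of a later request trying to join the active one.
enum class JoinResult { kJoined, kConflictingOperationInProgress };

// A later request may join only if it equals the active request in every field.
inline JoinResult tryJoin(const SimpleMoveChunkRequest& active,
                          const SimpleMoveChunkRequest& incoming) {
    return active == incoming ? JoinResult::kJoined
                              : JoinResult::kConflictingOperationInProgress;
}

// Usage: requests differing only in waitForDelete no longer join silently.
inline void exampleUsage() {
    SimpleMoveChunkRequest active{"db.coll", "shard0", "shard1", "0", "100", false};
    SimpleMoveChunkRequest later = active;
    later.waitForDelete = true;
    assert(tryJoin(active, active) == JoinResult::kJoined);
    assert(tryJoin(active, later) == JoinResult::kConflictingOperationInProgress);
}
```

With this comparison, a mismatch in waitForDelete is treated like a mismatch in any other field, so the caller sees an explicit conflict instead of the wrong waiting behavior.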

A longer fix is to refactor the range-deletion scheduling and the waitForDelete behavior so that each moveChunk request handles them according to its own waitForDelete option. This would allow a later moveChunk to join the active one for the actual migration, but wait for the delete independently.
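One possible shape for the longer fix is sketched below. It is only an assumption about how such a refactor could look, not code from the server: ActiveMigrationSketch and joinActiveMigration are hypothetical names, and the two callbacks stand in for whatever mechanism would let a request block on the migration and on the range deletion.

```cpp
#include <cassert>
#include <functional>

// Hypothetical handle to the migration already running on the shard.
struct ActiveMigrationSketch {
    std::function<void()> waitForMigration;      // blocks until the chunk has moved
    std::function<void()> waitForRangeDeletion;  // blocks until the old range is deleted
};

// Each request joins the shared migration, then applies its OWN waitForDelete.
// Returns true if this request waited for the range deletion.
inline bool joinActiveMigration(const ActiveMigrationSketch& active, bool waitForDelete) {
    assert(active.waitForMigration && active.waitForRangeDeletion);
    active.waitForMigration();
    if (!waitForDelete) {
        return false;  // this request does not care about the deletion
    }
    active.waitForRangeDeletion();
    return true;
}
```

In this arrangement the equality check would not need to consider waitForDelete at all, because the option only affects what each individual caller waits for after the shared migration completes.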



 Comments   
Comment by Githook User [ 26/May/20 ]

Author: Esha Maharishi <esha.maharishi@mongodb.com> (EshaMaharishi)

Message: SERVER-43441 Remove TODO listed in SERVER-30183
Branch: master
https://github.com/mongodb/mongo/commit/b4094a6541bf5745cb225639c2486fcf390c4c38

Comment by Githook User [ 15/Aug/17 ]

Author: Esha Maharishi <esha.maharishi@mongodb.com> (EshaMaharishi)

Message: SERVER-30183 ensure a moveChunk that joins the active moveChunk has the same waitForDelete option

(cherry picked from commit 0578fe2bf94f66e6f5ddb9954c442b77a10bc202)
Branch: v3.4
https://github.com/mongodb/mongo/commit/7cbeedf66f4bde5b25e14665cc3537ddad28a122

Comment by Githook User [ 15/Aug/17 ]

Author: Esha Maharishi <esha.maharishi@mongodb.com> (EshaMaharishi)

Message: SERVER-30183 ensure a moveChunk that joins the active moveChunk has the same waitForDelete option
Branch: master
https://github.com/mongodb/mongo/commit/ac278086e705b289a784e4f40fe7b851b69a7b57
