[SERVER-60774] Resharding may apply through reshardFinalOp without transitioning to strict consistency, stalling write operations on collection being resharded until critical section times out
Created: 18/Oct/21  Updated: 29/Oct/23  Resolved: 21/Oct/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.1.0-rc0
Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc2

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Max Hirschhorn
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-57686 We need test coverage that runs resha... Closed
Problem/Incident
is caused by SERVER-49897 Insert no-op entries into oplog buffe... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.1, v5.0
Sprint: Sharding 2021-11-01
Participants:
Story Points: 1

 Description   

Typically the ReshardingOplogFetcher stops fetching new donor oplog entries after it has retrieved the reshardFinalOp entry. However, upon resuming from a primary failover, the ReshardingOplogFetcher doesn't realize it has already fetched the reshardFinalOp entry, so it continues to retrieve batches, all of which are empty. (The donor shard won't write any new oplog entries destined for the recipient shard after the reshardFinalOp.) This leads the ReshardingOplogFetcher to insert no-op reshardProgressMark entries after the reshardFinalOp into the recipient shard's local oplog buffer collection.

The ReshardingDonorOplogIterator assumes the reshardFinalOp will be the last entry in a batch. The presence of these extra no-op reshardProgressMark entries after the reshardFinalOp prevents it and the ReshardingOplogApplier from realizing the reshardFinalOp has been applied through. The recipient shard therefore never reaches the "strict-consistency" state. This leads the overall resharding operation to fail with a ReshardingCriticalSectionTimeout error response after writes have been blocked for the collection being resharded for reshardingCriticalSectionTimeoutMillis (5 seconds by default).

if (!batch.empty()) {
    const auto& lastEntryInBatch = batch.back();
    _resumeToken = getId(lastEntryInBatch);
 
    if (isFinalOplog(lastEntryInBatch)) {
        _hasSeenFinalOplogEntry = true;
        // Skip returning the final oplog entry because it is known to be a no-op.
        batch.pop_back();
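
To make the failure mode concrete, here is a minimal, self-contained sketch. The EntryKind enum, the BufferedEntry struct, and the helper functions are hypothetical stand-ins and not the server's actual types; only the "inspect the last entry of the batch" check mirrors the excerpt above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of documents in the recipient's oplog buffer.
enum class EntryKind { kCrudOp, kProgressMark, kFinalOp };

struct BufferedEntry {
    int id;  // stand-in for the real resume token
    EntryKind kind;
};

// Mirrors the assumption in the excerpt: only the last entry of a batch is
// inspected when looking for the final op.
bool lastEntryIsFinalOp(const std::vector<BufferedEntry>& batch) {
    return !batch.empty() && batch.back().kind == EntryKind::kFinalOp;
}

// Buffer contents when the fetcher stops right after the final op.
std::vector<BufferedEntry> normalBatch() {
    return {{1, EntryKind::kCrudOp}, {2, EntryKind::kFinalOp}};
}

// Buffer contents after a failover: the resumed fetcher saw only empty donor
// batches and appended a no-op progress mark behind the final op.
std::vector<BufferedEntry> batchAfterFailover() {
    auto batch = normalBatch();
    assert(lastEntryIsFinalOp(batch));  // before the failover, the final op is last
    batch.push_back({3, EntryKind::kProgressMark});
    return batch;
}
```

With normalBatch() the check finds the final op, whereas with batchAfterFailover() it does not, which in this simplified model corresponds to the recipient never reaching the "strict-consistency" state.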



 Comments   
Comment by Githook User [ 21/Oct/21 ]

Author: Max Hirschhorn <max.hirschhorn@mongodb.com> (visemet)

Message: SERVER-60774 Early exit in ReshardingOplogFetcher if final op fetched.

Changes the ReshardingOplogFetcher to return without doing any work when
the reshardFinalOp entry had already been fetched prior to resuming.

Also changes ReshardingDonorOplogIterator to throw if it ever sees an
entry in the local resharding oplog buffer collection after the
reshardFinalOp entry.

(cherry picked from commit 13093cdb3f878f20e8ebda8ac78f329d1b33a52f)
Branch: v5.1
https://github.com/mongodb/mongo/commit/755a758c98e657e590d862057479baeb4f5a1e1c

Comment by Githook User [ 21/Oct/21 ]

Author: Max Hirschhorn <max.hirschhorn@mongodb.com> (visemet)

Message: SERVER-60774 Early exit in ReshardingOplogFetcher if final op fetched.

Changes the ReshardingOplogFetcher to return without doing any work when
the reshardFinalOp entry had already been fetched prior to resuming.

Also changes ReshardingDonorOplogIterator to throw if it ever sees an
entry in the local resharding oplog buffer collection after the
reshardFinalOp entry.

(cherry picked from commit 13093cdb3f878f20e8ebda8ac78f329d1b33a52f)
Branch: v5.0
https://github.com/mongodb/mongo/commit/f8558ce2e556405fed0a22528fe67de6dfcd09dd

Comment by Githook User [ 20/Oct/21 ]

Author: Max Hirschhorn <max.hirschhorn@mongodb.com> (visemet)

Message: SERVER-60774 Early exit in ReshardingOplogFetcher if final op fetched.

Changes the ReshardingOplogFetcher to return without doing any work when
the reshardFinalOp entry had already been fetched prior to resuming.

Also changes ReshardingDonorOplogIterator to throw if it ever sees an
entry in the local resharding oplog buffer collection after the
reshardFinalOp entry.
Branch: master
https://github.com/mongodb/mongo/commit/13093cdb3f878f20e8ebda8ac78f329d1b33a52f
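
The second half of this commit message (throwing when an entry follows the reshardFinalOp) can be sketched roughly as below. This is an illustration under assumptions, not the committed code: EntryKind, BufferedEntry, BatchResult, and splitAtFinalOp are hypothetical names, and std::logic_error stands in for whatever error the server actually raises.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified model of documents in the recipient's oplog buffer.
enum class EntryKind { kCrudOp, kProgressMark, kFinalOp };

struct BufferedEntry {
    int id;  // stand-in for the real resume token
    EntryKind kind;
};

struct BatchResult {
    std::vector<BufferedEntry> toApply;  // entries preceding the final op
    bool sawFinalOp;
};

// Walks the whole batch. Anything found after the final op is treated as an
// invariant violation, because the fetcher is expected to stop inserting
// once the final op has been buffered.
BatchResult splitAtFinalOp(const std::vector<BufferedEntry>& batch) {
    BatchResult result{{}, false};
    for (const auto& entry : batch) {
        if (result.sawFinalOp) {
            throw std::logic_error("entry found after reshardFinalOp in oplog buffer");
        }
        if (entry.kind == EntryKind::kFinalOp) {
            result.sawFinalOp = true;  // the final op itself is a no-op, so it is skipped
        } else {
            result.toApply.push_back(entry);
        }
    }
    return result;
}
```

Failing loudly here turns a silent stall into an immediate, diagnosable error if the fetcher ever regresses.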

Comment by Max Hirschhorn [ 20/Oct/21 ]

Reopened because my earlier changes don't handle the case where the ReshardingOplogFetcher has already inserted the reshardFinalOp: upon resuming, it won't ever know to exit. I'll type up changes so that getOplogFetcherResumeId() detects whether the reshardFinalOp has already been inserted, letting the ReshardingOplogFetcher know there's no work left for it to do.
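
A rough sketch of that idea follows. The ticket does not show the real signature or return type of getOplogFetcherResumeId(), so ResumeInfo, computeResumeInfo, and shouldFetchMore are hypothetical names used purely for illustration, and the buffer is modeled as an in-memory vector rather than a collection.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified model of documents in the recipient's oplog buffer.
enum class EntryKind { kCrudOp, kProgressMark, kFinalOp };

struct BufferedEntry {
    int id;  // stand-in for the real resume token
    EntryKind kind;
};

// Hypothetical result of a resume-id lookup: where to resume from, plus
// whether the final op is already present in the buffer.
struct ResumeInfo {
    std::optional<int> resumeId;  // id of the last buffered entry, if any
    bool finalOpAlreadyBuffered;
};

ResumeInfo computeResumeInfo(const std::vector<BufferedEntry>& buffer) {
    ResumeInfo info{std::nullopt, false};
    for (const auto& entry : buffer) {
        info.resumeId = entry.id;
        if (entry.kind == EntryKind::kFinalOp) {
            info.finalOpAlreadyBuffered = true;
        }
    }
    return info;
}

// A resumed fetcher would consult this before doing any work and return
// immediately when it yields false.
bool shouldFetchMore(const std::vector<BufferedEntry>& buffer) {
    return !computeResumeInfo(buffer).finalOpAlreadyBuffered;
}
```

Deriving the decision from the buffer's contents, rather than from in-memory state, is what lets it survive a primary failover.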

Comment by Githook User [ 19/Oct/21 ]

Author: Max Hirschhorn <max.hirschhorn@mongodb.com> (visemet)

Message: SERVER-60774 Check every op in ReshardingDonorOplogIterator for final.

Changes the ReshardingDonorOplogIterator to not assume the
reshardFinalOp entry will always be the last document in resharding's
oplog buffer collection. The ReshardingOplogFetcher may end up inserting
no-op reshardProgressMark entries after the reshardFinalOp entry upon
resuming from a primary failover.

(cherry picked from commit d3b36587bc2222fdefa1c755bb253f78c894496c)
Branch: v5.1
https://github.com/mongodb/mongo/commit/b9bd6cbda4b93de67cc20fc2df170e815a59ba22

Comment by Githook User [ 19/Oct/21 ]

Author: Max Hirschhorn <max.hirschhorn@mongodb.com> (visemet)

Message: SERVER-60774 Check every op in ReshardingDonorOplogIterator for final.

Changes the ReshardingDonorOplogIterator to not assume the
reshardFinalOp entry will always be the last document in resharding's
oplog buffer collection. The ReshardingOplogFetcher may end up inserting
no-op reshardProgressMark entries after the reshardFinalOp entry upon
resuming from a primary failover.

(cherry picked from commit d3b36587bc2222fdefa1c755bb253f78c894496c)
Branch: v5.0
https://github.com/mongodb/mongo/commit/00fa692f6a891cd2c370dcd59aa7423327023fd9

Comment by Githook User [ 18/Oct/21 ]

Author: Max Hirschhorn <max.hirschhorn@mongodb.com> (visemet)

Message: SERVER-60774 Check every op in ReshardingDonorOplogIterator for final.

Changes the ReshardingDonorOplogIterator to not assume the
reshardFinalOp entry will always be the last document in resharding's
oplog buffer collection. The ReshardingOplogFetcher may end up inserting
no-op reshardProgressMark entries after the reshardFinalOp entry upon
resuming from a primary failover.
Branch: master
https://github.com/mongodb/mongo/commit/d3b36587bc2222fdefa1c755bb253f78c894496c
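
One possible shape for "check every op for final" is sketched below. It is not the committed change: EntryKind, BufferedEntry, and consumeThroughFinalOp are hypothetical names, and dropping the entries that trail the final op is an assumption made for this illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified model of documents in the recipient's oplog buffer.
enum class EntryKind { kCrudOp, kProgressMark, kFinalOp };

struct BufferedEntry {
    int id;  // stand-in for the real resume token
    EntryKind kind;
};

// Scans the entire batch instead of only its last element. If the final op
// is found, it and everything after it are removed from the batch (assumed
// here to be no-ops), and true is returned so the caller can record that the
// final op has been seen.
bool consumeThroughFinalOp(std::vector<BufferedEntry>& batch) {
    auto finalIt = std::find_if(batch.begin(), batch.end(), [](const BufferedEntry& entry) {
        return entry.kind == EntryKind::kFinalOp;
    });
    if (finalIt == batch.end()) {
        return false;
    }
    batch.erase(finalIt, batch.end());
    return true;
}
```

Unlike a last-entry check, this still detects the final op when no-op reshardProgressMark entries were appended after it.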
