[SERVER-33262] Avoid node shutdown on rollback of ShardIdentity document if possible Created: 12/Feb/18  Updated: 27/Oct/23  Resolved: 01/Jun/18

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: Esha Maharishi (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Sharding 2018-05-07, Sharding 2018-06-04
Participants:

 Description   

If possible, it would be an improvement to avoid crashing the entire server when the shard identity document is rolled back, provided the appropriate in-memory state can be refreshed or invalidated instead. This may require a bit of investigation/discussion.



 Comments   
Comment by Esha Maharishi (Inactive) [ 01/Jun/18 ]

I am going to close this as Works as Designed. The "rollback to timestamp" team already implemented the fassert-on-shardIdentity-rollback behavior, and there isn't really a clear winner between fasserting immediately (temporal locality makes it easier to understand why the node fasserted) and fasserting later only if the shard is re-added with a different shardName (which avoids fasserting if that never occurs).

Comment by Andy Schwerin [ 13/Mar/18 ]

OK.

Comment by Esha Maharishi (Inactive) [ 12/Mar/18 ]

schwerin, see 1) and 2) in my comment above: the shard will fassert at that point (only if the parameters are different), which I think is fine, as opposed to fasserting immediately upon rolling back the first shardIdentity doc.

Comment by Andy Schwerin [ 12/Mar/18 ]

If we don't crash when rolling back the shard identity document, and the shard is later re-added with different parameters (e.g., different shard name), will everything be cool?

Comment by Esha Maharishi (Inactive) [ 09/Mar/18 ]

True, this node will continue to do things like periodically refresh its ShardRegistry from the config server, even though it is not considered a shard of the cluster. This is an issue we also have after removeShard, though.

Comment by Kaloian Manassiev [ 09/Mar/18 ]

To add to that, I think the ultimate potential support problem here is that if addShard is not retried, this replica set remains in a half-initialized sharding mode (some nodes have their in-memory sharding state initialized and some don't). Since we have a way of catching that, I also think it is fine to remove this fassert.

Comment by Esha Maharishi (Inactive) [ 09/Mar/18 ]

This is the relevant code that crashes the server in rs_rollback.cpp:

    // If we detected that we rolled back the shardIdentity document as part of this rollback
    // then we must shut down to clear the in-memory ShardingState associated with the
    // shardIdentity document.
    if (ShardIdentityRollbackNotifier::get(opCtx)->didRollbackHappen()) {
        severe() << "shardIdentity document rollback detected.  Shutting down to clear "
                    "in-memory sharding state.  Restarting this process should safely return it "
                    "to a healthy state";
        fassertFailedNoTrace(40498);
    }

And SERVER-23802 is the initial ticket that introduced the ShardIdentityRollbackNotifier.

I believe we can just remove this rollback notifier class and not crash the server (the crash exists only to clear the in-memory sharding state) when the shardIdentity is rolled back:

The shardIdentity document is inserted onto the new shard by the config server with majority writeConcern. The config server checks the writeConcern result, and if it was not satisfied, fails the addShard.

So the shard will not be officially added if the shardIdentity doc rolls back. If the client retries addShard, the config server will attempt to insert the shardIdentity again. At that point:

1) If the same node became primary and the shardIdentity is re-inserted on it, but the shardName, configConnectionString, or clusterId are different, the node will fassert.
2) If a different node became primary and the shardIdentity is re-inserted on it, the old primary will fassert for the same reason on replicating the new shardIdentity doc.

Note that the initializeFromShardIdentity() function is called in an OpObserver on inserts to "admin.system.version" with _id: "shardIdentity".
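To make the deferred-fassert behavior in 1) and 2) concrete, here is a minimal, self-contained sketch (not the actual MongoDB source; all names are illustrative): when the observer sees a shardIdentity insert, a node with no in-memory sharding state accepts it, while a node that already initialized compares the parameters and only aborts on a mismatch.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Illustrative stand-in for the fields of the shardIdentity document.
struct ShardIdentity {
    std::string shardName;
    std::string configConnectionString;
    std::string clusterId;

    bool operator==(const ShardIdentity& o) const {
        return shardName == o.shardName &&
               configConnectionString == o.configConnectionString &&
               clusterId == o.clusterId;
    }
};

// Hypothetical model of the OpObserver hook on inserts to
// "admin.system.version" with _id: "shardIdentity". Returns true if the
// node can proceed; returning false models the fassert path taken when a
// re-inserted shardIdentity carries different parameters.
bool onShardIdentityInsert(std::optional<ShardIdentity>& inMemoryState,
                           const ShardIdentity& inserted) {
    if (!inMemoryState) {
        // First initialization: cache the document's parameters.
        inMemoryState = inserted;
        return true;
    }
    // Already initialized: only a parameter mismatch is fatal.
    return *inMemoryState == inserted;
}
```

Under this model, a node that rolled back its shardIdentity but later replicates an identical one continues normally, matching the "fassert later, and only if the parameters differ" behavior discussed above.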

spencer, schwerin, does it sound fine to you to remove the "crash on rolling back shardIdentity" behavior?

Comment by William Schultz (Inactive) [ 12/Feb/18 ]

kaloian.manassiev

Generated at Thu Feb 08 04:32:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.