[SERVER-62178] Resharding can fail with NamespaceNotSharded if recipient primary fails over before creating temporary resharding collection Created: 17/Dec/21  Updated: 29/Oct/23  Resolved: 13/Jan/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.0, 5.1.0, 5.2.0-rc1
Fix Version/s: 5.3.0, 5.0.6, 5.2.1

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Matt Boros
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-59023 Resharding can fail with NamespaceNot... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.2, v5.1, v5.0
Sprint: Sharding 2022-01-10, Sharding 2022-01-24
Participants:
Linked BF Score: 30
Story Points: 3

 Description   

ShardServerOpObserver::onCreateCollection() sets the filtering metadata for the collection as unsharded if the collection isn't already known to be sharded. Changing this behavior would require significant work from Sharding EMEA, although it would probably be better to leave the filtering metadata as unknown.

The following scenario can therefore lead the recipient shard primary to wrongly believe the temporary resharding collection is unsharded and to throw a NamespaceNotSharded exception from assertCanExtractShardKeyFromDocs().

  1. Recipient shard primary in term=1 is told by coordinator to refresh on the temporary resharding collection. At this point the recipient shard primary knows the temporary resharding collection is sharded.
  2. Recipient shard primary in term=1 constructs a RecipientStateMachine and waits for all donor shards to be prepared to donate.
  3. Recipient shard primary in term=1 is told by coordinator all donor shards are prepared to donate.
  4. Recipient shard primary in term=1 transitions from RecipientStateEnum::kAwaitingFetchTimestamp to RecipientStateEnum::kCreatingCollection.
  5. An election occurs and there's a new recipient shard primary in term=2.
  6. Recipient shard primary in term=2 clears the filtering metadata for the temporary resharding collection on step-up.
  7. Recipient shard primary in term=2 creates the temporary resharding collection. ShardServerOpObserver::onCreateCollection() causes the recipient shard primary to set the filtering metadata for the temporary resharding collection as unsharded.
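
The seven steps above can be replayed with a toy model. This is only a sketch of the described state changes, not server code: the `RecipientPrimary` class, the `Filtering` enum, the method names, and the example namespace are all hypothetical stand-ins, and the only behavior taken from the ticket is that step-up clears the filtering metadata and that collection creation marks a not-known-sharded collection as unsharded.

```python
from enum import Enum


class Filtering(Enum):
    UNKNOWN = "unknown"      # cleared; nothing is known about the collection
    UNSHARDED = "unsharded"
    SHARDED = "sharded"


class NamespaceNotSharded(Exception):
    pass


class RecipientPrimary:
    """Toy stand-in for a recipient shard primary (hypothetical, not a server class)."""

    def __init__(self, term):
        self.term = term
        self.filtering = {}  # namespace -> Filtering; a missing key means UNKNOWN

    def refresh(self, ns):
        # Step 1: the coordinator-driven refresh learns the collection is sharded.
        self.filtering[ns] = Filtering.SHARDED

    def step_up(self, ns):
        # Step 6: filtering metadata is cleared on step-up.
        self.filtering.pop(ns, None)

    def create_collection(self, ns):
        # Step 7: mimics the described onCreateCollection() behavior, which marks
        # the collection unsharded unless it is already known to be sharded.
        if self.filtering.get(ns, Filtering.UNKNOWN) is not Filtering.SHARDED:
            self.filtering[ns] = Filtering.UNSHARDED

    def assert_can_extract_shard_key(self, ns):
        # Stand-in for the check that throws in the ticket.
        if self.filtering.get(ns, Filtering.UNKNOWN) is Filtering.UNSHARDED:
            raise NamespaceNotSharded(ns)


def run_scenario():
    ns = "db.system.resharding.example"  # hypothetical namespace
    old_primary = RecipientPrimary(term=1)
    old_primary.refresh(ns)               # steps 1-4 happen in term=1
    new_primary = RecipientPrimary(term=2)  # step 5: election
    new_primary.step_up(ns)               # step 6
    new_primary.create_collection(ns)     # step 7
    try:
        new_primary.assert_can_extract_shard_key(ns)
    except NamespaceNotSharded:
        return "NamespaceNotSharded"
    return "ok"


if __name__ == "__main__":
    print(run_scenario())
```

The key point the model captures is that the refresh in step 1 happened on a different node, so nothing on the term=2 primary prevents the creation path from recording the collection as unsharded.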

This happens because the recipient shard primary in term=2 resumes without any guarantee that it has itself refreshed from the config server and set the filtering metadata for the temporary resharding collection as sharded. One solution is to have the recipient shard primary clear the filtering metadata for the temporary resharding collection after creating it so that resharding's collection cloning and resharding's oplog application can simply refresh to set it as sharded.
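
A minimal sketch of the proposed solution, under stated assumptions: the function names and the `clear_after_create` flag are hypothetical, and the model assumes that a reader which finds the filtering metadata unknown refreshes from the config server, where the temporary resharding collection is recorded as sharded. It is not the actual patch.

```python
UNKNOWN, UNSHARDED, SHARDED = "unknown", "unsharded", "sharded"


def create_temp_collection(state, clear_after_create):
    """Return the filtering state on the recipient primary after creation."""
    # Described onCreateCollection() behavior: anything not already known
    # to be sharded is recorded as unsharded.
    if state != SHARDED:
        state = UNSHARDED
    # Proposed fix (assumed shape): drop whatever was just recorded so the
    # next reader has to refresh.
    if clear_after_create:
        state = UNKNOWN
    return state


def state_seen_by_cloner(state, config_server_state=SHARDED):
    """Assumption: an unknown state triggers a refresh from the config server."""
    return config_server_state if state == UNKNOWN else state


if __name__ == "__main__":
    after_step_up = UNKNOWN  # metadata was cleared on step-up
    print(state_seen_by_cloner(create_temp_collection(after_step_up, False)))
    print(state_seen_by_cloner(create_temp_collection(after_step_up, True)))
```

Without the clear, the stale unsharded state is what cloning and oplog application see; with it, they fall through to the refresh and observe the collection as sharded.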



 Comments   
Comment by Githook User [ 20/Jan/22 ]

Author: Matt Boros <matt.boros@mongodb.com>

Message: SERVER-62178 Clear filtering metadata on recipient shard primary for temporary resharding collection after creation

(cherry picked from commit e9d49578b3e11044e0e41bdcbb72f41cd17e571c)
Branch: v5.2
https://github.com/mongodb/mongo/commit/acef8a36dfb37a1ed2d3db9760724557d2caa794

Comment by Githook User [ 18/Jan/22 ]

Author: Matt Boros <matt.boros@mongodb.com>

Message: SERVER-62178 Clear filtering metadata on recipient shard primary for temporary resharding collection after creation

(cherry picked from commit e9d49578b3e11044e0e41bdcbb72f41cd17e571c)
Branch: v5.0
https://github.com/mongodb/mongo/commit/a495a8881f30e97775668e5e49b404cf0623fa54

Comment by Githook User [ 13/Jan/22 ]

Author: Matt Boros <matt.boros@mongodb.com>

Message: SERVER-62178 Clear filtering metadata on recipient shard primary for temporary resharding collection after creation
Branch: master
https://github.com/mongodb/mongo/commit/e9d49578b3e11044e0e41bdcbb72f41cd17e571c

Generated at Thu Feb 08 05:54:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.