[SERVER-57280] ShardRegistry must be initialized before DDL coordinators contact any shard Created: 28/May/21  Updated: 29/Oct/23  Resolved: 08/Jun/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.0-rc1, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Pierlauro Sciarelli Assignee: Pierlauro Sciarelli
Resolution: Fixed Votes: 0
Labels: PM-1965-Milestone-1, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
causes SERVER-66658 Shard registry might be accessed befo... Closed
Related
related to SERVER-60916 CPS Restores failed with a snapshot w... Closed
related to SERVER-61003 ReadConcernMajorityNotAvailableYet er... Closed
is related to SERVER-50206 Remove "NoReload" ShardRegistry looku... Blocked
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0
Sprint: Sharding EMEA 2021-06-14
Participants:
Linked BF Score: 23

 Description   

Requests sent by DDL coordinators to any shard (even the coordinator's own shard) through sendAuthenticatedCommandToShards end up in the ARS (AsyncRequestsSender), where ShardRegistry::getShardNoReload is called. That lookup gives no guarantee of retrieving up-to-date information from the registry.

The objective of this ticket is to review all usages of sendAuthenticatedCommandToShards in DDL coordinators and ensure that the shard registry is always initialized before any call.

Broadcasts are not affected: before a request reaches the ARS, there is always a call to getAllShardIds, which internally triggers a reload if needed.

The problem is definitely present in dropCollection and dropDatabase, because the coordinator tries to contact the primary shard without any guarantee that the ShardRegistry is initialized.

Some possible solutions:

  • Move the getAllShardIds calls before contacting the primary shard.
  • Reload the shard registry on DDL coordinator construction (maybe just when resuming a DDL from disk?).
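The first option can be illustrated with a minimal, self-contained sketch. It does not use the real server code: ToyShardRegistry, contactPrimaryShard, and the initializeFirst flag are hypothetical names invented for illustration, and only the two behaviours described above are modelled (a no-reload lookup that reads the cache as-is, and a getAllShardIds call that loads the cache if it was never initialized).

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the shard registry. This is NOT the real ShardRegistry API;
// it only models "lookup without reload" versus "listing that reloads if needed".
class ToyShardRegistry {
public:
    explicit ToyShardRegistry(std::map<std::string, std::string> configData)
        : _configData(std::move(configData)) {}

    // Reads the cache as-is and never refreshes it.
    std::optional<std::string> getShardNoReload(const std::string& shardId) const {
        auto it = _cache.find(shardId);
        if (it == _cache.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Lists all shard ids, loading the cache first if it was never initialized.
    std::vector<std::string> getAllShardIds() {
        if (!_initialized) {
            reload();
        }
        std::vector<std::string> ids;
        for (const auto& entry : _cache) {
            ids.push_back(entry.first);
        }
        return ids;
    }

    bool isInitialized() const {
        return _initialized;
    }

private:
    void reload() {
        _cache = _configData;
        _initialized = true;
    }

    std::map<std::string, std::string> _configData;  // hypothetical source of truth
    std::map<std::string, std::string> _cache;       // what no-reload lookups see
    bool _initialized = false;
};

// Hypothetical coordinator step: returns true if the primary shard can be resolved.
// With initializeFirst set, getAllShardIds runs before the no-reload lookup.
bool contactPrimaryShard(ToyShardRegistry& registry,
                         const std::string& primaryShardId,
                         bool initializeFirst) {
    if (initializeFirst) {
        registry.getAllShardIds();
    }
    return registry.getShardNoReload(primaryShardId).has_value();
}
```

In this toy model, a freshly constructed registry fails to resolve the primary shard unless initializeFirst is set, which mirrors the ordering issue described for dropCollection and dropDatabase.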


 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 08/Jun/21 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-57280 ShardRegistry must be initialized before DDL coordinators contact any shard
Branch: v5.0
https://github.com/mongodb/mongo/commit/28e13ad4ee6ad9175537bb1f5ffde02e381a0bc7

Comment by Githook User [ 08/Jun/21 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-57280 ShardRegistry must be initialized before DDL coordinators contact any shard
Branch: master
https://github.com/mongodb/mongo/commit/5bea2fccdf860e9119712d39b7d2da788fe63708

Comment by Tommaso Tocci [ 31/May/21 ]

Another possible solution is to make the ARS use the causally consistent version of getShard instead of the deprecated getShardNoReload.
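A rough sketch of this alternative, under simplifying assumptions: CachedShardLookup and resolveTarget are hypothetical names, and the toy getShard simply reloads on a cache miss. The real causal-consistency mechanism of the registry is not modelled here.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Toy cache, NOT the real ShardRegistry. getShard here is an assumption:
// it falls back to a reload when the no-reload lookup misses.
class CachedShardLookup {
public:
    explicit CachedShardLookup(std::map<std::string, std::string> configData)
        : _configData(std::move(configData)) {}

    // Reads the cache as-is and never refreshes it.
    std::optional<std::string> getShardNoReload(const std::string& shardId) const {
        auto it = _cache.find(shardId);
        if (it == _cache.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Tries the cache first, reloads once on a miss, then looks up again.
    std::optional<std::string> getShard(const std::string& shardId) {
        if (auto hit = getShardNoReload(shardId)) {
            return hit;
        }
        _cache = _configData;
        ++_reloadCount;
        return getShardNoReload(shardId);
    }

    int reloadCount() const {
        return _reloadCount;
    }

private:
    std::map<std::string, std::string> _configData;  // hypothetical source of truth
    std::map<std::string, std::string> _cache;       // what no-reload lookups see
    int _reloadCount = 0;
};

// Hypothetical sender-side target resolution. The flag selects which lookup
// the sender relies on, so the two behaviours can be compared side by side.
std::optional<std::string> resolveTarget(CachedShardLookup& lookup,
                                         const std::string& shardId,
                                         bool useReloadingLookup) {
    if (useReloadingLookup) {
        return lookup.getShard(shardId);
    }
    return lookup.getShardNoReload(shardId);
}
```

With this approach callers would not need to remember to initialize the registry themselves, at the cost of a possible reload inside the sender's lookup path.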

Generated at Thu Feb 08 05:41:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.