[SERVER-74841] collMod should not call catalogClient::getCollection during secondary replication Created: 14/Mar/23 Updated: 29/Oct/23 Resolved: 10/Apr/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | Allison Easton |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Sharding EMEA |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding EMEA 2023-04-03, Sharding EMEA 2023-04-17 |
| Participants: | |
| Linked BF Score: | 135 |
| Description |
|
This is problematic in a config catalog environment because CatalogClient::getCollection performs a read against the config server with read concern majority and afterClusterTime. Since the node is itself the config server, the read can get stuck waiting for that clusterTime: the node's opTime will not advance while it is still processing the collMod op, which creates a cyclic dependency. It is also not ideal that this call is made while the collection MODE_X lock is held. |
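The cyclic dependency described above can be illustrated with a toy model. This is only a sketch and not MongoDB code: `ToyNode`, `apply_op`, and `read_after_cluster_time` are hypothetical names, and a timeout stands in for the stuck wait so that the example terminates.

```python
import threading


class ToyNode:
    """Toy stand-in for a replica set member (hypothetical, not MongoDB code)."""

    def __init__(self):
        self.applied_optime = 0
        self._cond = threading.Condition()

    def read_after_cluster_time(self, cluster_time, timeout):
        """Wait until applied_optime >= cluster_time.

        Returns True if the read could be served, False if the wait timed out.
        """
        with self._cond:
            return self._cond.wait_for(
                lambda: self.applied_optime >= cluster_time, timeout=timeout
            )

    def apply_op(self, optime, during_apply=None):
        """Apply an oplog entry.

        applied_optime only advances after the work done during application
        (the optional during_apply callback) has finished.
        """
        result = during_apply() if during_apply is not None else None
        with self._cond:
            self.applied_optime = optime
            self._cond.notify_all()
        return result


node = ToyNode()

# While applying the op at optime 5, the node reads from itself and waits for
# cluster time 5. Its own optime cannot reach 5 until the op finishes, so the
# wait can only end by timing out.
read_served_during_apply = node.apply_op(
    5, during_apply=lambda: node.read_after_cluster_time(5, timeout=0.2)
)

# Once the op has been applied, the same read is served immediately.
read_served_after_apply = node.read_after_cluster_time(5, timeout=0.2)
```

In this model the read issued during application is never served, while the same read issued after the op completes succeeds, which mirrors why the call needs to move out of the secondary replication path.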
| Comments |
| Comment by Githook User [ 10/Apr/23 ] |
|
Author: Allison Easton (email: allison.easton@mongodb.com, username: allisoneaston) Message: |
| Comment by Jack Mulrow [ 22/Mar/23 ] |
|
Sounds good to me. I just wanted to note the option, but yeah if this doesn't only run on config servers then changing to local isn't even possible and wouldn't address those other issues. |
| Comment by Randolph Tan [ 22/Mar/23 ] |
|
Hm... I'm not sure changing to local is a good idea since this code can run on any shard. I also think that checks should be performed only on the primary to minimize the chances of nodes observing different things and arriving at different conclusions. |
| Comment by Jack Mulrow [ 22/Mar/23 ] |
|
Just noting that by default, getting the catalogClient() from the Grid now gets a client with a ShardRemote for the config server, even on the config server (when the catalog shard feature flag is enabled). If this code always runs on the config server, we did add a way to still get a ShardLocal catalog client via ShardingCatalogManager::localCatalogClient(), so if the only problem here is that we're doing a network request, we can switch to using the ShardLocal catalog client instead. |
| Comment by Randolph Tan [ 14/Mar/23 ] |
|
The problematic calls: |