[SERVER-83564] Make sure the process field is indexed in config.locks
Created: 24/Nov/23  Updated: 07/Feb/24  Resolved: 11/Jan/24
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 6.0.11 |
| Fix Version/s: | 5.0.25, 4.4.29, 6.0.14 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dmitry Ryabtsev | Assignee: | Sulabh Mahajan |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | car-investigation, sharding-emea-pm-iteration-planning |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Catalog and Routing |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v5.0, v4.4 |
| Sprint: | CAR Team 2023-12-11, CAR Team 2023-12-25, CAR Team 2024-01-08, CAR Team 2024-01-22 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
It has been observed that, in a case with a large number of collections being created and dropped, certain update operations on the config.locks collection may never complete if the CSRS does not have enough capacity to execute a COLLSCAN update in under 30 seconds.
Unless there is work planned to ensure the locks collection does not get inflated, the solution is to make sure there is an index on { "process": 1 }. |
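As a concrete illustration of why the index helps, here is a minimal mongocxx sketch (not from the ticket: the connection string, the "ConfigServer" process value, and the exact shape of the slow update are assumptions for illustration). It shows a multi-document update filtered on "process", which would collection-scan without an index, and the { process: 1 } index that lets it use an index scan instead:

```cpp
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance inst{};
    // Assumed CSRS address; adjust for the actual config server replica set.
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27019"}};
    auto locks = client["config"]["locks"];

    // Without an index on "process", an update like this has to COLLSCAN
    // config.locks; on an inflated collection it can exceed 30 seconds.
    locks.update_many(
        make_document(kvp("process", "ConfigServer")),
        make_document(kvp("$set", make_document(kvp("state", 0)))));

    // The proposed mitigation: ensure a { process: 1 } index exists so the
    // same filter can be satisfied with an IXSCAN. createIndexes is a no-op
    // if an identical index is already present.
    locks.create_index(make_document(kvp("process", 1)));

    return 0;
}
```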
| Comments |
| Comment by Githook User [ 07/Feb/24 ] |
Author: Sulabh Mahajan <sulabh.mahajan@mongodb.com> (username: sulabhM)
Message: GitOrigin-RevId: 9201e694024b4c461b990c777a7da25d1468a683 |
| Comment by Githook User [ 30/Jan/24 ] |
Author: Sulabh Mahajan <sulabh.mahajan@mongodb.com> (username: sulabhM)
Message: GitOrigin-RevId: 9da07653331f02bec726dd9343ef39c2c52acb19 |
| Comment by Githook User [ 11/Jan/24 ] |
Author: Sulabh Mahajan <sulabh.mahajan@mongodb.com> (username: sulabhM)
Message: GitOrigin-RevId: 716488c24624812685c255533516d12f252725d2 |
| Comment by Sulabh Mahajan [ 21/Dec/23 ] |
Thinking out loud here. Reading the code suggests that whenever an election happens and a new primary is selected, we call ReplicationCoordinatorExternalState::onTransitionToPrimary(). Its implementation calls ReplicationCoordinatorExternalStateImpl::_shardingOnTransitionToPrimaryHook(), which checks whether this node is part of the config server and initializes the config database if needed, creating the relevant collections and indexes as required. The function responsible for index creation is ShardingCatalogManager::_initConfigIndexes(). I can change that function to ensure an index gets created on the process field.

createIndexOnConfigCollection() logs a message if the index is being created on a non-empty collection, since a config database's collections and indexes would normally be created before any writes to it. So, when upgrading, as a node in the config server replica set becomes the primary, there will be a one-time log message about the index creation. That seems okay to me. |
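A rough standalone approximation of that step-up behavior, as a hedged sketch only: it uses the MongoDB C++ driver in place of the server-internal ShardingCatalogManager helpers, the address and log wording are assumptions, and it is not the actual patch.

```cpp
#include <iostream>

#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

// Idempotent index creation, loosely analogous to what _initConfigIndexes()
// does on transition to primary: create { process: 1 } on config.locks and
// log once if the collection already has documents (the upgrade case).
void ensureLocksProcessIndex(mongocxx::client& client) {
    auto locks = client["config"]["locks"];

    if (locks.estimated_document_count() > 0) {
        std::cout << "Creating { process: 1 } index on a non-empty config.locks"
                  << std::endl;
    }

    // createIndexes is a no-op if an identical index already exists, so this
    // is safe to run on every step-up.
    locks.create_index(make_document(kvp("process", 1)));
}

int main() {
    mongocxx::instance inst{};
    // Assumed CSRS address; in the server this logic runs as part of step-up.
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27019"}};
    ensureLocksProcessIndex(client);
    return 0;
}
```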
| Comment by Haley Connelly [ 05/Dec/23 ] |
We agree that the patch should be to add an index on the "process" field. As pierlauro.sciarelli@mongodb.com mentioned, the issue is only pre-6.1: we should pursue the simplest patch to mitigate issues with updating "config.locks". Garbage collection would involve significant changes, so we should instead add an index to prevent the need for full collection scans.

Context: if, in the future, customers start seeing excessive growth in config.locks, where the collection data grows to an unreasonable size, we should revisit the idea of creating a garbage collection script. |
| Comment by Pierlauro Sciarelli [ 24/Nov/23 ] |
[note] This problem only affects pre-v6.1.0 versions since the distributed lock has gone away under |