[SERVER-67385] Range deletion tasks may be wrongly scheduled before ongoing queries on range finish on a shard primary Created: 20/Jun/22 Updated: 29/Oct/23 Resolved: 12/Aug/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.14, 6.0.2, 6.1.0-rc0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Pierlauro Sciarelli | Assignee: | Pierlauro Sciarelli |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v6.0, v5.0 |
| Sprint: | Sharding EMEA 2022-08-22 |
| Participants: | |
| Description |
|
The CollectionShardingRuntime for a sharded collection keeps a reference to a MetadataManager, which is responsible for tracking a list of open cursors in order to know how many queries are running with different filtering metadata for the collection at different points in time. On a shard primary node, this list is iterated when a range deletion task has to be scheduled, to determine whether the task must wait for running queries or can be "safely" scheduled after orphanCleanupDelaySecs because no queries are acting on the orphan range. However, when filtering metadata is cleared, the CollectionShardingRuntime loses track of previous metadata managers. This means that the range deleter may not honor the promise that all running queries on the shard primary have completed before it starts deleting documents from an orphaned range. It follows that the following claim from the documentation has always been incorrect: "Before deleting the chunk during chunk migration, MongoDB waits for orphanCleanupDelaySecs or for in-progress queries involving the chunk to complete on the shard primary, whichever is longer". This bug can be traced back to v5.0. |
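
The failure mode described above can be illustrated with a deliberately simplified toy model. This is only a sketch and not the server's C++ implementation: the class names are borrowed from the description, while every method and field name (`open_cursor`, `has_queries_on`, `clear_filtering_metadata`, `must_wait_for_queries`, `queries_in_flight`) is hypothetical. It assumes that clearing the filtering metadata replaces the tracker and discards the record of cursors that are still open.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataManager:
    """Toy stand-in: counts the queries still using each tracked key range."""

    # Hypothetical field: maps a (min, max) key range to its open-cursor count.
    queries_in_flight: dict = field(default_factory=dict)

    def open_cursor(self, key_range):
        self.queries_in_flight[key_range] = self.queries_in_flight.get(key_range, 0) + 1

    def has_queries_on(self, key_range):
        return self.queries_in_flight.get(key_range, 0) > 0


class CollectionShardingRuntime:
    """Toy stand-in that holds a reference to a single MetadataManager."""

    def __init__(self):
        self.metadata_manager = MetadataManager()

    def clear_filtering_metadata(self):
        # Assumed model of the bug: the previous manager is dropped, and with
        # it the knowledge of cursors that are still open on the old metadata.
        self.metadata_manager = MetadataManager()

    def must_wait_for_queries(self, key_range):
        # Consulted when a range deletion task is about to be scheduled.
        return self.metadata_manager.has_queries_on(key_range)


csr = CollectionShardingRuntime()
orphan_range = (0, 100)

csr.metadata_manager.open_cursor(orphan_range)  # a long-running query starts
before_clear = csr.must_wait_for_queries(orphan_range)

csr.clear_filtering_metadata()  # the query above is still running
after_clear = csr.must_wait_for_queries(orphan_range)

print(before_clear, after_clear)  # True False
```

In this model the second check reports that no queries are using the orphan range even though the cursor was never closed, so a deletion task would be scheduled after only the delay.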
| Comments |
| Comment by Githook User [ 03/Oct/22 ] |
|
Author: {'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'} Message: (cherry picked from commit 32c2f632eaa7bf80607880162ec5e4eaeb22d7fe) |
| Comment by Githook User [ 18/Aug/22 ] |
|
Author: {'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'} Message: (cherry picked from commit 32c2f632eaa7bf80607880162ec5e4eaeb22d7fe) |
| Comment by Garaudy Etienne [ 27/Jul/22 ] |
|
I thought I had filed it! lol. It's here now. marked the docs ticket as related. DOCSP-24024 |
| Comment by Garaudy Etienne [ 12/Jul/22 ] |
|
garaudy.etienne@mongodb.com to remove the "or for in-progress queries involving the chunk to complete on the shard primary, whichever is longer" part of the docs. |