[SERVER-67385] Range deletion tasks may be wrongly scheduled before ongoing queries on range finish on a shard primary Created: 20/Jun/22  Updated: 29/Oct/23  Resolved: 12/Aug/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.14, 6.0.2, 6.1.0-rc0

Type: Bug Priority: Critical - P2
Reporter: Pierlauro Sciarelli Assignee: Pierlauro Sciarelli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
  Backports
  Depends
    is depended on by SERVER-68660 Make range deleter service observer r... Closed
  Problem/Incident
    causes SERVER-69134 Dropping a sharded collection doesn't... Closed
  Related
    related to SERVER-67688 notifySecondariesThatDeletionIsOccurr... Closed
    is related to SERVER-68352 Only wait for `orphanCleanupDelaySecs... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v6.0, v5.0
Sprint: Sharding EMEA 2022-08-22
Participants:

 Description   

The CollectionShardingRuntime for a sharded collection keeps a reference to a MetadataManager, which tracks the list of open cursors in order to know how many queries are running against each version of the collection's filtering metadata at different points in time.

On a shard primary node, this list is iterated when a range deletion task has to be scheduled, in order to determine whether the task must wait for running queries or can be "safely" scheduled after orphanCleanupDelaySecs because no queries are acting on the orphan range.
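The scheduling check described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's C++ implementation: the names MetadataSnapshot, ToyMetadataManager, queries_overlapping and can_schedule_deletion are hypothetical, and ranges are modelled as half-open integer intervals.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataSnapshot:
    """Hypothetical stand-in for one version of the filtering metadata."""
    owned_ranges: list      # (min_key, max_key) half-open ranges owned at this version
    open_cursors: int = 0   # queries still running against this version


@dataclass
class ToyMetadataManager:
    """Hypothetical stand-in for the MetadataManager's list of tracked versions."""
    snapshots: list = field(default_factory=list)

    def queries_overlapping(self, lo, hi):
        """Count running queries whose metadata version still owns part of [lo, hi)."""
        total = 0
        for snap in self.snapshots:
            if snap.open_cursors == 0:
                continue
            if any(r_lo < hi and lo < r_hi for r_lo, r_hi in snap.owned_ranges):
                total += snap.open_cursors
        return total


def can_schedule_deletion(manager, lo, hi, secs_since_migration, orphan_cleanup_delay_secs):
    """Both conditions must hold: the delay has elapsed and no query uses the range."""
    delay_elapsed = secs_since_migration >= orphan_cleanup_delay_secs
    return delay_elapsed and manager.queries_overlapping(lo, hi) == 0
```

In this model the task is held back for as long as any tracked metadata version with open cursors still covers the orphan range, which corresponds to the "whichever is longer" behaviour quoted from the documentation.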

However, when the filtering metadata is cleared, the CollectionShardingRuntime loses track of previous metadata managers. This means that the range deleter may not honor the promise that all running queries on the shard primary have completed before it starts deleting documents from an orphaned range.
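The loss of tracking can be reproduced in a few lines of a toy model. This is a sketch of the failure mode only, assuming that clearing the metadata simply drops the reference to the current manager; ToyManager, ToyShardingRuntime and their methods are hypothetical names and do not mirror the server code or the committed fix.

```python
class ToyManager:
    """Hypothetical manager that only counts open cursors."""

    def __init__(self):
        self.open_cursors = 0


class ToyShardingRuntime:
    """Hypothetical runtime holding a reference to the current manager."""

    def __init__(self):
        self.manager = ToyManager()

    def start_query(self):
        # The query registers with whichever manager is current when it starts.
        mgr = self.manager
        mgr.open_cursors += 1
        return mgr

    def clear_filtering_metadata(self):
        # Assumption: clearing drops the only reference the runtime had.
        self.manager = None

    def refresh(self):
        # A later refresh installs a brand-new manager with no history.
        if self.manager is None:
            self.manager = ToyManager()

    def visible_running_queries(self):
        return 0 if self.manager is None else self.manager.open_cursors


runtime = ToyShardingRuntime()
query_manager = runtime.start_query()   # a long-running query begins
runtime.clear_filtering_metadata()      # metadata is cleared while it runs
runtime.refresh()                       # a fresh manager replaces the old one
```

After these steps the query is still counted on the old manager (`query_manager.open_cursors` is 1), but the runtime reports `visible_running_queries()` as 0, so a scheduler consulting only the current manager would see no reason to wait.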

It follows that the following claim from the documentation has always been incorrect: "Before deleting the chunk during chunk migration, MongoDB waits for orphanCleanupDelaySecs or for in-progress queries involving the chunk to complete on the shard primary, whichever is longer".

This bug can be traced back to v5.0.



 Comments   
Comment by Githook User [ 03/Oct/22 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-67385 Range deletion tasks on primary must not be scheduled before ongoing queries finish

(cherry picked from commit 32c2f632eaa7bf80607880162ec5e4eaeb22d7fe)
Branch: v5.0
https://github.com/mongodb/mongo/commit/834bbf20f9af79970d018594bb50ce9d98c023fb

Comment by Githook User [ 18/Aug/22 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-67385 Range deletion tasks on primary must not be scheduled before ongoing queries finish

(cherry picked from commit 32c2f632eaa7bf80607880162ec5e4eaeb22d7fe)
Branch: v6.0
https://github.com/mongodb/mongo/commit/506c10404af0030d7fd6022ba76d34b5ad01cbae

Comment by Garaudy Etienne [ 27/Jul/22 ]

I thought I had filed it! lol. It's here now. marked the docs ticket as related. DOCSP-24024

Comment by Garaudy Etienne [ 12/Jul/22 ]

garaudy.etienne@mongodb.com to remove the "or for in-progress queries involving the chunk to complete on the shard primary, whichever is longer" part of the docs. 
