[SERVER-77895] fsm_workloads /agg_match.js fails due to dropped output collection Created: 07/Jun/23  Updated: 29/Oct/23  Resolved: 12/Jun/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Benety Goh Assignee: Henrik Edin
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-76839 Operations with a CollectionCatalog i... Closed
Assigned Teams:
Storage Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Execution NAMR Team 2023-06-26
Participants:
Linked BF Score: 156

 Description   

The FSM workload agg_match.js verifies the results of the aggregation by checking the number of documents in the output collection (https://github.com/mongodb/mongo/blob/72ac0d6a11f52f91736840d22b797f9cdd8d2d7f/jstests/concurrency/fsm_workloads/agg_match.js#L24). It is expecting the output collection to contain half as many documents as the source collection, but we sometimes see failures in our CI system, especially in the concurrency suites, where the output collection document count is reported as zero.
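
For illustration, the expectation described above can be modelled in plain JavaScript. This is only a sketch and not the workload's code: the document shape, the `flag` field, and the value of `numDocs` are assumptions chosen so that exactly half of the source documents match.

```javascript
// Hypothetical stand-in for the source collection: every other document is flagged.
const numDocs = 1000; // assumed size, for illustration only
const source = Array.from({ length: numDocs }, (_, i) => ({ _id: i, flag: i % 2 === 0 }));

// Stand-in for a $match on the flag followed by $out into the output collection.
const output = source.filter((doc) => doc.flag === true);

// The check under discussion: the output should hold half as many documents as the source.
const expectedCount = numDocs / 2;
const checkPasses = output.length === expectedCount;
console.log(checkPasses); // true

// The reported failure corresponds to the count being observed as 0 instead.
const observedInFailure = 0;
const failureDetected = observedInFailure !== expectedCount;
console.log(failureDetected); // true
```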

====== OLD DESCRIPTION =====
OLD TITLE: fsm_workloads /agg_match.js fails due to fast count

The FSM workload agg_match.js uses fast count to check the collection. This is less robust than retrieving the collection count through aggregation itself, using db.collection.countDocuments(), especially given that this test is in the aggregation test suite. It would also be less confusing if build failures in the aggregation test suites were not caused by fast count inaccuracies.



 Comments   
Comment by Githook User [ 11/Jun/23 ]

Author:

{'name': 'Henrik Edin', 'email': 'henrik.edin@mongodb.com', 'username': 'henrikedin'}

Message: SERVER-77895 Fix window of inconsistency with collection drop

Make sure that commit handlers that add idents for pending drop to the reaper execute before drops are made visible in the catalog. If this executes in the wrong order a reader could fail to find the collection in both the reaper and catalog.

This is intended as a temporary fix until we have a long-term solution for our atomicity issues with commit handlers
Branch: master
https://github.com/mongodb/mongo/commit/667d1b7de618b451f8a4ad7b3b7c56a4c985f16b

Comment by Henrik Edin [ 09/Jun/23 ]

The issue is indeed caused by SERVER-76839. Due to the change in the order in which onCommit handlers execute, there is now a window where a drop has been made visible in the CollectionCatalog but the ident has not yet been added to the reaper. A reader that observes this state believes that the reaper has already reaped the ident, which we treat as a non-existing collection.
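
The window can be sketched with a toy model in plain JavaScript. This is not server code: the `catalog` and `reaper` sets, the handler names, and the `runDrop` helper are all hypothetical, and the model only assumes that a reader treats a collection as existing if it finds it in either place.

```javascript
// Toy model of the race; every name here is hypothetical, not the server's API.
function runDrop(handlerOrder) {
  const catalog = new Set(["out"]); // collections currently visible in the catalog
  const reaper = new Set();         // idents pending drop (data still on disk)

  const handlers = {
    hideInCatalog: () => catalog.delete("out"),
    addToReaper: () => reaper.add("out"),
  };

  const observations = [];
  for (const name of handlerOrder) {
    handlers[name]();
    // A concurrent reader looks for the collection after each handler runs.
    observations.push(catalog.has("out") || reaper.has("out"));
  }
  return observations;
}

// Order described in this comment: the drop becomes visible before the reaper knows the ident.
const problematic = runDrop(["hideInCatalog", "addToReaper"]);
// Reversed order: the ident is registered with the reaper first.
const reordered = runDrop(["addToReaper", "hideInCatalog"]);

console.log(problematic); // [ false, true ]
console.log(reordered);   // [ true, true ]
```

In the first ordering the reader has a step where the collection is found in neither structure, which matches the non-existing collection outcome described above. In the second ordering no such step exists in this model.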

Comment by Max Hirschhorn [ 08/Jun/23 ]

The agg_match.js FSM workload runs a $out aggregation which internally uses renameCollection. Renaming a collection is meant to be an atomic operation. In that sense I would expect checking the fast-count for a collection to always see either (a) the fast-count of collection instance A or (b) the fast-count of collection instance B. The system reporting fast-count == 0 breaks the atomicity guarantee given that neither the source nor target collections were empty.
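
The atomicity expectation can be written as a small invariant check. This is a sketch only: `isAtomicObservation` and the sample counts are hypothetical and are not taken from the workload or from any build log.

```javascript
// A reader of a collection being replaced by renameCollection should observe the
// count of instance A or of instance B, and nothing else.
function isAtomicObservation(observed, countA, countB) {
  return observed === countA || observed === countB;
}

// Hypothetical counts: both instances are non-empty, as in the scenario described above.
const countA = 500;
const countB = 500;

// Hypothetical sequence of fast-count readings taken while $out runs repeatedly.
const readings = [500, 0, 500];

const violations = readings.filter((n) => !isAtomicObservation(n, countA, countB));
console.log(violations); // [ 0 ]
```

A reading of 0 is flagged because it matches neither instance, which is the guarantee violation this comment points out.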

Changing the agg_match.js FSM workload to do a COUNT_SCAN and seeing it fail doesn't seem directly related here, because query cursors are known not to survive renameCollection (SERVER-31695).
