[SERVER-77895] fsm_workloads/agg_match.js fails due to dropped output collection Created: 07/Jun/23 Updated: 29/Oct/23 Resolved: 12/Jun/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Benety Goh | Assignee: | Henrik Edin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Storage Execution |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Execution NAMR Team 2023-06-26 |
| Participants: | |
| Linked BF Score: | 156 |
| Description |
|
The FSM workload agg_match.js verifies the results of the aggregation by checking the number of documents in the output collection (https://github.com/mongodb/mongo/blob/72ac0d6a11f52f91736840d22b797f9cdd8d2d7f/jstests/concurrency/fsm_workloads/agg_match.js#L24). It expects the output collection to contain half as many documents as the source collection, but we sometimes see failures in our CI system, especially in the concurrency suites, where the output collection's document count is reported as zero.
====== OLD DESCRIPTION ======
The FSM workload agg_match.js uses fast count to check the collection. This is less robust than retrieving the collection count through aggregation itself (using db.collection.countDocuments()), especially given that this test is in the aggregation test suite. Build failures in the aggregation test suites would also be less confusing if they did not stem from fast count inaccuracies. |
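The check described above can be modeled without a server. This is a minimal sketch, not the workload's code: the `flag` field, the document count, and the `checkOutputCount` helper are assumptions made for illustration, and an array filter stands in for the aggregation that writes the output collection.

```javascript
// Stand-in for the source collection: half of the documents match.
// The `flag` field and the count of 1000 are illustrative assumptions.
const numDocs = 1000;
const source = Array.from({length: numDocs}, (_, i) => ({_id: i, flag: i % 2 === 0}));

// Stand-in for the aggregation that populates the output collection.
const output = source.filter((doc) => doc.flag);

// Hypothetical helper: the output must hold half as many documents as the source.
function checkOutputCount(outputCount, sourceCount) {
    return outputCount === sourceCount / 2;
}

const healthy = checkOutputCount(output.length, source.length);
// The failure mode from CI: the output count is reported as zero.
const failing = checkOutputCount(0, source.length);
console.log({healthy, failing});
```

In this model the check passes for the filtered output and fails when the count comes back as zero, which is the symptom reported in this ticket.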
| Comments |
| Comment by Githook User [ 11/Jun/23 ] |
|
Author: Henrik Edin (henrikedin) <henrik.edin@mongodb.com>
Message: Make sure that commit handlers that add idents for pending drop to the reaper execute before drops are made visible in the catalog. If these execute in the wrong order, a reader could fail to find the collection in both the reaper and the catalog. This is intended as a temporary fix until we have a long-term solution for our atomicity issues with commit handlers. |
| Comment by Henrik Edin [ 09/Jun/23 ] |
|
The issue is indeed caused by |
| Comment by Max Hirschhorn [ 08/Jun/23 ] |
|
The agg_match.js FSM workload runs a $out aggregation, which internally uses renameCollection. Renaming a collection is meant to be an atomic operation. In that sense, I would expect a fast-count check on a collection to always see either (a) the fast-count of collection instance A or (b) the fast-count of collection instance B. The system reporting fast-count == 0 breaks the atomicity guarantee, given that neither the source nor the target collection was empty. The fact that the workload also fails when changed to do a COUNT_SCAN doesn't seem directly related here, because query cursors are known not to survive renameCollection (SERVER-31695). |
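The ordering problem named in the fix's commit message, and the atomicity expectation in the comment above, can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the server's implementation: `makeState`, `findIdent`, and `commitDrop` are hypothetical names, and a plain `Map` and `Set` stand in for the collection catalog and the drop-pending reaper.

```javascript
// Toy model (all names hypothetical): committing a drop runs two handlers.
// A reader that runs between them must still find the ident somewhere.

function makeState() {
    return {
        catalog: new Map([["test.out", "ident-1"]]),  // namespace -> ident
        reaper: new Set(),                             // idents pending drop
    };
}

// Reader: look in the catalog first, then fall back to the reaper.
function findIdent(state, ns, ident) {
    if (state.catalog.get(ns) === ident) return "catalog";
    if (state.reaper.has(ident)) return "reaper";
    return null;  // visible in neither place
}

// Commit a drop, calling `observe` between the two handlers to model
// a concurrent reader.
function commitDrop(state, ns, reaperFirst, observe) {
    const ident = state.catalog.get(ns);
    const addToReaper = () => state.reaper.add(ident);
    const removeFromCatalog = () => state.catalog.delete(ns);
    const handlers = reaperFirst ? [addToReaper, removeFromCatalog]
                                 : [removeFromCatalog, addToReaper];
    handlers[0]();
    const seen = observe(state, ns, ident);
    handlers[1]();
    return seen;
}

const wrongOrder = commitDrop(makeState(), "test.out", false, findIdent);
const fixedOrder = commitDrop(makeState(), "test.out", true, findIdent);
console.log({wrongOrder, fixedOrder});
```

In this model, making the drop visible in the catalog first leaves a window in which the reader finds the ident nowhere (`wrongOrder` is `null`), while registering with the reaper first keeps it findable throughout (`fixedOrder` is `"catalog"`).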