[SERVER-74969] [v6.3] [BF-27740] Do not run view_catalog_cycle_lookup.js with balancer Created: 16/Mar/23  Updated: 24/Aug/23  Resolved: 17/Mar/23

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: 6.3.0-rc1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kevin Cherkauer Assignee: Kevin Cherkauer
Resolution: Declined Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Add the assumes_balancer_off tag to jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js directly in v6.3 to fix BF-27740. This returns the failing test to its prior status quo of not running with the balancer enabled. (SERVER-73385 changed the YAML files in v6.3 on 2023-02-07 to allow this test to run with the balancer enabled; BF-27740 started failing that same day.)
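
As a rough illustration of what such a tag looks like: jstests conventionally declare their tags in a @tags list inside the header comment at the top of the file. The sketch below assumes that convention. The file's actual header text, and any tags it may already carry, are not shown in this ticket, so they appear only as a placeholder.

```javascript
/**
 * (existing description of the workload stays unchanged; placeholder only)
 *
 * @tags: [
 *   assumes_balancer_off,
 * ]
 */
```

Tagging the test itself, rather than editing the suite YAML files again, keeps the exclusion local to the one test that cannot tolerate the balancer.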

The root cause of the BF failure (SERVER-74380) was apparently already fixed in v7.0 by some other ticket unknown to me – see https://jira.mongodb.org/browse/SERVER-74380?focusedCommentId=5279448&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-5279448 for details.



 Comments   
Comment by Kevin Cherkauer [ 17/Mar/23 ]

This ticket resulted from a misreading of the BF info dump. The problem occurs on master as well as v6.3. Closing this ticket and resurrecting SERVER-74380.

Comment by Kevin Cherkauer [ 17/Mar/23 ]

In v6.3, SERVER-73385 enabled the entire fsm_workloads/* directory of tests to run with the balancer on, but the same commit explicitly added the assumes_balancer_off tag to fsm_workloads/rename_sharded_collection.js so that it would continue not to run with the balancer on. This change apparently caused fsm_workloads/view_catalog_cycle_lookup.js to start failing (this is BF-27740), since the test had never previously run with the balancer on.

In v7.0, the assumes_balancer_off tag has since been removed from rename_sharded_collection.js, so whatever balancer problem prevented it from running has been fixed in v7.0. Meanwhile, view_catalog_cycle_lookup.js never had the assumes_balancer_off tag.
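
To make the tag mechanism concrete, here is a minimal sketch of tag-based exclusion. It is not resmoke's actual implementation: parseTags, isExcluded, and the sample header strings are hypothetical names and data introduced only for illustration, and the sketch assumes a balancer-enabled suite excludes any test carrying a listed tag.

```javascript
// Hypothetical helpers for illustration only; not resmoke's real code.

// Extract tag names from a jstest header comment of the form "@tags: [a, b]".
// Handles both single-line and multi-line tag lists.
function parseTags(source) {
    const match = source.match(/@tags:\s*\[([^\]]*)\]/);
    if (!match) {
        return [];
    }
    return match[1]
        .split(",")
        // Strip surrounding whitespace and the "*" that prefixes comment lines.
        .map((tag) => tag.replace(/^[\s*]+|[\s*]+$/g, ""))
        .filter((tag) => tag.length > 0);
}

// A test is skipped by a suite if it carries any of the suite's excluded tags.
function isExcluded(tags, excludeWithAnyTags) {
    return tags.some((tag) => excludeWithAnyTags.includes(tag));
}

// Assumed exclusion list for a suite that runs with the balancer on.
const balancerSuiteExcludes = ["assumes_balancer_off"];

// Sample headers (made up for this sketch, not copied from the real files).
const taggedHeader = "/** @tags: [assumes_balancer_off] */";
const untaggedHeader = "/** A workload with no tags at all. */";

console.log(isExcluded(parseTags(taggedHeader), balancerSuiteExcludes));   // true
console.log(isExcluded(parseTags(untaggedHeader), balancerSuiteExcludes)); // false
```

Under this model, a test with no assumes_balancer_off tag starts running with the balancer on as soon as its directory is added to such a suite, which matches the behavior described above.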

There are two kinds of fixes to think about:

1. Fix the BF (i.e. nightly test run failure) in v6.3. This means just adding the assumes_balancer_off tag to view_catalog_cycle_lookup.js.

2. Fix the root cause in v6.3. Since I don't know what the root cause fix in v7.0 was, I can't backport it.

Given that this was apparently a Day One bug that we lived with for years without problems, just doing #1 (fix the BF in v6.3) seems like it should be enough. The nightly test will stop failing, and we already have a root cause fix in v7.0, and the bug didn't actually bother anyone in pre-7.0 until someone enabled the test to run with balancer on, so fixing the BF in v6.3 should be all we need to do.
