[SERVER-47136] Multiversion blacklisting not being respected by burn in tests Created: 26/Mar/20  Updated: 29/Oct/23  Resolved: 06/May/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Bug Priority: Major - P3
Reporter: Alexander Costas (Inactive) Assignee: Jason Chan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-47137 Separate non-passthrough multiversion... Closed
Related
related to SERVER-48048 Use resmoke tag files for multiversio... Closed
related to SERVER-47965 Remove multiverison blacklisting from... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.4
Sprint: Repl 2020-05-18
Participants:

 Description   

Currently, there is a check whether multiversion testing is configured when filtering out tests based on blacklist in burn_in_test. Multiversion testing is always not configured for this code path, so this filtering never happens.

As a DAG engineer,
I'd like to fix this, such that, engineers can blacklist passthrough multiversion tests using the etc/backports_required_for_multiversion_tests.yml and have them excluded from running.

AC:

  • Allow filtering of non-passthrough multiversion tests in burn_in_test


 Comments   
Comment by Githook User [ 06/May/20 ]

Author:

{'name': 'Jason Chan', 'email': 'jason.chan@10gen.com', 'username': 'jasonjhchan'}

Message: SERVER-47136 Fix multiversion blacklisting for replica_sets_multiversion and sharding_multiversion
Branch: master
https://github.com/mongodb/mongo/commit/bfaebfa7b5ad725717d1d348ae33bc758028f697

Comment by Robert Guo (Inactive) [ 06/May/20 ]

We will discuss SERVER-47965 at our next triage meeting and try to schedule it for shortly after the release. While the workflow for blacklisting tests across versions is certainly clunky, it is not very different to how all other blacklists in resmoke works (for backports that are not pushed concurrently with the commit on master). The reason is that the blacklist goes in the resmoke YAML suite config, not in the separate backports_required... file. So any time the suite definition is updated, people will likely see any existing TODOs and possibly check whether they can be resolved, which is identical to how all other resmoke suites works.

Again, I'm not trying to redeem the backporting and "forwardporting" experience of blacklists, but the risk of additional lost test coverage seems small to me.

Comment by David Bradford (Inactive) [ 06/May/20 ]

My understanding is the same as Tess's if that's correct, when can we schedule SERVER-47965? I think that having to do another commit on master after backport is an undesirable developer experience that will lead to a lot of tests not getting unblacklisted accidentally.

robert.guo Do you have an idea when SERVER-47965 might get scheduled?

Comment by Judah Schvimer [ 06/May/20 ]

My understanding is the same as Tess's if that's correct, when can we schedule SERVER-47965? I think that having to do another commit on master after backport is an undesirable developer experience that will lead to a lot of tests not getting unblacklisted accidentally.

Comment by Tess Avitabile (Inactive) [ 06/May/20 ]

To check that I understand correctly, does this mean that after backporting, we will need to do another commit on master to remove the test form the exclude_files keys in replica_sets_multiversion.yml and sharding_multiversion.yml?

Comment by Jason Chan [ 05/May/20 ]

As part of this ticket we decided for a quick fix to stop the bleeding:

  • replica_sets_multiversion and sharding_multiversion will no longer be using etc/backports_required_for_multiversion_tests.yml as a blacklisting mechanism to refrain from cluttering burn_in_tests.py with this logic. Instead, we will direct users to add any tests that need to be blacklisted under the exclude_files keys in replica_sets_multiversion.yml and sharding_multiversion.yml. This way, burn_in will rely on existing resmoke logic to detect that these tests should not be run.
Comment by David Bradford (Inactive) [ 05/May/20 ]

The problem with the known quick fix is that it breaks burn_in_tests in a more general way. See SERVER-46914.

Comment by Judah Schvimer [ 30/Apr/20 ]

I'll jump in here to second Jason's suggestion for a quick temporary fix. I think this is likely causing a lot of confusion for developers in patch builds and I'd be disappointed if we delayed a fix too long.

Comment by David Bradford (Inactive) [ 30/Apr/20 ]

I've set up a meeting for next week to discuss what we want to do here.

Comment by Jason Chan [ 30/Apr/20 ]

david.bradfordThat makes sense. I'm hoping we can at least get a quick temporary fix for now to reduce redness in people's patch builds.

Comment by David Bradford (Inactive) [ 30/Apr/20 ]

I've been holding off on this because it seems like the wrong approach. burn_in_tests shouldn't be doing any blacklisting due to multiversion. Tests should likely be blacklisted by resmoke, likely as a selector (see here). That would allow us to not need to blacklist the tests in multiple places.

Comment by Jason Chan [ 30/Apr/20 ]

alexander.costasDo we have an expected timeline for this? We initially had the multiversion test suites disabled for the Upgrade/Downgrade project, but have recently re-enabled them. I expect this to cause some noise on the master branch.

Comment by Jason Chan [ 13/Apr/20 ]

Currently, the other blacklisting logic is defined here in evergreen_gen_multiversion_tests.py. This script is called in three places:

1. As part of generate randomized multiversion tasks.
2. As part of generate multiversion tasks
3. As part of the replica_sets_multiversion task

  • Note: replica_sets_multiversion is a weird one because it's based on the replica_sets task, which was not yet a generated task at the point of the Mixed version testing project. Therefore, replica_sets_multiversion does not call into the functions in 1 and 2 so it required it's own invocation to the blacklist logic.

If we move the logic from evergreen_gen_multiversion_tests.py or change the invocation, we should update the places mentioned above.

However, the logic is a little different between the above and the logic in burn_in_tests so I don't think we can merge the two completely. Currently, the above logic will update the generated yaml files directly to blacklist the files that should be excluded (by redefining the the exclude_files and selector values in the yaml definition. IIRC, in the burn_in_tests, we used to just skip adding the blacklisted task to the task list.

Comment by Alexander Costas (Inactive) [ 13/Apr/20 ]

Makes sense to me. jason.chan This would apply to the other script as well, right? If we make resmoke handle this, we can remove the blacklisting from there.

Comment by David Bradford (Inactive) [ 13/Apr/20 ]

After thinking through this, burn_in seems like the wrong place to be doing this blacklisting. The purpose of burn_in is to ensure that tests provide consistent results. By blacklisting tests in burn_in, we are effectively saying "These tests do not provide consistent results", which doesn't seem right.

This blacklisting seems like it is meant to communicate we shouldn't run this test at all until the specified ticket has been backported to the previous mongo version. If that is the case, then resmoke seems like it would be a better place for this logic, since we wouldn't want to run the tests regardless of whether they are being run as part of burn_in or as part of a normal run of the suite.

Comment by Alexander Costas (Inactive) [ 10/Apr/20 ]

david.bradford@mongodb.com So here's the ticket for restoring blacklisting. With two amendments:

1. These are also passthrough tests.
2. Multiversion setup is sometimes active for this scripts; we should detect it and do blacklisting when active.

Right?

Comment by Jason Chan [ 26/Mar/20 ]

For more context:

This is for the multiversion passthroughs that were created as part of the Mixed Version Replica Sets Testing project.

In reference to burn in tests, these passthrough tests can be separated into two categories:

sharding_multiversion and replica_sets_multiversion will be run in the regular burn_in_tests task. Multiversion setup is required to be able to run these tasks in burn_in_tests. For the sake of burn_in_tests, generate_config.use_multiversion will be False even though these are multiversion tests that are meant to be run as part of burn_in.

All other multiversion passthroughs (eg. replica_sets_jscore_multiversion_passthrough) are run as part of the burn_in_multiversion_gen task. These tasks are generated along this codepath in burn_in_tests.py, by indicating that generate_config.use_multiversion is True.

We separate the tests this way from burn_in due to special logic required to generate the multiversion passthroughs in burn_in_multiversion_gen.

The current blacklisting machinery that's broken in question is the etc/backports_required_for_multiversion_tests.yml. We should remove the generate_config.use_multiversion from the conditional check here.

To test that the blacklisting feature is working again, we should do the following:
1. Run a patch with some change to a test in jstests/replsets and a test in jstests/sharding that would normally run in replica_sets_multiversion and sharding_multiversion tasks respectively.
2. Add these tests to the etc/backports_required_for_multiversion_tests.yml file.
These tests should not be picked up by `burn_in_tests` and `burn_in_tags` tasks.

Generated at Thu Feb 08 05:13:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.