[SERVER-26083] create library of basic ops that can be run against a cluster under different conditions
Created: 13/Sep/16  Updated: 23/Nov/16  Resolved: 23/Nov/16

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: DO NOT USE - Backlog - Test Infrastructure Group (TIG)
Resolution: Won't Fix
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-25971 Test performing ops against a 3.4 sha... Closed
Sprint: Sharding 2016-11-21

 Description   

This is similar in nature to two things we do already:

1) the concurrency suite
2) passthrough suites, including the stepdown and last-stable suites

However, 1) doesn't give control over what is being run concurrently with the ops, and 2) is a heavyweight approach that requires blacklisting existing tests that don't work with the passthrough and considering whether each new test should be blacklisted.

I'd instead like a simple js lib containing something like:

var testCRUD = function(coll) { ... }
var testAgg = function(coll) { ... }
var testCount = function(coll) { ... }
var testDistinct = function(coll) { ... }
var testGroup = function(coll) { ... }
var testFindAndModify = function(coll) { ... }
var testMapReduce = function(db) { ... }

which can be load()'ed in a particular jstest. The jstest can then configure the cluster in any way desired (take down some nodes, slow down replication, drop messages, exercise the balancer, delete administrative sharding docs (e.g. shardIdentity, cluster version)) or the collection in any way desired (sharded or unsharded, with or without particular indexes) before running any subset (or all) of the functions in the lib.
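A minimal sketch of what such a library might look like, assuming each helper takes a collection and only calls methods that the mongo shell's DBCollection also provides (insert, find().itcount(), count, distinct, update, remove). The function bodies are illustrative; the ticket only names the functions. The check() helper and the FakeCollection in-memory stand-in are hypothetical additions so the sketch can be exercised without a running cluster; in a real jstest, check() would typically be replaced by the shell's assert.eq.

```javascript
// Hypothetical sketch of the proposed library of basic ops. Each helper takes a
// collection and only calls methods that the mongo shell's DBCollection also
// provides. A real version would also check each WriteResult (e.g. with
// assert.writeOK) and use assert.eq instead of check().

// Throws if actual !== expected; stands in for the shell's assert.eq.
var check = function(actual, expected, msg) {
    if (actual !== expected) {
        throw new Error(msg + ": expected " + expected + ", got " + actual);
    }
};

// Insert, read, update and delete; single-document update and delete target _id.
var testCRUD = function(coll) {
    coll.remove({});
    coll.insert({_id: 1, x: 1});
    coll.insert({_id: 2, x: 2});
    check(coll.find({x: 1}).itcount(), 1, "find after insert");
    coll.update({_id: 1}, {$set: {x: 2}});
    check(coll.find({x: 2}).itcount(), 2, "find after update");
    coll.remove({_id: 2});
    check(coll.find({}).itcount(), 1, "find after remove");
};

// Count with and without a predicate.
var testCount = function(coll) {
    coll.remove({});
    for (var i = 0; i < 10; i++) {
        coll.insert({_id: i, even: i % 2 === 0});
    }
    check(coll.count({}), 10, "count all");
    check(coll.count({even: true}), 5, "count with predicate");
};

// Distinct values of a field, ignoring duplicates.
var testDistinct = function(coll) {
    coll.remove({});
    coll.insert({_id: 1, color: "red"});
    coll.insert({_id: 2, color: "blue"});
    coll.insert({_id: 3, color: "red"});
    check(coll.distinct("color").length, 2, "distinct colors");
};

// Hypothetical in-memory stand-in for a DBCollection so the sketch runs without
// a cluster. Supports only equality queries, single-document $set updates and
// removal of every matching document.
function FakeCollection() {
    this.docs = [];
}
FakeCollection.prototype.matches = function(doc, query) {
    return Object.keys(query).every(function(k) { return doc[k] === query[k]; });
};
FakeCollection.prototype.insert = function(doc) {
    this.docs.push(Object.assign({}, doc));
};
FakeCollection.prototype.find = function(query) {
    var self = this;
    var results = this.docs.filter(function(d) { return self.matches(d, query || {}); });
    return {itcount: function() { return results.length; }};
};
FakeCollection.prototype.count = function(query) {
    return this.find(query).itcount();
};
FakeCollection.prototype.distinct = function(field) {
    var values = [];
    this.docs.forEach(function(d) {
        if (field in d && values.indexOf(d[field]) === -1) values.push(d[field]);
    });
    return values;
};
FakeCollection.prototype.update = function(query, update) {
    var self = this;
    var target = this.docs.find(function(d) { return self.matches(d, query); });
    if (target) Object.assign(target, update.$set);
};
FakeCollection.prototype.remove = function(query) {
    var self = this;
    this.docs = this.docs.filter(function(d) { return !self.matches(d, query); });
};

// Usage: run any subset of the helpers against a collection.
var coll = new FakeCollection();
[testCRUD, testCount, testDistinct].forEach(function(fn) { fn(coll); });
```

In a jstest, after configuring the cluster as desired, the same helpers would run against a real collection instead of the stand-in, for example `testCRUD(st.s.getDB("test").foo)` where `st` is a ShardingTest.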

This would be useful because we rewrite these kinds of functions often in many jstests, which is time-consuming and error-prone.

A nice example the library could be based on is https://github.com/mongodb/mongo/blob/r3.3.12/jstests/sharding/collation_targeting.js, which tess.avitabile wrote for 3.4.



 Comments   
Comment by Max Hirschhorn [ 23/Nov/16 ]

To summarize my viewpoint: I think using a whitelist for choosing what operations are executed against a cluster can lead to scenarios where new functionality doesn't get tested under all configurations we're interested in testing. I would rather focus our efforts in making the blacklisting process easier if that is where the most pain with the current workflow exists. However, it didn't seem like we were able to reach an understanding on what improvements to blacklisting would help you accomplish the same goals. I'm closing this ticket as "Won't fix", but if there are ideas for making the blacklisting process easier or answers regarding the questions I asked previously, then we can revisit this ticket or address them under a new one.

Comment by Max Hirschhorn [ 21/Nov/16 ]

> What I'd like is really the inverse: to add new cluster configurations and immediately test them over a pre-defined set of features/functionality.

I'm failing to grasp the benefit of having a pre-defined set of features/functionality continuously being tested in Evergreen other than "blacklisting requires effort"; I think we'd be doing our future selves a disservice if we added tests for different cluster configurations and didn't automatically test new features/functionality under those cluster configurations.

Would having a pre-defined set of features/functionality to test against the new cluster configuration be useful for prototyping the new cluster configuration?

I'd appreciate it if you would answer the specific questions I asked in my previous comment to help me better understand what other possible changes we could make to advance towards the same goal.

> Currently we do it by adding a new jstest with a cluster configured the new way and manually re-writing (or copy-pasting from other tests) code to run a set of desired basic ops under that configuration.

> Some examples are:

Does the testCRUD() function in the upgrade_cluster*.js tests act as anything more than a basic sanity check? Is there anything specific about the multiversion tests that makes blacklisting tests from the jstests/core/ directory or elsewhere more difficult?

I should also mention that without being able to transfer ownership of the cluster (a potential outcome of SERVER-21774) and teaching resmoke.py how to run older versions of MongoDB (a potential side benefit of SERVER-21774, depending on the approach taken), it isn't possible to write a resmoke.py suite that launches a mixed-version cluster and runs existing tests against it. Enabling tests to be written in that fashion is likely more work for the TIG team; however, I think it is worthwhile in its own right, and it would also bring these tests in line with other resmoke.py suites in how they are written and how they are reported in Evergreen.

Comment by Esha Maharishi (Inactive) [ 15/Nov/16 ]

These are good points. I think the crux of the issue is:

The jstests/core directory allows us to add new features/functionality to mongod and immediately test them under a pre-defined set of cluster configurations, which are defined by the suites that run jstests/core. (We also do something similar with the jstests/sharding directory, with the continuous_stepdown and last_stable suites).

What I'd like is really the inverse: to add new cluster configurations and immediately test them over a pre-defined set of features/functionality.

We could arguably do this by adding a new suite for each new configuration, but it's difficult to stabilize a new suite, because it requires weeding out and blacklisting all tests that aren't compatible for one reason or another.

Currently we do it by adding a new jstest with a cluster configured the new way and manually re-writing (or copy-pasting from other tests) code to run a set of desired basic ops under that configuration.

Some examples are:

Comment by Max Hirschhorn [ 15/Nov/16 ]

> This is similar in nature to two things we do already:

> 1) the concurrency suite
> 2) passthrough suites, including the stepdown and last-stable suites

> However, 1) doesn't give control over what is being run concurrently with the ops, and 2) is a heavyweight approach that requires blacklisting existing tests that don't work with the passthrough and considering whether each new test should be blacklisted.

FWIW, #1 also requires blacklisting tests, both in FSM runners such as fsm_all_sharded_replication.js and in other users of FSM workloads such as backup_restore.js.

I'm not sure I understand what control you want to have over the operations being run in the concurrency suite. Is it about exposing more configuration options to the cluster started by the concurrency framework?

As of this writing, we have over 15 suites that reuse tests from the jstests/core/ directory. This means that there is a large force multiplier any time a developer adds a new test to the jstests/core/ directory because the test will automatically be run under many different configurations. Each new configuration we add to reuse our existing tests strengthens the benefit of doing so. Additionally, having new tests automatically run under these different configurations means that new features and/or changes in behaviors are automatically tested under those configurations. I'm reluctant to introduce yet another place to define test cases for building up more complex test suites.

To the other part of your remark, enabling a new suite requires effort and given that there may be implicit assumptions in the tests about the cluster's configuration or regarding the consistency in behavior of mongod and mongos, I don't imagine that ever going away entirely. Perhaps some of that work could be lessened if we attempted to use tags for maintaining the blacklist of a resmoke.py suite; for example, the blacklists of the sharding_jscore_passthrough suite and the sharded_collections_jscore_passthrough suite could be heavily consolidated if we did SERVER-18395. Would making blacklisting tests easier address your motivation for filing this ticket?

If not, are you dissatisfied by the quality of the tests in the jstests/core/ directory? Or other tests that we reuse for that matter?

> The jstest can then configure the cluster in any way desired (take down some nodes, slow down replication, drop messages, exercise the balancer, delete administrative sharding docs (e.g. shardIdentity, cluster version)) or the collection in any way desired (sharded or unsharded, with or without particular indexes) before running any subset (or all) of the functions in the lib.

What would you say prevents such a suite from being written today by defining a new resmoke.py suite and reusing existing tests in the jstests/core/ directory, FSM workloads in the jstests/concurrency/fsm_workloads/ directory, etc.?
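The tag-based blacklisting idea raised in this comment can be sketched as follows, assuming each test declares the cluster assumptions it makes as tags and each suite excludes tests by tag rather than by file name. The test names, tag names, suite names, and the selectTests helper are all hypothetical; this is not resmoke.py's actual suite configuration format, nor necessarily what SERVER-18395 proposes.

```javascript
// Hypothetical: each test lists the assumptions it makes about the cluster as tags.
var tests = {
    "basic_crud.js": [],
    "rename_collection.js": ["assumes_unsharded_collection"],
    "server_status_fields.js": ["assumes_mongod"],
    "new_feature.js": []  // a newly added test with no special assumptions
};

// Hypothetical suites: each excludes tags instead of maintaining a list of files,
// so two suites with overlapping assumptions share the same tags.
var suites = {
    sharded_cluster_suite: {excludeTags: ["assumes_mongod"]},
    sharded_collections_suite: {excludeTags: ["assumes_mongod", "assumes_unsharded_collection"]}
};

// Returns the sorted names of tests that carry none of the suite's excluded tags.
var selectTests = function(tests, suite) {
    return Object.keys(tests).filter(function(name) {
        return tests[name].every(function(tag) {
            return suite.excludeTags.indexOf(tag) === -1;
        });
    }).sort();
};

selectTests(tests, suites.sharded_cluster_suite);
// → ["basic_crud.js", "new_feature.js", "rename_collection.js"]
selectTests(tests, suites.sharded_collections_suite);
// → ["basic_crud.js", "new_feature.js"]
```

Here new_feature.js is picked up by both suites without anyone editing a suite definition, which keeps the force-multiplier effect of reusing tests while moving the blacklisting effort into a tag declared once on the test.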

Generated at Thu Feb 08 04:11:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.