[SERVER-70252] Tag additional plan cache related tests as 'tenant_migration_incompatible' Created: 05/Oct/22  Updated: 29/Oct/23  Resolved: 17/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Task Priority: Major - P3
Reporter: David Storch Assignee: Mathis Bessa
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-70183 sbe_plan_cache_autoparameterize_colls... Closed
Backwards Compatibility: Fully Compatible
Sprint: Server Serverless 2022-10-17, Server Serverless 2022-10-31
Linked BF Score: 35

 Description   

In SERVER-70183 we identified a plan cache test (sbe_plan_cache_autoparameterize_collscan.js) that was failing in the shard_merge_jscore_passthrough suite. Although we haven't seen this test fail in other tenant-migration-related passthrough suites in Evergreen, I was able to reproduce a failure locally by applying the following patch:

diff --git a/buildscripts/resmokeconfig/suites/tenant_migration_jscore_passthrough.yml b/buildscripts/resmokeconfig/suites/tenant_migration_jscore_passthrough.yml
index b026e57b6ac..4432cd4adda 100644
--- a/buildscripts/resmokeconfig/suites/tenant_migration_jscore_passthrough.yml
+++ b/buildscripts/resmokeconfig/suites/tenant_migration_jscore_passthrough.yml
@@ -93,7 +93,7 @@ executor:
         enableTestCommands: 1
         failpoint.abortTenantMigrationBeforeLeavingBlockingState:
             mode:
-              activationProbability: 0.5
+              activationProbability: 0.0
         failpoint.pauseTenantMigrationBeforeLeavingBlockingState:
             mode: alwaysOn
             data:
diff --git a/jstests/core/sbe_plan_cache_autoparameterize_collscan.js b/jstests/core/sbe_plan_cache_autoparameterize_collscan.js
index 351a9a09ce3..fe88699409e 100644
--- a/jstests/core/sbe_plan_cache_autoparameterize_collscan.js
+++ b/jstests/core/sbe_plan_cache_autoparameterize_collscan.js
@@ -110,6 +110,7 @@ function runTest(shape1, expectedResults1, shape2, expectedResults2, sameCacheKe
     // Run each query twice in order to make sure that each query still returns the same results
     // after the state of the cache has been altered.
     [...Array(2)].forEach(() => {
+        sleep(1000);
         const actualResults1 = runFindCommandFromShapeDoc(shape1);
         assert.sameMembers(actualResults1, expectedResults1, shape1);
         assertSbePlanCacheEntryExists(cacheKey1);

And then running the test under the tenant_migration_jscore_passthrough suite with a command like this:

python3 buildscripts/resmoke.py run --installDir=build/install/bin --additionalFeatureFlags=featureFlagSbeFull --suites=tenant_migration_jscore_passthrough jstests/core/sbe_plan_cache_autoparameterize_collscan.js --repeat=5 | tee foo.log

The solution was to tag the test as "tenant_migration_incompatible". Since the plan cache state is local to a particular mongod node, any plan cache test that explicitly interrogates the plan cache using something like $planCacheStats could hypothetically fail in the tenant migration, shard split, or shard merge passthroughs. The test could generate some mongod-local plan cache state, but then if a tenant migration commits, the test will begin communicating with a new node that has a cold plan cache without the expected state.
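To make this failure mode concrete, here is a toy model in plain Node.js JavaScript. It is not mongo shell code and not the server's plan cache implementation. It assumes each node keeps its own private cache keyed by query shape, and that a committed migration simply redirects the test to a different node. `ToyNode`, `runFind`, and `planCacheHasEntry` are hypothetical names used only for illustration.

```javascript
// Toy model only: each node owns a private plan cache, the way each mongod does.
class ToyNode {
  constructor(name) {
    this.name = name;
    this.planCache = new Map(); // query shape key -> cached plan description
  }
}

// Hypothetical stand-in for running a find command.
function runFind(node, shapeKey) {
  // The first execution of a shape creates a cache entry on *this* node only.
  if (!node.planCache.has(shapeKey)) {
    node.planCache.set(shapeKey, `cached plan for ${shapeKey}`);
  }
}

// Hypothetical stand-in for checking $planCacheStats on the node the test talks to.
function planCacheHasEntry(node, shapeKey) {
  return node.planCache.has(shapeKey);
}

const donor = new ToyNode("donor");
const recipient = new ToyNode("recipient");
let current = donor; // the node the test is currently communicating with

runFind(current, "{a: 1}");
const beforeMigration = planCacheHasEntry(current, "{a: 1}"); // true

// A tenant migration commits: the test is now routed to the recipient, whose cache is cold.
current = recipient;
const afterMigration = planCacheHasEntry(current, "{a: 1}"); // false

console.log({ beforeMigration, afterMigration });
```

Nothing about the query or its results changed; only the node answering the plan cache inspection did, which is why a test asserting on cache contents can fail even though its query results stay correct.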

The work for this ticket is to identify plan cache tests which 1) currently run in the tenant migration, shard merge, or shard split passthroughs, and 2) we believe could fail in these passthroughs if a tenant migration committed at an inopportune time. Any such test should be tagged "tenant_migration_incompatible" to prevent future build failures.
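One way to shortlist candidates for criterion 2 is a quick source scan. The following is a hypothetical Node.js heuristic, not the method used for this ticket: it assumes a test "interrogates the plan cache" if its source mentions `$planCacheStats` or `getPlanCache(`, and that tags live in the usual `@tags: [ ... ]` header comment (which may contain `#` comments). It does not model criterion 1, which depends on each suite's YAML selector, and a `]` inside a tag comment would confuse its simple regex. `PLAN_CACHE_PATTERNS`, `parseTags`, and `needsTenantMigrationTag` are illustrative names.

```javascript
// Hypothetical heuristic: source patterns that suggest direct plan cache inspection.
const PLAN_CACHE_PATTERNS = ["$planCacheStats", "getPlanCache("];

// Extract tag names from a "@tags: [ ... ]" header comment.
function parseTags(source) {
  const match = source.match(/@tags:\s*\[([\s\S]*?)\]/);
  if (!match) return [];
  return match[1]
    .split("\n")
    // Drop the leading " * " of block-comment lines and any "#" comment text.
    .map((line) => line.replace(/^\s*\*?/, "").replace(/#.*$/, ""))
    .join(",")
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
}

// Flag tests that look at the plan cache but are not yet excluded from migration suites.
function needsTenantMigrationTag(source) {
  const usesPlanCache = PLAN_CACHE_PATTERNS.some((p) => source.includes(p));
  return usesPlanCache && !parseTags(source).includes("tenant_migration_incompatible");
}

const sample = `/**
 * Example header.
 * @tags: [
 *   # Inspects the plan cache directly.
 *   does_not_support_stepdowns,
 * ]
 */
const stats = coll.aggregate([{$planCacheStats: {}}]).toArray();`;

console.log(parseTags(sample)); // [ 'does_not_support_stepdowns' ]
console.log(needsTenantMigrationTag(sample)); // true
```

A scan like this only produces a shortlist; each hit still needs a human judgment about whether a migration committing mid-test could actually break its assertions.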

With a quick search using the build baron tool, I did find a couple of BFGs that might fit the above rubric:



 Comments   
Comment by Githook User [ 17/Oct/22 ]

Author:

{'name': 'mathisbessamdb', 'email': 'mathis.bessa@mongodb.com', 'username': 'mathisbessamdb'}

Message: SERVER-70252 Tag additional plan cache related tests as 'tenant_migration_incompatible'
Branch: master
https://github.com/mongodb/mongo/commit/f9b0ffa6d8f6b7ceb8262256ef9b1840a10674f3

Comment by Mathis Bessa [ 10/Oct/22 ]

The test that was failing has already been fixed in SERVER-70183, which lowers the urgency of this ticket. However, we would like to be proactive and mark other tests as tenant_migration_incompatible.

The tests are already marked as "does_not_support_stepdowns", which is not sufficient to exclude them from the tenant migration passthroughs (or from the other migration passthroughs, such as shard_merge and shard_split). To be consistent, we therefore need to mark these tests "tenant_migration_incompatible" as well.
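The reason one tag does not cover the other can be sketched with a simplified model of a resmoke-style `exclude_with_any_tags` selector: a test is skipped by a suite only if it carries at least one tag from that suite's exclude list. The exclude lists below are hypothetical and abbreviated for illustration; the real suite YAML files list more tags. `isSelected` is an illustrative name, not a resmoke function.

```javascript
// Simplified model: a suite runs a test unless the test has any tag the suite excludes.
function isSelected(testTags, excludeWithAnyTags) {
  return !testTags.some((tag) => excludeWithAnyTags.includes(tag));
}

// Hypothetical, abbreviated exclude lists for two kinds of passthrough suites.
const stepdownSuiteExcludes = ["does_not_support_stepdowns"];
const tenantMigrationSuiteExcludes = ["tenant_migration_incompatible"];

const onlyStepdownTag = ["does_not_support_stepdowns"];
const bothTags = ["does_not_support_stepdowns", "tenant_migration_incompatible"];

console.log(isSelected(onlyStepdownTag, stepdownSuiteExcludes)); // false: skipped
console.log(isSelected(onlyStepdownTag, tenantMigrationSuiteExcludes)); // true: still runs
console.log(isSelected(bothTags, tenantMigrationSuiteExcludes)); // false: skipped
```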

Comment by Steven Vannelli [ 10/Oct/22 ]

Thanks for the ticket david.storch@mongodb.com. We're going to put this in the backlog for now since the immediate BF has been resolved. If this requires sooner attention, let us know.

mathis.bessa@mongodb.com will add some more context on this ticket.

Generated at Thu Feb 08 06:15:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.