  1. Core Server
  2. SERVER-70252

Tag additional plan cache related tests as 'tenant_migration_incompatible'

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Fully Compatible
    • Server Serverless 2022-10-17, Server Serverless 2022-10-31
    • 35

      In SERVER-70183 we identified a plan cache test (sbe_plan_cache_autoparameterize_collscan.js) which was failing in the shard_merge_jscore_passthrough suite. Although we haven't seen this test fail in other tenant migration related passthrough suites in Evergreen, I was able to reproduce a failure locally by applying the following patch:

      diff --git a/buildscripts/resmokeconfig/suites/tenant_migration_jscore_passthrough.yml b/buildscripts/resmokeconfig/suites/tenant_migration_jscore_passthrough.yml
      index b026e57b6ac..4432cd4adda 100644
      --- a/buildscripts/resmokeconfig/suites/tenant_migration_jscore_passthrough.yml
      +++ b/buildscripts/resmokeconfig/suites/tenant_migration_jscore_passthrough.yml
      @@ -93,7 +93,7 @@ executor:
               enableTestCommands: 1
               failpoint.abortTenantMigrationBeforeLeavingBlockingState:
                   mode:
      -              activationProbability: 0.5
      +              activationProbability: 0.0
               failpoint.pauseTenantMigrationBeforeLeavingBlockingState:
                   mode: alwaysOn
                   data:
      diff --git a/jstests/core/sbe_plan_cache_autoparameterize_collscan.js b/jstests/core/sbe_plan_cache_autoparameterize_collscan.js
      index 351a9a09ce3..fe88699409e 100644
      --- a/jstests/core/sbe_plan_cache_autoparameterize_collscan.js
      +++ b/jstests/core/sbe_plan_cache_autoparameterize_collscan.js
      @@ -110,6 +110,7 @@ function runTest(shape1, expectedResults1, shape2, expectedResults2, sameCacheKe
           // Run each query twice in order to make sure that each query still returns the same results
           // after the state of the cache has been altered.
           [...Array(2)].forEach(() => {
      +        sleep(1000);
               const actualResults1 = runFindCommandFromShapeDoc(shape1);
               assert.sameMembers(actualResults1, expectedResults1, shape1);
               assertSbePlanCacheEntryExists(cacheKey1);
      

      I then ran the test under the tenant_migration_jscore_passthrough suite with a command like this:

      python3 buildscripts/resmoke.py run --installDir=build/install/bin --additionalFeatureFlags=featureFlagSbeFull --suites=tenant_migration_jscore_passthrough jstests/core/sbe_plan_cache_autoparameterize_collscan.js --repeat=5 | tee foo.log
      

      The solution was to tag the test as "tenant_migration_incompatible". Since the plan cache state is local to a particular mongod node, any plan cache test that explicitly interrogates the plan cache using something like $planCacheStats could hypothetically fail in the tenant migration, shard split, or shard merge passthroughs. The test could generate some mongod-local plan cache state, but then if a tenant migration commits, the test will begin communicating with a new node that has a cold plan cache without the expected state.
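
The failure mode described above can be illustrated with a toy model. This is only a sketch: `Node`, `Passthrough`, and `commit_migration` are hypothetical names invented for the illustration, not server or resmoke APIs, and the "plan cache" is just a set of cache keys held per node.

```python
class Node:
    """Stand-in for one mongod; its plan cache is purely node-local state."""

    def __init__(self, name):
        self.name = name
        self.plan_cache = set()  # cache keys of plans cached on this node

    def run_query(self, cache_key):
        # Running a query caches its plan on the node that executed it.
        self.plan_cache.add(cache_key)


class Passthrough:
    """Hypothetical router: sends commands to whichever node owns the tenant."""

    def __init__(self, donor, recipient):
        self.donor = donor
        self.recipient = recipient
        self.current = donor

    def commit_migration(self):
        # After a commit, the test transparently talks to the recipient.
        self.current = self.recipient

    def run_query(self, cache_key):
        self.current.run_query(cache_key)

    def cache_entry_exists(self, cache_key):
        return cache_key in self.current.plan_cache


def demo():
    suite = Passthrough(Node("donor"), Node("recipient"))
    suite.run_query("cacheKey1")
    before = suite.cache_entry_exists("cacheKey1")  # entry is on the donor
    suite.commit_migration()
    after = suite.cache_entry_exists("cacheKey1")   # recipient cache is cold
    return before, after
```

In this model the existence check succeeds before the migration commits and fails afterwards, which mirrors how an assertion such as assertSbePlanCacheEntryExists can fail if a commit lands between running the query and inspecting the cache.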

      The work for this ticket is to identify plan cache tests which 1) are currently running in the tenant migration, shard merge, or shard split passthroughs, and 2) could, in our judgment, fail in these passthroughs if a tenant migration committed at an inopportune time. Any such test should be tagged "tenant_migration_incompatible" to prevent future build failures.
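
A first pass at finding candidates could be scripted. The following is a rough sketch, not part of the repository's tooling: it assumes tags are declared in an `@tags: [...]` list inside the test's header comment, and it uses a literal `$planCacheStats` reference as the only signal that a test interrogates the plan cache. The names `parse_tags`, `needs_tag`, `find_candidates`, and `PLAN_CACHE_MARKERS` are made up for this example. It addresses only condition 2 heuristically; whether a test actually runs in a given passthrough still has to be checked against the suite definitions.

```python
import re
from pathlib import Path

TAG = "tenant_migration_incompatible"
# Assumption: extend this tuple with other helpers that inspect the plan cache.
PLAN_CACHE_MARKERS = ("$planCacheStats",)

TAGS_RE = re.compile(r"@tags:\s*\[(.*?)\]", re.DOTALL)


def parse_tags(source):
    """Return the set of tags declared in a test file's @tags list."""
    match = TAGS_RE.search(source)
    if match is None:
        return set()
    tags = set()
    for line in match.group(1).splitlines():
        line = re.sub(r"^\s*\*\s?", "", line)  # strip " * " comment decoration
        line = line.split("#", 1)[0]           # strip trailing "#" comments
        tags.update(t.strip() for t in line.split(",") if t.strip())
    return tags


def needs_tag(source):
    """True if the test mentions a plan cache marker but lacks the tag."""
    uses_plan_cache = any(marker in source for marker in PLAN_CACHE_MARKERS)
    return uses_plan_cache and TAG not in parse_tags(source)


def find_candidates(root):
    """List .js files under root that look like untagged plan cache tests."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*.js")
        if needs_tag(path.read_text(encoding="utf-8", errors="ignore"))
    )
```

Calling `find_candidates("jstests/core")` from the repository root would then give a list of files to review by hand; each hit still needs a judgment call about whether a migration commit could actually break it.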

      With a quick search using the build baron tool, I did find a couple of BFGs that might fit under the above rubric:

            Assignee: Mathis Bessa (Inactive) <mathis.bessa@mongodb.com>
            Reporter: David Storch <david.storch@mongodb.com>
            Votes: 0
            Watchers: 4
