Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-83879

Command profiling tests fail due to interferece with fcv_upgrade_downgrade_replica_sets_jscore_passthrough

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 8.0.0-rc0, 7.3.0-rc2
    • Affects Version/s: 7.3.0-rc0
    • Component/s: None
    • Labels:
      None
    • Cluster Scalability
    • Fully Compatible
    • ALL
    • v7.3
    • 35

      During the investigating of BF-30194, a profile_lookup.js failure which happens when running as part of  fcv_upgrade_downgrade_replica_sets_jscore_passthrough, it was uncovered that commands other than setFeatureCompatibilityVersion can be emitted implicitly as part of FCV upgrade/downgrade procedures.

      Tests marked as {}require_profiling are already excluded from fcv_upgrade_downgrade_sharding_jscore_passthrough suites due to mongos lacking the system.profile collection, so this could only happen on fcv_upgrade_downgrade_replica_sets_jscore_passthrough

       

      For BF-30194 in particular, an unaccounted collMod command happens during the time profiling is enabled which increments the command counter on the collection targeted by the test. This later  

      Looking at the latest available evergreen failure at the moment of investigating, we can reconstruct the following timeline from the logs:

      • [CODE] Profiler gets enabled with the filter "filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}},"db":"test"}}}}
      [j2:prim] {"t":
      {"$date":"2023-11-29T09:30:37.095+00:00"}
      ,"s":"I", "c":"COMMAND", "id":48742, "ctx":"conn320","msg":"Profiler settings changed","attr":{"from":
      {"level":0,"slowms":100,"sampleRate":1}
      ,"to":{"level":1,"slowms":100,"sampleRate":1,"filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}}}},"db":"test"}}

       

      •  [Problem] A collMod on collection foreign happens right after
      [j2:prim] {"t":
      {"$date":"2023-11-29T09:30:37.103+00:00"}
      ,"s":"I", "c":"COMMAND", "id":5324200, "ctx":"conn319","msg":"CMD: collMod","attr":{"cmdObj":
      {"collMod":"foreign"}
      }}

       

      • [CODE] The test query containing $lookup executes:
      [j2:prim] {"t":
      {"$date":"2023-11-29T09:30:37.103+00:00"}
      ,"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn320","msg":"Slow query","attr":{"type":"command","isFromUserConnection":true,"ns":"test.local","collectionType":"normal","appName":"MongoDB Shell","command":{"aggregate":"local","pipeline":\\\\\{"$lookup":{"from":"foreign","as":"res","localField":"a","foreignField":"a"}},"cursor":{},"lsid":{"id":{"$uuid":"d493d41d-7190-4a35-9a12-165c9f1cd823"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1701250236,"i":10}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test"},"planSummary":"COLLSCAN","planningTimeMicros":233,"keysExamined":0,"docsExamined":4,"cursorExhausted":true,"numYields":0,"nreturned":3,"queryHash":"9EB1333D","planCacheKey":"9B882338","queryFramework":"sbe","reslen":398,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":1}},"Global":{"acquireCount":
      {"r":1}
      }},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"remote":"127.0.0.1:58381","protocol":"op_msg","durationMillis":4}}

       

      • [CODE] Some time later the profiler gets sopped
      [j2:prim] {"t":
      {"$date":"2023-11-29T09:30:37.116+00:00"}
      ,"s":"I", "c":"COMMAND", "id":48742, "ctx":"conn320","msg":"Profiler settings changed","attr":{"from":{"level":1,"slowms":100,"sampleRate":1,"filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}}}},"to":{"level":0,"slowms":100,"sampleRate":1,"filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}}}},"db":"test"}}

       

      • [CODE] Then the test fails on the foreign command count assertion:
      [js_test:profile_lookup] 2023-11-29T09:30:37.137Z assert: [1] != [2] are not equal
      [js_test:profile_lookup] doassert@src/mongo/shell/assert.js:20:14
      [js_test:profile_lookup] assert.eq@src/mongo/shell/assert.js:176:17
      [js_test:profile_lookup] @jstests\aggregation\sources\lookup\profile_lookup.js:57:8
      [js_test:profile_lookup] uncaught exception: Error: [1] != [2] are not equal :
      [js_test:profile_lookup] doassert@src/mongo/shell/assert.js:20:14
      [js_test:profile_lookup] assert.eq@src/mongo/shell/assert.js:176:17
      [js_test:profile_lookup] @jstests\aggregation\sources\lookup\profile_lookup.js:57:8
      [js_test:profile_lookup] Error: [1] != [2] are not equal
      [js_test:profile_lookup] failed to load: jstests\aggregation\sources\lookup\profile_lookup.js
      [js_test:profile_lookup] exiting with code -3

       

      The test assumed that the pipeline will be the sole command involving collection foreign between the profiler start / stop calls, and therefore expected the command count on foreign to be equal to 1 (classic engine). This however wasn't the case due to the additional `collMod` command.

            Assignee:
            adi.zaimi@mongodb.com Adi Zaimi
            Reporter:
            catalin.sumanaru@mongodb.com Catalin Sumanaru
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: