[SERVER-83879] Command profiling tests fail due to interferece with fcv_upgrade_downgrade_replica_sets_jscore_passthrough Created: 05/Dec/23  Updated: 26/Jan/24  Resolved: 26/Jan/24

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 7.3.0-rc0
Fix Version/s: 8.0.0-rc0, 7.3.0-rc2

Type: Bug Priority: Major - P3
Reporter: Catalin Sumanaru Assignee: Adi Zaimi
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Fix
Related
related to SERVER-81563 Remove profiling tests from fcv upgra... Closed
Assigned Teams:
Cluster Scalability
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.3
Participants:
Linked BF Score: 35

 Description   

During the investigating of BF-30194, a profile_lookup.js failure which happens when running as part of  fcv_upgrade_downgrade_replica_sets_jscore_passthrough, it was uncovered that commands other than setFeatureCompatibilityVersion can be emitted implicitly as part of FCV upgrade/downgrade procedures.

Tests marked as {}require_profiling are already excluded from fcv_upgrade_downgrade_sharding_jscore_passthrough suites due to mongos lacking the system.profile collection, so this could only happen on fcv_upgrade_downgrade_replica_sets_jscore_passthrough

 

For BF-30194 in particular, an unaccounted collMod command happens during the time profiling is enabled which increments the command counter on the collection targeted by the test. This later  

Looking at the latest available evergreen failure at the moment of investigating, we can reconstruct the following timeline from the logs:

  • [CODE] Profiler gets enabled with the filter "filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}},"db":"test"}}}}
[j2:prim] {"t":
{"$date":"2023-11-29T09:30:37.095+00:00"}
,"s":"I", "c":"COMMAND", "id":48742, "ctx":"conn320","msg":"Profiler settings changed","attr":{"from":
{"level":0,"slowms":100,"sampleRate":1}
,"to":{"level":1,"slowms":100,"sampleRate":1,"filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}}}},"db":"test"}}

 

  •  [Problem] A collMod on collection foreign happens right after
[j2:prim] {"t":
{"$date":"2023-11-29T09:30:37.103+00:00"}
,"s":"I", "c":"COMMAND", "id":5324200, "ctx":"conn319","msg":"CMD: collMod","attr":{"cmdObj":
{"collMod":"foreign"}
}}

 

  • [CODE] The test query containing $lookup executes:
[j2:prim] {"t":
{"$date":"2023-11-29T09:30:37.103+00:00"}
,"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn320","msg":"Slow query","attr":{"type":"command","isFromUserConnection":true,"ns":"test.local","collectionType":"normal","appName":"MongoDB Shell","command":{"aggregate":"local","pipeline":\\\\\{"$lookup":{"from":"foreign","as":"res","localField":"a","foreignField":"a"}},"cursor":{},"lsid":{"id":{"$uuid":"d493d41d-7190-4a35-9a12-165c9f1cd823"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1701250236,"i":10}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test"},"planSummary":"COLLSCAN","planningTimeMicros":233,"keysExamined":0,"docsExamined":4,"cursorExhausted":true,"numYields":0,"nreturned":3,"queryHash":"9EB1333D","planCacheKey":"9B882338","queryFramework":"sbe","reslen":398,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":1}},"Global":{"acquireCount":
{"r":1}
}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"remote":"127.0.0.1:58381","protocol":"op_msg","durationMillis":4}}

 

  • [CODE] Some time later the profiler gets sopped
[j2:prim] {"t":
{"$date":"2023-11-29T09:30:37.116+00:00"}
,"s":"I", "c":"COMMAND", "id":48742, "ctx":"conn320","msg":"Profiler settings changed","attr":{"from":{"level":1,"slowms":100,"sampleRate":1,"filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}}}},"to":{"level":0,"slowms":100,"sampleRate":1,"filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}}}},"db":"test"}}

 

  • [CODE] Then the test fails on the foreign command count assertion:
[js_test:profile_lookup] 2023-11-29T09:30:37.137Z assert: [1] != [2] are not equal
[js_test:profile_lookup] doassert@src/mongo/shell/assert.js:20:14
[js_test:profile_lookup] assert.eq@src/mongo/shell/assert.js:176:17
[js_test:profile_lookup] @jstests\aggregation\sources\lookup\profile_lookup.js:57:8
[js_test:profile_lookup] uncaught exception: Error: [1] != [2] are not equal :
[js_test:profile_lookup] doassert@src/mongo/shell/assert.js:20:14
[js_test:profile_lookup] assert.eq@src/mongo/shell/assert.js:176:17
[js_test:profile_lookup] @jstests\aggregation\sources\lookup\profile_lookup.js:57:8
[js_test:profile_lookup] Error: [1] != [2] are not equal
[js_test:profile_lookup] failed to load: jstests\aggregation\sources\lookup\profile_lookup.js
[js_test:profile_lookup] exiting with code -3

 

The test assumed that the pipeline will be the sole command involving collection foreign between the profiler start / stop calls, and therefore expected the command count on foreign to be equal to 1 (classic engine). This however wasn't the case due to the additional `collMod` command.



 Comments   
Comment by Githook User [ 26/Jan/24 ]

Author:

{'name': 'Adi Zaimi', 'email': 'adizaimi@yahoo.com', 'username': 'adizaimi'}

Message: SERVER-83879: Omit running profile_lookup.js in fcv up/down suite (#18267)

SERVER-83879: Omit running profile_lookup.js in fcv up/down suite
GitOrigin-RevId: a9e2a6516cfb9da91262c1b82c0c254539d27749
Branch: v7.3
https://github.com/mongodb/mongo/commit/3debbda0ff675fde1f80099d7e994f70b0757d78

Comment by Githook User [ 25/Jan/24 ]

Author:

{'name': 'Adi Zaimi', 'email': 'adizaimi@yahoo.com', 'username': 'adizaimi'}

Message: SERVER-83879: Omit running profile_lookup.js in fcv up/down suite (#18267)

SERVER-83879: Omit running profile_lookup.js in fcv up/down suite
GitOrigin-RevId: 0b80a1d04f647c056d160fd7c563c3b33ecb594c
Branch: master
https://github.com/mongodb/mongo/commit/85f557da72250d0c384c20439e5872569b6d9ea9

Comment by Huayu Ouyang [ 05/Dec/23 ]

catalin.sumanaru@mongodb.com I'm happy to work on it, but I believe this passthrough suite is actually owned by the Cluster Scalability team (I only worked on the previous ticket since it was a quick fix)

Generated at Thu Feb 08 06:53:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.