[SERVER-64665] Investigate and fix test failures at jstests/core and jstests/concurrency/fsm_workloads test suites Created: 18/Mar/22  Updated: 29/Oct/23  Resolved: 08/Apr/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc0

Type: Task Priority: Major - P3
Reporter: Yoon Soo Kim Assignee: Mihai Andrei
Resolution: Fixed Votes: 0
Labels: read-only-views
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-64724 Investigate and fix a test failure at... Closed
Backwards Compatibility: Fully Compatible
Sprint: QE 2022-04-04, QE 2022-04-18
Participants:

 Description   

This issue happens when featureFlagSBELookupPushdown is turned on.

views_validation.js failures

failed test jobs:

  • causally_consistent_hedged_reads_jscore_passthrough
  • causally_consistent_jscore_passthrough_auth
  • causally_consistent_jscore_passthrough
  • causally_consistent_read_concern_snapshot_passthrough
  • sharding_api_version_jscore_passthrough
  • sharding_update_v1_oplog_jscore_passthrough

error messages:

[js_test:views_validation] assert: command did not fail with any of the following codes [ 165 ] {
[js_test:views_validation] 	"ok" : 0,
[js_test:views_validation] 	"errmsg" : "command failed because of stale config :: caused by :: sharding status of collection views_validation.v18 is not currently available for description and needs to be recovered from the config server",
[js_test:views_validation] 	"code" : 13388,
[js_test:views_validation] 	"codeName" : "StaleConfig",
[js_test:views_validation] 	"ns" : "views_validation.v18",
[js_test:views_validation] 	"vReceived" : Timestamp(0, 0),
[js_test:views_validation] 	"vReceivedEpoch" : ObjectId("00000000ffffffffffffffff"),
[js_test:views_validation] 	"vReceivedTimestamp" : Timestamp(4294967295, 4294967295),
[js_test:views_validation] 	"shardId" : "shard-rs0",
[js_test:views_validation] 	"$clusterTime" : {
[js_test:views_validation] 		"clusterTime" : Timestamp(1647622589, 150),
[js_test:views_validation] 		"signature" : {
[js_test:views_validation] 			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
[js_test:views_validation] 			"keyId" : NumberLong(0)
[js_test:views_validation] 		}
[js_test:views_validation] 	},
[js_test:views_validation] 	"operationTime" : Timestamp(1647622589, 150)
[js_test:views_validation] }
[js_test:views_validation] _getErrorWithCode@src/mongo/shell/utils.js:24:13
[js_test:views_validation] doassert@src/mongo/shell/assert.js:18:14
[js_test:views_validation] _assertCommandFailed@src/mongo/shell/assert.js:805:29
[js_test:views_validation] assert.commandFailedWithCode@src/mongo/shell/assert.js:851:16
[js_test:views_validation] @jstests/core/views/views_validation.js:140:8
[js_test:views_validation] @jstests/core/views/views_validation.js:191:2

views_collation.js failures

failed test jobs:

  • causally_consistent_jscore_passthrough_auth
  • causally_consistent_jscore_passthrough

error messages:

[js_test:views_collation] assert: command failed: {
[js_test:views_collation] 	"ok" : 0,
[js_test:views_collation] 	"errmsg" : "command failed because of stale config :: caused by :: sharding status of collection views_collation.simpleView is not currently available for description and needs to be recovered from the config server",
[js_test:views_collation] 	"code" : 13388,
[js_test:views_collation] 	"codeName" : "StaleConfig",
[js_test:views_collation] 	"ns" : "views_collation.simpleView",
[js_test:views_collation] 	"vReceived" : Timestamp(0, 0),
[js_test:views_collation] 	"vReceivedEpoch" : ObjectId("00000000ffffffffffffffff"),
[js_test:views_collation] 	"vReceivedTimestamp" : Timestamp(4294967295, 4294967295),
[js_test:views_collation] 	"shardId" : "shard-rs0",
[js_test:views_collation] 	"$clusterTime" : {
[js_test:views_collation] 		"clusterTime" : Timestamp(1647622170, 9),
[js_test:views_collation] 		"signature" : {
[js_test:views_collation] 			"hash" : BinData(0,"aHi64xqfbHHUL0/BCwM/Shp+asE="),
[js_test:views_collation] 			"keyId" : NumberLong("7076483078616514576")
[js_test:views_collation] 		}
[js_test:views_collation] 	},
[js_test:views_collation] 	"operationTime" : Timestamp(1647622170, 9)
[js_test:views_collation] } with original command request: {
[js_test:views_collation] 	"query" : {
[js_test:views_collation] 		"aggregate" : "simpleCollection",
[js_test:views_collation] 		"pipeline" : [
[js_test:views_collation] 			{
[js_test:views_collation] 				"$lookup" : {
[js_test:views_collation] 					"from" : "simpleView",
[js_test:views_collation] 					"localField" : "x",
[js_test:views_collation] 					"foreignField" : "x",
[js_test:views_collation] 					"as" : "result"
[js_test:views_collation] 				}
[js_test:views_collation] 			}
[js_test:views_collation] 		],
[js_test:views_collation] 		"cursor" : {
[js_test:views_collation] 
[js_test:views_collation] 		},
[js_test:views_collation] 		"lsid" : {
[js_test:views_collation] 			"id" : UUID("3901dcc0-dc60-4d54-a194-92855538ad35")
[js_test:views_collation] 		},
[js_test:views_collation] 		"$clusterTime" : {
[js_test:views_collation] 			"clusterTime" : Timestamp(1647622170, 9),
[js_test:views_collation] 			"signature" : {
[js_test:views_collation] 				"hash" : BinData(0,"aHi64xqfbHHUL0/BCwM/Shp+asE="),
[js_test:views_collation] 				"keyId" : NumberLong("7076483078616514576")
[js_test:views_collation] 			}
[js_test:views_collation] 		},
[js_test:views_collation] 		"readConcern" : {
[js_test:views_collation] 			"afterClusterTime" : Timestamp(1647622170, 9)
[js_test:views_collation] 		}
[js_test:views_collation] 	},
[js_test:views_collation] 	"$readPreference" : {
[js_test:views_collation] 		"mode" : "secondary"
[js_test:views_collation] 	}
[js_test:views_collation] } on connection: connection to localhost:20253
[js_test:views_collation] _getErrorWithCode@src/mongo/shell/utils.js:24:13
[js_test:views_collation] doassert@src/mongo/shell/assert.js:18:14
[js_test:views_collation] _assertCommandWorked@src/mongo/shell/assert.js:737:25
[js_test:views_collation] assert.commandWorked@src/mongo/shell/assert.js:829:16
[js_test:views_collation] @jstests/core/views/views_collation.js:187:8
[js_test:views_collation] @jstests/core/views/views_collation.js:522:2

jstests/core/timeseries/timeseries_lookup.js test failures

failed test jobs:

  • causally_consistent_jscore_passthrough
  • sharded_causally_consistent_jscore_passthrough
  • sharded_causally_consistent_read_concern_snapshot_passthrough

error messages:

[js_test:timeseries_lookup] assert: command failed: {
[js_test:timeseries_lookup] 	"ok" : 0,
[js_test:timeseries_lookup] 	"errmsg" : "command failed because of stale config :: caused by :: sharding status of collection timeseries_lookup.b is not currently available for description and needs to be recovered from the config server",
[js_test:timeseries_lookup] 	"code" : 13388,
[js_test:timeseries_lookup] 	"codeName" : "StaleConfig",
[js_test:timeseries_lookup] 	"ns" : "timeseries_lookup.b",
[js_test:timeseries_lookup] 	"vReceived" : Timestamp(0, 0),
[js_test:timeseries_lookup] 	"vReceivedEpoch" : ObjectId("00000000ffffffffffffffff"),
[js_test:timeseries_lookup] 	"vReceivedTimestamp" : Timestamp(4294967295, 4294967295),
[js_test:timeseries_lookup] 	"shardId" : "shard-rs0",
[js_test:timeseries_lookup] 	"$clusterTime" : {
[js_test:timeseries_lookup] 		"clusterTime" : Timestamp(1647621927, 48),
[js_test:timeseries_lookup] 		"signature" : {
[js_test:timeseries_lookup] 			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
[js_test:timeseries_lookup] 			"keyId" : NumberLong(0)
[js_test:timeseries_lookup] 		}
[js_test:timeseries_lookup] 	},
[js_test:timeseries_lookup] 	"operationTime" : Timestamp(1647621927, 48)
[js_test:timeseries_lookup] } with original command request: {
[js_test:timeseries_lookup] 	"query" : {
[js_test:timeseries_lookup] 		"aggregate" : "a",
[js_test:timeseries_lookup] 		"pipeline" : [
[js_test:timeseries_lookup] 			{
[js_test:timeseries_lookup] 				"$lookup" : {
[js_test:timeseries_lookup] 					"from" : "b",
[js_test:timeseries_lookup] 					"localField" : "tags.hostname",
[js_test:timeseries_lookup] 					"foreignField" : "tags.hostname",
[js_test:timeseries_lookup] 					"as" : "matchedB"
[js_test:timeseries_lookup] 				}
[js_test:timeseries_lookup] 			},
[js_test:timeseries_lookup] 			{
[js_test:timeseries_lookup] 				"$project" : {
[js_test:timeseries_lookup] 					"_id" : 0,
[js_test:timeseries_lookup] 					"host" : "$tags.hostname",
[js_test:timeseries_lookup] 					"matchedB" : {
[js_test:timeseries_lookup] 						"$size" : "$matchedB"
[js_test:timeseries_lookup] 					}
[js_test:timeseries_lookup] 				}
[js_test:timeseries_lookup] 			},
[js_test:timeseries_lookup] 			{
[js_test:timeseries_lookup] 				"$sort" : {
[js_test:timeseries_lookup] 					"host" : 1
[js_test:timeseries_lookup] 				}

jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js test failure

test suite name:

  • concurrency_sharded_causal_consistency

error messages:

[fsm_workload_test:view_catalog_cycle_lookup]         Error: assert failed : {
[fsm_workload_test:view_catalog_cycle_lookup]         	"ok" : 0,
[fsm_workload_test:view_catalog_cycle_lookup]         	"errmsg" : "sharding status of collection test31_fsmdb0.view_catalog_cycle_lookup_viewE is not currently available for description and needs to be recovered from the config server",
[fsm_workload_test:view_catalog_cycle_lookup]         	"code" : 13388,
[fsm_workload_test:view_catalog_cycle_lookup]         	"codeName" : "StaleConfig",
[fsm_workload_test:view_catalog_cycle_lookup]         	"ns" : "test31_fsmdb0.view_catalog_cycle_lookup_viewE",
[fsm_workload_test:view_catalog_cycle_lookup]         	"vReceived" : Timestamp(0, 0),
[fsm_workload_test:view_catalog_cycle_lookup]         	"vReceivedEpoch" : ObjectId("00000000ffffffffffffffff"),
[fsm_workload_test:view_catalog_cycle_lookup]         	"vReceivedTimestamp" : Timestamp(4294967295, 4294967295),
[fsm_workload_test:view_catalog_cycle_lookup]         	"shardId" : "shard-rs0",
[fsm_workload_test:view_catalog_cycle_lookup]         	"$clusterTime" : {
[fsm_workload_test:view_catalog_cycle_lookup]         		"clusterTime" : Timestamp(1647748069, 99),
[fsm_workload_test:view_catalog_cycle_lookup]         		"signature" : {
[fsm_workload_test:view_catalog_cycle_lookup]         			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
[fsm_workload_test:view_catalog_cycle_lookup]         			"keyId" : NumberLong(0)
[fsm_workload_test:view_catalog_cycle_lookup]         		}
[fsm_workload_test:view_catalog_cycle_lookup]         	},
[fsm_workload_test:view_catalog_cycle_lookup]         	"operationTime" : Timestamp(1647748069, 94)
[fsm_workload_test:view_catalog_cycle_lookup]         }
[fsm_workload_test:view_catalog_cycle_lookup] 
[fsm_workload_test:view_catalog_cycle_lookup]         quietlyDoAssert@jstests/concurrency/fsm_libs/assert.js:55:18
[fsm_workload_test:view_catalog_cycle_lookup]         assert@src/mongo/shell/assert.js:151:17
[fsm_workload_test:view_catalog_cycle_lookup]         wrapAssertFn@jstests/concurrency/fsm_libs/assert.js:65:16
[fsm_workload_test:view_catalog_cycle_lookup]         assertWithLevel@jstests/concurrency/fsm_libs/assert.js:89:21
[fsm_workload_test:view_catalog_cycle_lookup]         readFromView@jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js:145:25



 Comments   
Comment by Githook User [ 07/Apr/22 ]

Author: Mihai Andrei (username: mtandrei, email: mihai.andrei@10gen.com)

Message: SERVER-64665 Early return in AutoGet constructors once we detect that a secondary namespace is a view or is sharded
Branch: master
https://github.com/mongodb/mongo/commit/ce04644c9b7376da93fa588c963c90406390e538

Comment by Mihai Andrei [ 31/Mar/22 ]

I should clarify: shard versioning is meaningful in a multi-collection context, since each collection has its own shard version. The problem is that we can't resolve every namespace's shard version within the allowed number of retries. In the context of views_validation.js, this is problematic insofar as we fail with a StaleConfig error (because we try to resolve all of the shard versions up front) instead of a 'ViewDepthLimitExceeded' error (in the classic case, we resolve the primary namespace, then consult the view catalog). If we have a series of 10 $lookup stages against regular (that is, non-cyclic) views, both classic and SBE lookup will fail with a StaleConfig exception. So, for this specific test, it's more a question of which error comes first.
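
The retry arithmetic described above can be illustrated with a small model. This is a hypothetical sketch in plain JavaScript, not server code: the function name `resolveAllUpFront`, the namespace strings, and the assumption that each failed attempt refreshes exactly one namespace are all invented for illustration. The only figures taken from this ticket are the retry budget of 10 and the StaleConfig error name.

```javascript
// Hypothetical model (not server code): every secondary namespace starts with
// an unknown shard version, and each failed attempt refreshes exactly one.
const MAX_RETRIES = 10; // retry budget quoted in this ticket

function resolveAllUpFront(secondaryNamespaces) {
    const known = new Set();
    for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
        // Find the first namespace whose shard version is still unknown.
        const stale = secondaryNamespaces.find((ns) => !known.has(ns));
        if (stale === undefined) {
            // Every shard version is known, so this attempt succeeds.
            return { ok: 1, attempts: attempt };
        }
        // This attempt fails as stale; refresh that namespace and retry.
        known.add(stale);
    }
    // The budget was spent on refreshes before a clean attempt could run.
    return { ok: 0, codeName: "StaleConfig", attempts: MAX_RETRIES };
}

// Helper that builds n made-up namespace names.
const makeNamespaces = (n) => Array.from({ length: n }, (_, i) => `db.v${i}`);

console.log(resolveAllUpFront(makeNamespaces(9)));
console.log(resolveAllUpFront(makeNamespaces(10)));
```

Under these assumptions, nine secondary namespaces leave one attempt free to succeed, while ten consume the whole budget and the command ends with StaleConfig regardless of what the view catalog would have reported.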

Comment by Kyle Suarez [ 31/Mar/22 ]

mihai.andrei or someone else, could you please elaborate on

Unfortunately, for each secondary collection, this results in a StaleShardVersion exception, and we run out of retries (10) before we can determine whether all secondary namespaces are unsharded:

Is it because, in general, a shard version is not particularly meaningful in a multi-namespace context and therefore the exception is doomed to always throw in the current system?

Comment by Eric Cox (Inactive) [ 30/Mar/22 ]

yoonsoo.kim Mihai has more bandwidth and has investigated the problem a bit more after talking with dianna.hohensee. We agreed to bail out early in AutoGetMulti* as soon as we hit a secondary namespace that is sharded or is a view, to avoid using up the StaleConfig retry budget. Going to reassign to mihai.
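
A rough sketch of the early bail-out agreed on above, written in plain JavaScript rather than the server's C++. Everything here is hypothetical: the `catalog` map, its entry shape, and the function name `classifySecondaryNamespaces` are stand-ins invented for illustration and do not reflect the actual AutoGet constructors.

```javascript
// Hypothetical sketch: stop inspecting secondary namespaces as soon as one
// turns out to be a view or a sharded collection.
function classifySecondaryNamespaces(secondaryNamespaces, catalog) {
    let checked = 0;
    for (const ns of secondaryNamespaces) {
        checked++;
        // Namespaces missing from the made-up catalog count as plain,
        // unsharded collections in this model.
        const entry = catalog.get(ns) ?? { kind: "collection", sharded: false };
        if (entry.kind === "view" || entry.sharded) {
            // Early return: the remaining namespaces are never inspected.
            return { anyViewOrSharded: true, stoppedAt: ns, checked };
        }
    }
    return { anyViewOrSharded: false, stoppedAt: null, checked };
}

// Made-up catalog contents for the usage example.
const catalog = new Map([
    ["db.a", { kind: "collection", sharded: false }],
    ["db.viewB", { kind: "view", sharded: false }],
    ["db.c", { kind: "collection", sharded: true }],
]);

console.log(classifySecondaryNamespaces(["db.a", "db.viewB", "db.c"], catalog));
console.log(classifySecondaryNamespaces(["db.a"], catalog));
```

In the first call the loop stops at `db.viewB` after two checks and never looks at `db.c`, which is the property that keeps later namespaces from drawing on the retry budget. What the caller does with the resulting flag is outside the scope of this sketch.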
