As part of SERVER-47822 the sharding_multiversion suite should be enabled, however, several tests fail with a similar backtrace:
assert: command failed: { "ok" : 0, "errmsg" : "Data transfer error: migrate failed: StaleConfig{ ns: \"foo.bar\", vReceived: Timestamp(0, 0), vReceivedEpoch: ObjectId('000000000000000000000000'), vWanted: Timestamp(2, 0), vWantedEpoch: ObjectId('5ea989acce5433364888267d'), shardId: \"addshard5-rs1\" }: epoch mismatch detected for foo.bar", "code" : 96, "codeName" : "OperationFailed", "operationTime" : Timestamp(1588169133, 9), "$clusterTime" : { "clusterTime" : Timestamp(1588169133, 9), "signature" : { "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="), "keyId" : NumberLong(0) } } } _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:18:14 _assertCommandWorked@src/mongo/shell/assert.js:663:17 assert.commandWorked@src/mongo/shell/assert.js:755:16 @jstests/sharding/addshard5.js:28:1 @jstests/sharding/addshard5.js:3:2 uncaught exception: Error: command failed: { "ok" : 0, "errmsg" : "Data transfer error: migrate failed: StaleConfig{ ns: \"foo.bar\", vReceived: Timestamp(0, 0), vReceivedEpoch: ObjectId('000000000000000000000000'), vWanted: Timestamp(2, 0), vWantedEpoch: ObjectId('5ea989acce5433364888267d'), shardId: \"addshard5-rs1\" }: epoch mismatch detected for foo.bar", "code" : 96, "codeName" : "OperationFailed", "operationTime" : Timestamp(1588169133, 9), "$clusterTime" : { "clusterTime" : Timestamp(1588169133, 9), "signature" : { "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="), "keyId" : NumberLong(0) } } } : _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:18:14 _assertCommandWorked@src/mongo/shell/assert.js:663:17 assert.commandWorked@src/mongo/shell/assert.js:755:16 @jstests/sharding/addshard5.js:28:1 @jstests/sharding/addshard5.js:3:2 failed to load: jstests/sharding/addshard5.js
This is caused by an assert reached on the donor shard when trying to get the indexes during the migration of a chunk, the stacktrace can be seen below:
src/mongo/util/stacktrace_posix.cpp:501:44: mongo::printStackTrace() src/mongo/util/assert_util.cpp:81:24: mongo::DBException::traceIfNeeded(mongo::DBException const&) src/mongo/util/assert_util.h:134:22: _ZN5mongo11DBExceptionC4ERKNS_6StatusE src/mongo/util/assert_util.h:151:66: _ZN5mongo18AssertionExceptionC4ERKNS_6StatusE src/mongo/util/assert_util.h:182:71: mongo::error_details::ExceptionForImpl<(mongo::ErrorCodes::Error)13388, mongo::ExceptionForCat<(mongo::ErrorCategory)4>, mongo::ExceptionForCat<(mongo::ErrorCategory)5> >::ExceptionForImpl(mongo::Status const&) build/cached/mongo/base/error_codes.cpp:2419:63: mongo::error_details::throwExceptionForStatus(mongo::Status const&) src/mongo/util/assert_util.cpp:257:43: mongo::uassertedWithLocation(mongo::Status const&, char const*, unsigned int) src/mongo/db/s/collection_sharding_runtime.cpp:381:5: operator() src/mongo/db/s/collection_sharding_runtime.cpp:381:5: mongo::CollectionShardingRuntime::_getMetadataWithVersionCheckAt(mongo::OperationContext*, boost::optional<mongo::LogicalTime> const&) (.cold.951) src/mongo/db/s/collection_sharding_runtime.cpp:162:5: mongo::CollectionShardingRuntime::checkShardVersionOrThrow(mongo::OperationContext*) src/mongo/db/db_raii.cpp:353:34: mongo::AutoGetCollectionForReadCommand::AutoGetCollectionForReadCommand(mongo::OperationContext*, mongo::NamespaceStringOrUUID const&, mongo::AutoGetCollection::ViewMode, mongo::Date_t, mongo::AutoStatsTracker::LogMode) src/mongo/db/commands/list_indexes.cpp:152:94: mongo::(anonymous namespace)::CmdListIndexes::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) src/mongo/db/commands.h:799:19: mongo::BasicCommand::runWithReplyBuilder(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::rpc::ReplyBuilderInterface*)
The logs on this failed task show the sequence of events:
A shard starts a moveChunk process:
d20771| 2020-04-29T14:05:32.785+00:00 I MIGRATE 22016 [MoveChunk] "Starting chunk migration donation","attr":{"requestParameters":"ns: foo.bar, [{ _id: MinKey }, { _id: MaxKey }), fromShard: addshard5-rs1, toShard: addshard5-rs0","collectionEpoch":{"$oid":"5ea989acce5433364888267d"}}
The recipient receives the command and start the migrateThread:
d20770| 2020-04-29T14:05:32.809+00:00 D4 TXN 23984 [migrateThread] "New transaction started","attr":{"txnNumber":0,"lsid":{"uuid":{"$uuid":"16fd9315-5698-4a4b-8275-781d40de4782"}}} d20770| 2020-04-29T14:05:32.809+00:00 I MIGRATE 22000 [migrateThread] "Starting receiving end of chunk migration","attr":{"chunkMin":{"_id":{"$minKey":1}},"chunkMax":{"_id":{"$maxKey":1}},"namespace":"foo.bar","fromShard":"addshard5-rs1","epoch":{"$oid":"5ea989acce5433364888267d"},"sessionId":"addshard5-rs1_addshard5-rs0_5ea989ac4beaa7b990cb1c02","migrationId":{"uuid":{"$uuid":"3c3be56a-5b3a-4de8-abf2-beac588042f3"}}} d20771| 2020-04-29T14:05:32.785+00:00 I MIGRATE 22016 [MoveChunk] "Starting chunk migration donation","attr":{"requestParameters":"ns: foo.bar, [{ _id: MinKey }, { _id: MaxKey }), fromShard: addshard5-rs1, toShard: addshard5-rs0","collectionEpoch":{"$oid":"5ea989acce5433364888267d"}}
As part of the process, it requests the index:
d20771| 2020-04-29T14:05:32.970+00:00 I COMMAND 51803 [conn35] "Slow query","attr":{"type":"command","ns":"foo.bar","command":{"listIndexes":"bar","$clusterTime":{"clusterTime":{"$timestamp":{"t":1588169132,"i":61}},"signature":{"hash":{"$binary":{"base64":"uIQUNHXCtJMnziE9Hq2R82DIcOY=","subType":"0"}},"keyId":6821134443802001430}},"$configServerState":{"opTime":{"ts":{"$timestamp":{"t":1588169132,"i":57}},"t":1}},"$db":"foo"},"numYields":0,"ok":0,"errMsg":"epoch mismatch detected for foo.bar","errName":13388,"errCode":13388,"reslen":550,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":2}},"ReplicationStateTransition":{"acquireCount":{"w":3}},"Global":{"acquireCount":{"r":3}},"Database":{"acquireCount":{"r":3}},"Collection":{"acquireCount":{"r":3}},"Mutex":{"acquireCount":{"r":4}}},"protocol":"op_msg","durationMillis":161}
But, this operation fails with an exception:
d20770| 2020-04-29T14:05:32.971+00:00 W - 23075 [migrateThread] "DBException thrown","attr":{"error":{"code":13388,"codeName":"StaleConfig","errmsg":"epoch mismatch detected for foo.bar","ns":"foo.bar","vReceived":{"$timestamp":{"t":0,"i":0}},"vReceivedEpoch":{"$oid":"000000000000000000000000"},"vWanted":{"$timestamp":{"t":2,"i":0}},"vWantedEpoch":{"$oid":"5ea989acce5433364888267d"},"shardId":"addshard5-rs1"}}
The donor shard (the process on port 20771) has version 4.4.0-rc3-11-ge33cb0c:
d20771| 2020-04-29T14:05:22.089+00:00 I CONTROL 23403 [initandlisten] "Build Info","attr":{"buildInfo":{"version":"4.4.0-rc3-11-ge33cb0c","gitVersion":"e33cb0c81df6901645788f1df484c2d492c08b1f","openSSLVersion":"OpenSSL 1.0.1e-fips 11 Feb 2013","modules":["enterprise"],"allocator":"tcmalloc","environment":{"distmod":"rhel62","distarch":"x86_64","target_arch":"x86_64"}}}
The recipient (on port 20770), version 4.5.0-979-ga1b1a3d
d20770| 2020-04-29T14:05:22.085+00:00 I CONTROL 23403 [initandlisten] "Build Info","attr":{"buildInfo":{"version":"4.5.0-979-ga1b1a3d","gitVersion":"a1b1a3db6286e9337d80602a79c905ee41828e03","openSSLVersion":"OpenSSL 1.0.1e-fips 11 Feb 2013","modules":["enterprise"],"allocator":"tcmalloc","environment":{"distmod":"rhel62","distarch":"x86_64","target_arch":"x86_64"}}}
- is depended on by
-
SERVER-47907 Remove unnecessary shard from index_commands_shard_targeting.js
- Closed