Details
Task
Resolution: Fixed
Critical - P2
None
None
Description
- A sharded cluster.
- The config.system.sessions collection is sharded and has only one chunk. None of the other shards has a config.system.sessions collection.
- The customer followed the Build Indexes on Replica Sets procedure from our documentation, which takes one member out of the replica set at a time and builds the indexes on it.
- After they had built all indexes on the secondaries and run rs.stepDown() on the primary, the secondaries crashed with the error DBException::toString(): NamespaceNotFound: Failed to apply operation due to missing collection (6e54d489-42cd-4cec-9aed-06e030cdbe3f).
Analysis:
1. When a secondary was restarted as a standalone, a new config.system.sessions collection was created with a new UUID, because the mongo shell implicitly creates a session.
2. As a result, every secondary in the replica set ended up with its own config.system.sessions collection, each with a different UUID.
3. When the primary stepped down, every replica set member except the arbiter and the new primary crashed with Fatal assertion 16359 (NamespaceNotFound). Because the majority was lost, the shard became read-only.
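Before the primary is stepped down, this divergence can be detected by comparing the UUID that each member reports for config.system.sessions. The sketch below is a hypothetical mongo shell helper; sessionsUuidReport is not a MongoDB API. It assumes that getCollectionInfos() exposes the collection UUID as info.uuid, which holds for MongoDB 3.6 and later. It also assumes that the shell's string form of a UUID is unique per value.

```javascript
// Hypothetical helper: report which members hold which config.system.sessions UUID.
// `members` is an array of { host, conn } where conn behaves like a mongo shell
// Mongo connection (e.g. new Mongo("host:port")); only conn.getDB() is used.
function sessionsUuidReport(members) {
  const byUuid = {};
  members.forEach(m => {
    // getCollectionInfos() returns an empty array if the collection is absent.
    const infos = m.conn.getDB("config").getCollectionInfos({ name: "system.sessions" });
    // Assumption: String(uuid) is unique per UUID value in the shell in use.
    const key = infos.length > 0 ? String(infos[0].info.uuid) : "<missing>";
    if (!byUuid[key]) byUuid[key] = [];
    byUuid[key].push(m.host);
  });
  // Consistent only if every member reports the same UUID (or none has the collection).
  return { consistent: Object.keys(byUuid).length <= 1, byUuid: byUuid };
}
// Usage in the mongo shell (host names are placeholders):
//   const hosts = ["shard01a:27018", "shard01b:27018", "shard01c:27018"];
//   printjson(sessionsUuidReport(hosts.map(h => ({ host: h, conn: new Mongo(h) }))));
```

Connections to secondaries may need secondary reads enabled before listCollections succeeds, for example with conn.setSlaveOk() in the legacy mongo shell.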
Resolution: the manual must add the disableLogicalSessionCacheRefresh=true server parameter to the "Stop One Secondary and Restart as a Standalone" step, so that the standalone mongod does not create the config.system.sessions collection.
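A minimal sketch of the restart step with this parameter follows. It assumes a MongoDB version that supports disableLogicalSessionCacheRefresh. The port and dbpath in the mongod command are placeholders. assertSessionRefreshDisabled is a hypothetical helper: it uses the getParameter command to confirm that the standalone actually started with session cache refresh disabled.

```javascript
// "Stop One Secondary and Restart as a Standalone", run from the OS shell.
// Port and dbpath are placeholders; start without the replica set options
// (for example --replSet) so that mongod runs as a standalone:
//   mongod --port 27217 --dbpath /data/db --setParameter disableLogicalSessionCacheRefresh=true
//
// Hypothetical helper, run in a mongo shell connected to that standalone,
// that confirms the parameter took effect before any index builds start.
function assertSessionRefreshDisabled(db) {
  const res = db.adminCommand({ getParameter: 1, disableLogicalSessionCacheRefresh: 1 });
  // ok: 0 means the parameter is unknown to this mongod version.
  if (res.ok !== 1 || res.disableLogicalSessionCacheRefresh !== true) {
    throw new Error("disableLogicalSessionCacheRefresh is not enabled: " + JSON.stringify(res));
  }
  return true;
}
// Usage in the mongo shell:  assertSessionRefreshDisabled(db);
```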
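Alternatively, the customer can split the existing config.system.sessions chunk and distribute the resulting chunks across all available shards. This pre-creates the config.system.sessions collection with the correct UUID on every shard and prevents future errors of this kind. The move fails, however, on a shard that already holds its own config.system.sessions collection with a different UUID: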
mongos> db.adminCommand({ moveChunk: "config.system.sessions", find: { _id: MinKey }, to: "shard03" })
{
    "ok" : 0,
    "errmsg" : "Data transfer error: Cannot receive chunk [{ _id: MinKey }, { _id: MaxKey }) for collection config.system.sessions because we already have an identically named collection with UUID 5127af6c-3a3f-4990-9669-ea307f795c92, which differs from the donor's UUID 25dfc80c-db79-4f94-81a9-c37c2b438fd6. Manually drop the collection on this shard if it contains data from a previous incarnation of config.system.sessions",
    "code" : 96,
    "codeName" : "OperationFailed",
    "operationTime" : Timestamp(1561927900, 9),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1561927900, 9),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
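The alternative can be sketched as a round-robin moveChunk loop run against mongos. spreadSessionsChunks is a hypothetical helper, not part of MongoDB. It assumes the chunk has already been split, for example with sh.splitAt() or sh.splitFind(). It also assumes that config.chunks documents carry an ns field, as in the MongoDB 4.x layout. A failed move, such as the UUID mismatch above, is recorded in the result rather than aborting the loop.

```javascript
// Hypothetical helper: spread the config.system.sessions chunks round-robin
// across all shards with moveChunk. Run in a mongo shell connected to mongos.
// Assumes the chunk was already split and that config.chunks documents have an
// ns field (MongoDB 4.x layout).
function spreadSessionsChunks(db) {
  const ns = "config.system.sessions";
  const config = db.getSiblingDB("config");
  const shards = config.shards.find().toArray().map(s => s._id);
  const chunks = config.chunks.find({ ns: ns }).sort({ min: 1 }).toArray();
  const results = [];
  chunks.forEach((chunk, i) => {
    const target = shards[i % shards.length];
    if (chunk.shard === target) return; // chunk already lives on its target shard
    let res;
    try {
      // bounds identifies the chunk by its exact [min, max) range.
      res = db.adminCommand({ moveChunk: ns, bounds: [chunk.min, chunk.max], to: target });
    } catch (e) {
      // Some shells may throw on a failed command instead of returning { ok: 0 }.
      res = { ok: 0, errmsg: String((e && e.message) || e) };
    }
    results.push({ min: chunk.min, to: target, ok: res.ok, errmsg: res.errmsg });
  });
  return results;
}
// Usage on mongos:  printjson(spreadSessionsChunks(db));
```

Entries with ok: 0 identify shards where a divergent collection blocks the move. As the errmsg above notes, such a collection has to be dealt with on that shard before the chunk can be received.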
Issue Links
- is related to: DOCS-8589 Comment on: "manual/reference/method/db.collection.getShardDistribution.txt" (Closed)
- related to: DOCS-12857 Audit and update starting repl as standalones (Closed)