Type: Bug
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Operating System: ALL
Title:
Critical Outage in MongoDB 7.0.21 Sharded Cluster - Time Monotonicity Violation (Error Code 6493100)
MongoDB Version: 7.0.21
Deployment Type: Sharded Cluster (Config RS + Shards + Mongos)
Description
We experienced a complete outage in our MongoDB 7.0.21 sharded cluster environment. All cluster components went down simultaneously, including:
Shard replica sets
Config server replica set
Mongos routers
This resulted in full application downtime.
Errors Observed
While reviewing the logs, we identified a Tripwire assertion related to a Time Monotonicity Violation, originating from the ReadThroughCache / ShardRegistry metadata refresh layer.
Error 1:
{"t":{"$date":"2026-02-13T11:25:44.936+05:30"},"s":"E","c":"ASSERT","id":4457000,"ctx":"ShardRegistry-18921","msg":"Tripwire assertion","attr":{"error":{"code":6493100,"codeName":"Location6493100","errmsg":"Time monotonicity violation: lookup time { topologyTime: Timestamp(1750759199, 2), rsmIncrement: 40, forceReloadIncrement: 44506 } which is less than the earliest expected timeInStore { topologyTime: Timestamp(1750882566, 2), rsmIncrement: 40, forceReloadIncrement: 44506 }."},"location":"{fileName:\"src/mongo/util/read_through_cache.h\", line:549, functionName:\"operator()\"}"}}
Error 2:
,"s":"F","c":"CONTROL","id":6384300,"ctx":"ShardRegistry-0","msg":"Writing fatal message","attr":{"message":"DBException::toString(): Location6493100: Time monotonicity violation: lookup time { topologyTime: Timestamp(1750759199, 2), rsmIncrement: 6, forceReloadIncrement: 5 } which is less than the earliest expected timeInStore { topologyTime: Timestamp(1750882566, 2), rsmIncrement: 6, forceReloadIncrement: 5 }.\nActual exception type: mongo::error_details::throwExceptionForStatus(mongo::Status const&)::NonspecificAssertionException\n\n"}}
Impact:
Full cluster outage
All mongos routers unavailable
Shard nodes became non-operational
Application downtime observed
Initial Findings
From our analysis:
The error is tied to topologyTime, which represents shard topology metadata stored in the config.shards collection.
The system detected a regression where the lookup metadata time was older than the cached/expected metadata time.
This triggered MongoDB's internal Tripwire safety assertion, resulting in process termination.
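To make the failure mode concrete, here is a minimal, hypothetical Python sketch of the monotonicity check the assertion describes. It is not MongoDB's implementation (which lives in src/mongo/util/read_through_cache.h); it only mirrors the three fields shown in the log message, under the assumption that they compare lexicographically:

```python
from collections import namedtuple

# Hypothetical stand-in for the ShardRegistry cache time; the field names
# come from the log message, the ordering rule is an assumption.
Time = namedtuple("Time", ["topology_time", "rsm_increment", "force_reload_increment"])

class TimeMonotonicityViolation(Exception):
    """Stand-in for the tripwire assertion (error code 6493100)."""

def check_lookup(lookup: Time, time_in_store: Time) -> Time:
    # The cache expects each lookup to return a time at least as new as the
    # value it already advertised (timeInStore); anything older would move
    # the cached metadata backwards, so the process aborts instead.
    if lookup < time_in_store:
        raise TimeMonotonicityViolation(
            f"lookup time {lookup} is less than the earliest expected "
            f"timeInStore {time_in_store}"
        )
    return lookup

# Values from Error 1 above: only topologyTime's seconds component differs.
lookup = Time((1750759199, 2), 40, 44506)
time_in_store = Time((1750882566, 2), 40, 44506)
try:
    check_lookup(lookup, time_in_store)
except TimeMonotonicityViolation as e:
    print("would fassert:", e)
```

This matches what the logs show: the increments are equal, so the comparison is decided purely by the older topologyTime, and the process terminates rather than serve regressed topology metadata.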
Recovery Attempts & Current Status
We restarted all cluster components:
Shard servers
Config servers
Mongos routers
However, even after restarting all servers, the cluster did not recover and the same issue persisted.
Config Metadata Validation
We verified shard topology metadata on the config primary.
Observation
Only one version of topologyTime is visible in the config.shards collection.
No historical or conflicting versions are present.
Output
csReplSet [direct: primary] config> db.shards.find()
[
  ... // shard documents truncated in this report; each entry carries the same current topologyTime
]
This indicates that only the current topology metadata is present, and we do not see any older topologyTime values stored in the collection.
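Since a single find() on the primary cannot rule out divergence between config server members, one follow-up check is to capture config.shards from every member and diff the topologyTime values offline. The snippet below is a hypothetical helper for that offline comparison; the member hostnames and documents are illustrative, not from our cluster:

```python
# Hypothetical snapshots of config.shards captured from each config server
# member (e.g. via direct connections), keyed by member host.
snapshots = {
    "cfg1:27019": [{"_id": "shard01", "topologyTime": (1750882566, 2)}],
    "cfg2:27019": [{"_id": "shard01", "topologyTime": (1750882566, 2)}],
    "cfg3:27019": [{"_id": "shard01", "topologyTime": (1750882566, 2)}],
}

def topology_times_diverge(snaps):
    """Return True if any member reports a different (_id, topologyTime) set."""
    seen = {
        frozenset((doc["_id"], doc["topologyTime"]) for doc in docs)
        for docs in snaps.values()
    }
    return len(seen) > 1

print("divergence detected:", topology_times_diverge(snapshots))
```

In our case all members agreed, which is what makes the regression puzzling: the stale value seen by the lookup does not appear anywhere in the persisted metadata.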
Additional Observation
Restarting every cluster component did not clear the condition, and validation of the config.shards collection shows only the current topologyTime values, with no evidence of older or conflicting versions stored in the metadata.
Given this, it appears the system may have hit an internal defect or an unexpected edge case in topologyTime handling or the ShardRegistry cache refresh logic.
Assistance Required
We request assistance in identifying the root cause of this incident and in recovering the cluster and its data.
Specifically, we would like to understand which internal conditions (such as replication behavior, elections, metadata refresh cycles, or cache synchronization mechanisms) could lead to this state.
As this issue caused a complete production outage, we request urgent investigation, guidance, and support to identify the root cause and prevent recurrence.