Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: 4.0.23
Component/s: Replication
Fully Compatible
Linux
Repl 2021-07-12, Repl 2021-07-26, Repl 2021-08-09, Repl 2021-08-23, Replication 2021-11-15, Replication 2021-11-29, Replication 2021-12-13, Replication 2021-12-27, Replication 2022-01-10, Replication 2022-01-24, Replication 2022-02-07
Sending a step down request to a primary that is experiencing disk failures can repeatedly fail with time-out errors such as the following:
{
    "operationTime" : Timestamp(1620337238, 857),
    "ok" : 0,
    "errmsg" : "Could not acquire the global shared lock before the deadline for stepdown",
    "code" : 262,
    "codeName" : "ExceededTimeLimit",
    "$gleStats" : {
        "lastOpTime" : Timestamp(0, 0),
        "electionId" : ObjectId("7fffffff0000000000000001")
    },
    "lastCommittedOpTime" : Timestamp(1620337238, 327),
    "$configServerState" : {
        "opTime" : {
            "ts" : Timestamp(1620337306, 1),
            "t" : NumberLong(1)
        }
    },
    "$clusterTime" : {
        "clusterTime" : Timestamp(1620337306, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
The error is returned from here and the behavior is easy to reproduce. I've tested the behavior on v4.0.23.
I also tried to attach GDB to the primary to collect stack traces, but GDB hangs, and I haven't found an alternative yet.
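For anyone scripting a reproduction, here is a rough sketch of how a test harness might recognize this specific failure in a step down reply. It only inspects the fields shown in the output above (ok, code, codeName); the helper name is_stepdown_lock_timeout and the trimmed reply dictionary are illustrative assumptions, not part of the server or of any driver.

```python
from typing import Any, Mapping

# Error code and name as they appear in the reply pasted above.
EXCEEDED_TIME_LIMIT_CODE = 262
EXCEEDED_TIME_LIMIT_NAME = "ExceededTimeLimit"


def is_stepdown_lock_timeout(reply: Mapping[str, Any]) -> bool:
    """Return True if a step down reply matches the time-out reported above.

    Hypothetical helper: it checks only the ok, code, and codeName fields.
    """
    return (
        reply.get("ok") == 0
        and reply.get("code") == EXCEEDED_TIME_LIMIT_CODE
        and reply.get("codeName") == EXCEEDED_TIME_LIMIT_NAME
    )


# Trimmed copy of the reply from this ticket (cluster metadata omitted).
reply = {
    "ok": 0,
    "errmsg": "Could not acquire the global shared lock before the deadline for stepdown",
    "code": 262,
    "codeName": "ExceededTimeLimit",
}

if is_stepdown_lock_timeout(reply):
    print("step down timed out waiting for the lock:", reply["errmsg"])
```

A reproduction script could call this on each reply to count how many consecutive step down attempts hit the same time-out.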
is related to
SERVER-71520 Dump all thread stacks on RSTL acquisition timeout (Closed)
related to
SERVER-65766 ShardingStateRecovery makes remote calls to config server while holding the RSTL (Closed)
SERVER-65825 Increase fassertOnLockTimeoutForStepUpDown default timeout to 30 seconds (Closed)
SERVER-61251 Ensure long running storage engine operations are interruptible (Closed)