Issue Links:
    Depends
    Documented
        is documented by DOCS-15060 - Investigate changes in SERVER-56756: ... (Closed)
    Related
        related to SERVER-65766 - ShardingStateRecovery makes remote ca... (Closed)
        related to SERVER-65825 - Increase fassertOnLockTimeoutForStepU... (Closed)
        related to SERVER-61251 - Ensure long running storage engine op... (Backlog)
        is related to SERVER-71520 - Dump all thread stacks on RSTL acquis... (Closed)
Backwards Compatibility: Fully Compatible
Operating System: Linux
Steps To Reproduce:
- Start a 3-node replica set.
- Start a single-node replica set for the config server.
- Start a mongos server.
- Start a workload with multiple client threads (e.g., 50) running a mixture of find/update operations against the primary (through mongos).
- Freeze the dbpath for the primary (e.g., fsfreeze --freeze /mnt/primary).
- Ask the primary to step down (a shell sketch of the last two steps follows below).
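A minimal sketch of the trigger, assuming the cluster above is already running, the primary's dbpath lives on /mnt/primary, and the primary listens on port 27017 (the port is an assumption):

    # Freeze the filesystem backing the primary's dbpath; writes now block in the kernel.
    sudo fsfreeze --freeze /mnt/primary

    # Ask the primary (connected to directly) to step down; this is the call that
    # times out with ExceededTimeLimit (code 262) as shown in the description below.
    mongo --port 27017 --eval 'db.adminCommand({replSetStepDown: 60})'

    # Cleanup once done observing the behavior.
    sudo fsfreeze --unfreeze /mnt/primary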
|
Sprint: Repl 2021-07-12, Repl 2021-07-26, Repl 2021-08-09, Repl 2021-08-23, Replication 2021-11-15, Replication 2021-11-29, Replication 2021-12-13, Replication 2021-12-27, Replication 2022-01-10, Replication 2022-01-24, Replication 2022-02-07
Participants:
|
Sending a step down request to a primary that is experiencing disk failures could result in consistent time-out errors:
{
    "operationTime" : Timestamp(1620337238, 857),
    "ok" : 0,
    "errmsg" : "Could not acquire the global shared lock before the deadline for stepdown",
    "code" : 262,
    "codeName" : "ExceededTimeLimit",
    "$gleStats" : {
        "lastOpTime" : Timestamp(0, 0),
        "electionId" : ObjectId("7fffffff0000000000000001")
    },
    "lastCommittedOpTime" : Timestamp(1620337238, 327),
    "$configServerState" : {
        "opTime" : {
            "ts" : Timestamp(1620337306, 1),
            "t" : NumberLong(1)
        }
    },
    "$clusterTime" : {
        "clusterTime" : Timestamp(1620337306, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
The error is returned from here, and the behavior is easy to reproduce; I've tested it on v4.0.23.
Also, I tried attaching GDB to the primary to collect stack traces, but GDB hangs and I haven't found an alternative yet.
|
|
Summary of the fix (more detail in the PR):
We added a server parameter `fassertOnLockTimeoutForStepUpDown`, which defaults to 15 seconds and can be disabled by setting it to 0. If a node that has received a stepDown (or stepUp) request cannot acquire the RSTL within that time, it aborts the server with an fassert. This lets a cluster whose primary is experiencing a disk failure (or a similarly rare stall) elect a new primary and remain available.
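As an illustration only (the replica set name rs0 and port are placeholders, and whether the parameter can be changed at runtime is an assumption; the description above only states the 15-second default and that 0 disables it):

    # Sketch: raise the timeout at startup, assuming the value is in seconds per the 15s default above.
    mongod --replSet rs0 --setParameter fassertOnLockTimeoutForStepUpDown=30

    # Sketch: disable the fassert behavior on a running node, assuming runtime setParameter is allowed.
    mongo --port 27017 --eval 'db.adminCommand({setParameter: 1, fassertOnLockTimeoutForStepUpDown: 0})'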
|
|
Author: Adi Zaimi <adizaimi@yahoo.com> (adizaimi)
Message: SERVER-56756 fassert in stepup/down when RSTL timeout is above threshold
Branch: master
https://github.com/mongodb/mongo/commit/757f44e1a1cc496f8030e00249aaf43163a85677
|
|
Continuing the discussion here instead of on the GitHub ticket.
Lingzhi suggests that {shutdown: 1, force: true} is a workaround for the issue here, but unfortunately that does not seem to be the case. Sending shutdown causes the RSTL acquisition to time out:
{"t":{"$date":"2022-01-05T17:55:32.989+00:00"},"s":"W", "c":"COMMAND", "id":4719000, "ctx":"conn104","msg":"Error stepping down during force shutdown","attr":{"error":{"code":24,"codeName":"LockTimeout","errmsg":"Unable to acquire X lock on '{4611686018427387905: ReplicationStateTransition, 1}' within 15000ms. opId: 3113, op: conn104, connId: 104."}}}
{"t":{"$date":"2022-01-05T17:55:32.989+00:00"},"s":"I", "c":"COMMAND", "id":4695400, "ctx":"conn104","msg":"Terminating via shutdown command","attr":{"force":true,"timeoutSecs":15}}
...
However, this node never steps down and keeps sending heartbeats, so the other node does not step up until the filesystem is unfrozen:
===- Unfreeze the mount point @ Wed Jan 5 17:56:52 UTC 2022
{"t":{"$date":"2022-01-05T17:57:03.201+00:00"},"s":"I", "c":"REPL", "id":6015309, "ctx":"OplogApplier-0","msg":"Logging transition to primary to oplog on stepup"}
|
|
Discussed with Lingzhi; creating a patch to abort() the server when it fails to acquire the RSTL on stepDown. Opening a PR to discuss this behavior further if needed.
|
|
I'm not sure what the code path of that interruption you point out is, but in this case (fsfreeze) the process (well, the writer threads, really) is in uninterruptible sleep; kill -9 doesn't do anything until the filesystem is unfrozen.
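A quick way to confirm this state (a sketch; the pgrep lookup assumes a single mongod process on the host):

    # Threads in state "D" are in uninterruptible sleep and ignore signals,
    # including SIGKILL, until the frozen filesystem is thawed.
    ps -L -o pid,tid,stat,wchan:32,comm -p "$(pgrep -x mongod)" | awk '$3 ~ /D/'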
|
|
In this case, the write operation that's blocking stepDown should have been interrupted, right? I wonder if SERVER-61251 would help here too.
|
|
I put an abort() at the point where we time out acquiring the lock and got a core:
Thread 1 (Thread 0x7fbabef14700 (LWP 8005)):
#1  0x00007f62f9a6b921 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f62f4f41bbf in mongo::LockerImpl::_lockComplete (this=0x5614b48ea680, opCtx=0x2710, resId=..., mode=<optimized out>, deadline=...) at src/mongo/db/concurrency/lock_state.cpp:1038
#3  0x00007f62f4f4a0dc in mongo::repl::ReplicationStateTransitionLockGuard::ReplicationStateTransitionLockGuard (this=0x7f62ca1cfae8, opCtx=0x7f62ca1cf130, mode=mongo::MODE_NONE) at src/mongo/db/concurrency/replication_state_transition_lock_guard.cpp:49
#4  0x00007f62f801daf0 in mongo::repl::ReplicationCoordinatorImpl::AutoGetRstlForStepUpStepDown::AutoGetRstlForStepUpStepDown (this=0x7f62ca1cfad0, repl=<optimized out>, opCtx=<optimized out>, stateTransition=<optimized out>, deadline=...) at src/mongo/db/repl/replication_coordinator_impl.cpp:2512
#5  0x00007f62f801e966 in mongo::repl::ReplicationCoordinatorImpl::stepDown (this=0x5614af75e000, opCtx=0x5614b7c20800, force=true, waitTime=..., stepdownTime=...) at src/mongo/db/repl/replication_coordinator_impl.cpp:2650
#6  0x00007f62f3457981 in mongo::repl::CmdReplSetStepDown::run (this=0x7f62f34665e0 <mongo::repl::cmdReplSetStepDown>, opCtx=0x5614b7c20800, cmdObj=..., result=...) at src/mongo/db/repl/repl_set_commands.cpp:585
#7  0x00007f62f86cfc65 in mongo::BasicCommand::runWithReplyBuilder (this=0x7f62f34665e0 <mongo::repl::cmdReplSetStepDown>, opCtx=0x7f62ca1cf130, db=<error reading variable: Cannot access memory at address 0x8>, cmdObj=<error reading variable: Cannot access memory at address 0x108248c8b48>..., replyBuilder=<optimized out>) at src/mongo/db/commands.h:953
#8  0x00007f62f50578d5 in mongo::BasicCommandWithReplyBuilderInterface::Invocation::run (this=<optimized out>, opCtx=0x5614b7c20800, result=0x5614b567c090) at src/mongo/db/commands.cpp:879
#9  0x00007f62f5047130 in mongo::CommandHelpers::runCommandInvocation (opCtx=0x5614b7c20800, request=..., invocation=0x5614b4eeb080, response=0x5614b567c090) at src/mongo/db/commands.cpp:199
The reason attaching gdb may not work is that the process is in a strange state when one or more of its threads are in uninterruptible sleep (which is what happens when we freeze the filesystem underneath it).