[SERVER-26305] Deadlock between replication stepdown and threads about to wait for read concern Created: 23/Sep/16 Updated: 25/Jan/17 Resolved: 13/Oct/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.3.14 |
| Fix Version/s: | 3.4.0-rc1 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Kaloian Manassiev | Assignee: | Andy Schwerin |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Sprint: | Sharding 2016-10-10, Sharding 2016-10-31 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 0 | ||||||||||||||||
| Description |
|
Interrupting operations at replication stepdown acquires first the Client lock and then the replication coordinator mutex like shown in this thread:
Threads, which are about to wait for read or write concern first hold the replication coordinator mutex and then acquire the Client lock, like shown in this thread:
This results in deadlock when replication stepdown tries to interrupt threads. The ordering didn't use to be a problem before this change, which made replication threads use the generic OperationContext interruptable waits. |
| Comments |
| Comment by Githook User [ 13/Oct/16 ] | ||||||
|
Author: {u'username': u'andy10gen', u'name': u'Andy Schwerin', u'email': u'schwerin@mongodb.com'}Message: While this change also improves the readability of _awaitReplication_inlock and | ||||||
| Comment by Githook User [ 13/Oct/16 ] | ||||||
|
Author: {u'username': u'andy10gen', u'name': u'Andy Schwerin', u'email': u'schwerin@mongodb.com'}Message: While this change also improves the readability of _awaitReplication_inlock and | ||||||
| Comment by Githook User [ 06/Oct/16 ] | ||||||
|
Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}Message: Revert " This reverts commit 360d6ae39305dbc2f41ca7e0bfd424081fd4f030. | ||||||
| Comment by Githook User [ 05/Oct/16 ] | ||||||
|
Author: {u'username': u'andy10gen', u'name': u'Andy Schwerin', u'email': u'schwerin@mongodb.com'}Message: While this change also improves the readability of _awaitReplication_inlock and | ||||||
| Comment by Andy Schwerin [ 26/Sep/16 ] | ||||||
|
They aren't yet interruptible. I deferred that work, but could revive it if needed. | ||||||
| Comment by Kaloian Manassiev [ 26/Sep/16 ] | ||||||
|
Turns out that this is not such an easy task because the killOperation call also needs to invoke the listeners, but its contract is to call it with the client mutex acquired. However, now that I am looking at ReplicationCoordinatorImpl::interrupt I am thinking that since wait for read/write concern are now interruptible, we no longer need it so we could just get rid of this method altogether. | ||||||
| Comment by Eric Milkie [ 26/Sep/16 ] | ||||||
|
Sounds good, if we have the list at construction time. | ||||||
| Comment by Kaloian Manassiev [ 26/Sep/16 ] | ||||||
|
How about we just initialize the list of killOp listeners at construction time? That way we do not need to expose a lever to disable registering new ones. | ||||||
| Comment by Eric Milkie [ 26/Sep/16 ] | ||||||
|
If that's true, I'd like to see us remove the locking of the mutex entirely from setKillAllOperations(). Strangely, killOperation() already does not lock the mutex, even though it iterates through the listener list as well. | ||||||
| Comment by Kaloian Manassiev [ 26/Sep/16 ] | ||||||
|
But this vector should not be changing during the runtime of the process. It should be instantiated once at service startup and not modified after that. Is there any scenario which requires killop listeners to change? | ||||||
| Comment by Eric Milkie [ 26/Sep/16 ] | ||||||
|
I don't see how you can notify the killoplisteners outside of the lock, as the killoplisteners is just a vector that is protected by the service context mutex. | ||||||
| Comment by Kaloian Manassiev [ 26/Sep/16 ] | ||||||
|
I was looking at the different ways to fix this and I think the most appropriate one is to ensure that the client lock is not being held while notifying the killOp listeners. The operation contexts would still be killed under the clients list lock. The sequence would be:
The contract for each KillOpListener would become such that it should be prepared for the operation id on which it is invoked might not exist by the time ::interrupt is called. |