[SERVER-25468] Using DBDirectClient trips invariant failure about lock state in ReplicationCoordinatorImpl::waitUntilOpTimeForRead() Created: 07/Aug/16 Updated: 11/Apr/17 Resolved: 13/Aug/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.3.10 |
| Fix Version/s: | 3.3.12 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Kaloian Manassiev |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 2016-08-29 |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
The invariant(!txn->lockState()->isLocked()) was introduced as part of |
| Comments |
| Comment by Githook User [ 13/Aug/16 ] |
|
Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com> Message: |
| Comment by Kaloian Manassiev [ 08/Aug/16 ] |
|
This is an actual bug because waiting for read concern is attempted while holding the global X lock. This will deadlock, because while the lock is held the replication subsystem cannot advance the optime. We can make the waitForReadConcern method uassert if it is called while any lock is held. This ticket belongs to the replication team, so I will move it out of platform. |
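
The guard proposed in this comment could look roughly like the sketch below. It is a standalone illustration, not the server code: `FakeLockState`, `UserAssertion`, and `waitForReadConcernSketch` are hypothetical names invented here, and the exception stands in for what a `uassert` failure would report to the caller.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the per-operation lock state
// (the ticket refers to txn->lockState()->isLocked()).
struct FakeLockState {
    int heldLocks = 0;
    bool isLocked() const { return heldLocks > 0; }
};

// Hypothetical error type playing the role of a uassert failure:
// it is reported to the caller instead of aborting the process.
struct UserAssertion : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Sketch of the proposed check: refuse to wait for a read concern
// while any lock is held, because the wait could never complete
// if the held lock blocks replication from advancing the optime.
bool waitForReadConcernSketch(const FakeLockState& lockState) {
    if (lockState.isLocked()) {
        throw UserAssertion("cannot wait for read concern while holding a lock");
    }
    // The real wait on the replication optime would happen here.
    return true;
}
```

The point of the design is that a recoverable error replaces the process-fatal invariant, so a caller such as DBDirectClient that arrives with a lock held fails its own operation rather than crashing or deadlocking the server.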
| Comment by Max Hirschhorn [ 07/Aug/16 ] |
|
acm, it's a bit tenuous, but I assigned this ticket to the platforms team based on the particular manifestation of the invariant failure. I also CC'd Kal because it appears this invariant is too stringent and perhaps he can comment on what his expectations were for callers from DBDirectClient. It isn't clear to me what the semantics of waiting for a read concern when using DBDirectClient should be. My understanding is that we wait once at the start of running a command, so it's kind of peculiar to end up waiting multiple times. Since you have more experience with things that are "internal client"-related, I figured that you might have some thoughts on this. |
| Comment by Andrew Morrow (Inactive) [ 07/Aug/16 ] |
|
Max, can you add a little context for why this belongs on the platforms backlog? It isn't immediately obvious to me. |