[SERVER-25468] Using DBDirectClient trips invariant failure about lock state in ReplicationCoordinatorImpl::waitUntilOpTimeForRead()
Created: 07/Aug/16  Updated: 11/Apr/17  Resolved: 13/Aug/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.3.10
Fix Version/s: 3.3.12

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Kaloian Manassiev
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-28746 eval can deadlock on mmap flush lock ... Closed
is related to SERVER-24858 Tighten assertions around waiting for... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2016-08-29
Participants:
Linked BF Score: 0

 Description   

The invariant(!txn->lockState()->isLocked()) was introduced as part of SERVER-24858. This issue does not affect the 3.2 branch.



 Comments   
Comment by Githook User [ 13/Aug/16 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-25468 Fail commands scheduled through 'eval' which wait for replication
Branch: master
https://github.com/mongodb/mongo/commit/f210923e76fa101ca417a085dd370219221cdda0

Comment by Kaloian Manassiev [ 08/Aug/16 ]

This is an actual bug because waiting for read concern is attempted while holding the global X lock. This will deadlock, because while the lock is held the replication subsystem will not be able to advance the optime.

We can make the waitForReadConcern method uassert if it is called while any lock is held. This ticket belongs with the replication team, so I will move it out of platform.
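
To illustrate the proposal above, here is a minimal, self-contained C++ sketch of a wait function that rejects the call with a catchable error when a lock is held, instead of aborting the process as an invariant would. It is not the code from the actual fix: `LockState`, `UserAssertion`, `waitForReadConcern` as written here, and `waitIsRejected` are all hypothetical stand-ins that only model the idea.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the per-operation lock state (not the server's class).
class LockState {
public:
    void lockGlobal() { ++_depth; }
    void unlockGlobal() {
        assert(_depth > 0);  // unlocking more often than locking is a programming error
        --_depth;
    }
    bool isLocked() const { return _depth > 0; }

private:
    int _depth = 0;
};

// Hypothetical analogue of a user assertion: an exception that fails the
// current command rather than terminating the whole process.
struct UserAssertion : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Sketch of the proposed check: refuse to wait if any lock is held, because
// the optime cannot advance while the caller blocks replication.
void waitForReadConcern(const LockState& lockState) {
    if (lockState.isLocked()) {
        throw UserAssertion("cannot wait for read concern while holding a lock");
    }
    // The real wait for the optime would happen here; omitted in this sketch.
}

// Helper for callers and tests: returns true if the wait was rejected.
bool waitIsRejected(const LockState& lockState) {
    try {
        waitForReadConcern(lockState);
        return false;
    } catch (const UserAssertion&) {
        return true;
    }
}
```

With this shape, a caller that reaches the wait while holding a lock (for example through an internal client) gets a command failure it can report, rather than tripping an invariant or deadlocking.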

Comment by Max Hirschhorn [ 07/Aug/16 ]

acm, it's a bit tenuous, but I assigned this ticket to the platforms team based on the particular manifestation of the invariant failure. I also CC'd Kal because it appears this invariant is too stringent and perhaps he can comment on what his expectations were for callers from DBDirectClient. It isn't clear to me what the semantics of waiting for a read concern when using DBDirectClient should be. My understanding is that we wait once at the start of running a command, so it's kind of peculiar to end up waiting multiple times. Since you have more experience with things that are "internal client"-related, I figured that you might have some thoughts on this.

Comment by Andrew Morrow (Inactive) [ 07/Aug/16 ]

Max, can you add a little context for why this belongs on the platforms backlog? It isn't immediately obvious to me.
