[SERVER-53385] Find a pattern to assert Mutex 'is locked' invariant in user code Created: 16/Dec/20  Updated: 27/Oct/23  Resolved: 17/Dec/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Andrew Shuvalov (Inactive) Assignee: Billy Donahue
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-19549 Apply clang thread safety static anno... Backlog
Participants:

 Description   

The naive implementation is to add methods to the Mutex code directly:

inline void Mutex::invariantIsLocked() const {    
      invariant(_isLocked,     str::stream() << "Mutex " << getName() << " must be locked");
}
inline void Mutex::invariantIsNotLocked() const {    
      invariant(!_isLocked, str::stream() << "Mutex " << getName() << " must be unlocked");
}

with sample usage pattern to have invariant in a callback that could be passed to the condition_variable:

auto callback = [this] () -> bool {
   _mutex.invariantIsLocked();
   return _variable > 0;
}
 
stdx::unique_lock<Latch> ul(_mutex);
_conditionVariable.wait(_mutex, callback);

However it might be unclean to add extra methods at the Mutex layer. billy.donahue said:

"I would like to get a separate ticket just for this enhancement, and discuss with Service Arch.

I would like to know more about the WIP use case for this. I'm uneasy with adding a member functions to mongo::Mutex that won't be supported by its base class, and cannot be supported by other mutex implementations like std::mutex. I think the only public API in mongo::Mutex that is non-standard is getName, which really just fills out the mongo::Latch polymorphic interface and is purely for observability in infrastructure code, not really used at user call sites. That is to say, we could change over to std::mutex and only have to change the callers of getName(), which are few and really just internal to stats and diagnostics systems that mutex clients don't normally interact with.

If anything, I imagine these as nonmember generic functions that TAKE a Lockable or latch or mutex or whatever, and if that Lockable participates in having its locked state observed with isLocked() or something, we can observe it. If a caller wants to invariant on the output of that function, that's their business and not the concern of the platform/mutex.h header. They can just write their own invariants. They'd be better off that way anyway, because they'll have a relevant SourceLocation in the failure, instead of one that just points into the Mutex class."

 



 Comments   
Comment by Billy Donahue [ 17/Dec/20 ]

Closing as another viable approach was discussed.

Comment by Billy Donahue [ 17/Dec/20 ]

That would be either SERVER-19549 or a duplicate of it.

Comment by Andrew Shuvalov (Inactive) [ 17/Dec/20 ]

Billy, this trick looks fine to me, feel free to resolve this ticket. Also CC matthew.saltz on the initial code review.

Separately, I believe we have an open ticket somewhere for lock annotations like the one I mentioned above:

int _data GUARDEDBY(_mutex);

so anyone plans to work on this I rather see it happening.

Comment by Billy Donahue [ 17/Dec/20 ]

Andrew, I see what you're getting at.

A condition_variable::wait predicate will always have a mutex locked during its execution.
A condition_variable could of course be locking the wrong one, but hopefully these predicates are very closely coupled with the data they're protecting and locally defined, and so the invariant wouldn't help those cases too much. It would be obvious that the mutex is held.

But to me this peekLockState business feels pretty heavy to support something that's not accidentally screwed up that much, and I would be a little disappointed to see it in otherwise elegant code just as matter of routine.

Are there cases where we lock one mutex, and then have a condition wait on a predicate that needs a different mutex?
There is another solution here maybe. A purely additive external library or local wrapper solution. The condition variable can PASS the identity of the mutex it's holding TO the predicate, so there's no doubt about whether it's held.

auto callback = [this] (const auto* locked) {
   invariant(&locked == &_mutex, "You're waiting with the wrong Mutex, dummy!");
   return _variable > 0;
}
 
stdx::unique_lock<Latch> ul(_mutex);
_conditionVariable.wait(_mutex, [&]{ return callback(&_mutex); });

But I think you should be passing ul rather than _mutex to the wait function.

auto callback = [this] (const auto& ul) {
   invariant(ul.mutex() == &_mutex, "You're waiting with the wrong Mutex, dummy!");
   return _variable > 0;
};
stdx::unique_lock<Latch> ul(_mutex);
_conditionVariable.wait(ul, [&]{ return callback(ul); });

Of course the condition is often given ul instead of _mutex, which I find a little safer. The callback would then have access to the std::unique_lock::owns_lock() and std::unique_lock::mutex() accessors and can invariant(&ul.mutex() == &_mutex); and or even invariant(ul.owns_lock()) inside the predicate more directly.

I propose that THIS idiom could be wrapped up in a little class or idiomatic call sequence so it's not so manual.

For your example:

 
bool A::isDataPositiveLocked(const stdx::unique_lock<Mutex>& ul) {
    invariant(ul.owns_lock(), "This thing is uniquely not locking anything");
    invariant(ul.mutex() == &_mutex, "wrong mutex");
    return _data > 0;
}
 
bool A::isDataPositive() {
    stdx::unique_lock ul(_mutex);
    return isDataPositiveLocked(ul);
}

I might ask that isDataPositiveLocked should accept a const unique_lock& as a proof of work if it's really worried about this being confused. Then it can easily invariant on the identity of the unique_lock's mutex() identity and owns_lock() status.

Comment by Andrew Shuvalov (Inactive) [ 17/Dec/20 ]
  1. I like the interface above.
  2. After your explanation I got that the invariant 'not locked' is wrong, because another thread can grab the lock at arbitrary time. The use case for 'is locked' is in ticket description, for the condition_variable callback. The simplest use case is:

bool A::isDataPositiveLocked() {
    invariant(lock_debugging::peekLockState(_mutex) != kUnlocked);
    return _data > 0;
}
 
bool A::isDataPositive() {
    stdx::unique_lock<Latch> ul(_mutex);
    return isDataPositiveLocked();
}

If we had proper variable lock annotation that would be a better solution, however not full replacement for the condition_variable case. I mean:

private:
    int _data GUARDED_BY(_mutex);

Comment by Billy Donahue [ 16/Dec/20 ]

INVARIANT_LOCKED(_mutex) would be an unnecessary macro.

I think invariant(lock_debugging::peekLockState(_mutex) != kUnlocked) would do the same thing in pure C++ without creating a new macro (I just made up the name and the namespace for purposes of discussion). peekLockState can be a function template usable with different lockables. I'm thinking it returns something like a tri-bool so it can express an indeterminate result. If those lockables have observable lock state, it would perhaps return kLocked or kUnlocked, but it can also return (constexpr) kUnknown if the argument's lock state is not observable at all.

But something about this is still puzzling me. Why doesn't the caller know what's up with the Mutex? If such an invariant passes, we still won't know we're safe. The mutex is perhaps available to several threads that might lock it at any time. If it's not available to several threads, why do we have it? So I wonder about the use case.

Comment by Andrew Shuvalov (Inactive) [ 16/Dec/20 ]

Maybe it's cleaner to have something like INVARIANT_LOCKED(_mutex), which will wrap some overloading code, which will either call an invariant or does nothing depending on the _mutex implementation? The major drawback here is that if it's no-op for unsupported mutex variant, it will be misleading to make impression that the check is made and the mutex is guaranteed to be locked.

Generated at Thu Feb 08 05:30:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.