[SERVER-66752] Determine if lock acquisition can happen inside of the writeConflictRetry in find_and_modify.cpp Created: 25/May/22 Updated: 27/Oct/23 Resolved: 02/Sep/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Gregory Wlodarek | Assignee: | Jordi Olivares Provencio |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Sprint: | Execution Team 2022-08-22, Execution Team 2022-09-05 | ||||||||
| Participants: | |||||||||
| Description |
|
Running
with this diff
Shows that this writeConflictRetry is run with locks acquired at some earlier point. This is problematic because if a WriteConflictException is thrown, the locks and ticket will be held in logWriteConflictAndBackoff, which sleeps.
|
| Comments |
| Comment by Jordi Olivares Provencio [ 02/Sep/22 ] | |
|
Gotcha, I'll change it to Works as Designed then. Thanks for the correction gregory.noma@mongodb.com! | |
| Comment by Gregory Noma [ 02/Sep/22 ] | |
|
Since the intent of this ticket is to ensure that we're not calling logWriteConflictAndBackoff while holding resources from this particular writeConflictRetry in find_and_modify.cpp, if we've determined that isn't the case then can we instead close this ticket as "Works as Designed"? Sorry to be pedantic, I just want to make sure it's clear whether the outcome here is "we're sleeping while holding resources but we can't really do anything about it here" or "we're not sleeping while holding resources here". And I think "Wont Fix" implies the former whereas "Works as Designed" implies the latter. | |
| Comment by Jordi Olivares Provencio [ 02/Sep/22 ] | |
|
I would imagine, the unlockPending counter only gets increased from 0 when restoring a WUOW or if we're in one while unlocking | |
| Comment by Gregory Noma [ 02/Sep/22 ] | |
|
Thanks jordi.olivares-provencio@mongodb.com, does that mean that in the cases you observed we won't end up calling logWriteConflictAndBackoff because there's a higher-level WUOW (and presumably a higher-level writeConflictRetry), turning this writeConflictRetry into just a simple invocation of the function? | |
| Comment by Jordi Olivares Provencio [ 02/Sep/22 ] | |
|
gregory.noma@mongodb.com Forgot to mention it doesn't result in failures with the reproducer. Updated the comment to reflect that. | |
| Comment by Gregory Noma [ 02/Sep/22 ] | |
jordi.olivares-provencio@mongodb.com I'm a little confused it seems like these two sentences are contradictory? Or is "invariant-ing if we're locked" referring to something else? | |
| Comment by Jordi Olivares Provencio [ 02/Sep/22 ] | |
|
Adding invariant(!opCtx->lockState()->isLocked()) at the following location doesn't yield any invariant failures in master with the given reproducer. Running a test by just invariant-ing if we're locked yielded a large amount of errors in our tests. Investigating jstests/replsets/tenant_migration_fetch_committed_transactions.js and dumping the locks at crash time yielded the following:
This means that they are using two phase locking and some locks will still be held at the time of calling find_and_modify. This is fundamental to how MongoDB operates. Barring some seriously deep architectural refactors I don't believe we can do anything here. Closing the ticket as a result. |