[SERVER-34951] LockerImpl should invariant against active UninterruptibleLockGuard usage when _maxLockTimeout is set Created: 11/May/18  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 0
Labels: newgrad
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-33575 Remove UninterruptibleLockGuards in q... Closed
depends on SERVER-45608 Remove UninterruptibleLockGuard from ... Closed
Related
related to SERVER-33244 Make all lock acquisitions for transa... Closed
Assigned Teams:
Storage Execution
Sprint: Storage NYC 2019-02-11, Execution Team 2020-01-13
Participants:

 Description   

This is follow-up work for SERVER-33244, to add a max lock acquisition timeout override (_maxLockTimeout) for transactions in order to prevent transactions from deadlocking with one another.

UninterruptibleLockGuard cannot be used in transaction operation code paths if we wish to prevent deadlocks. However, the query code currently uses UninterruptibleLockGuard in the find/agg code paths, so this work is blocked on the completion of SERVER-33575, which removes UninterruptibleLockGuard usages from query code paths.
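
To make the proposed check concrete, here is a minimal, self-contained sketch of the idea. It is not the actual LockerImpl code: ToyLocker, ToyUninterruptibleLockGuard, and checkBeforeAcquire are hypothetical names, the guard is assumed to be tracked as a simple counter on the locker, and a thrown std::logic_error stands in for the server's invariant (which would abort the process instead).

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <stdexcept>

// Toy stand-in for the locker. NOT MongoDB's LockerImpl; names are hypothetical.
class ToyLocker {
public:
    // Models the max lock acquisition timeout override used for transactions.
    void setMaxLockTimeout(std::chrono::milliseconds timeout) {
        _maxLockTimeout = timeout;
    }
    void unsetMaxLockTimeout() {
        _maxLockTimeout.reset();
    }

    // Assumed to run at the start of every lock acquisition in this sketch.
    // A real invariant would abort; throwing keeps the sketch testable.
    void checkBeforeAcquire() const {
        if (_maxLockTimeout && _uninterruptibleGuards > 0) {
            throw std::logic_error(
                "UninterruptibleLockGuard active while _maxLockTimeout is set");
        }
    }

private:
    friend class ToyUninterruptibleLockGuard;

    std::optional<std::chrono::milliseconds> _maxLockTimeout;
    int _uninterruptibleGuards = 0;  // number of live guards on this locker
};

// RAII guard: bumps the counter for its lifetime, mirroring how a scoped
// guard could mark lock acquisitions as uninterruptible.
class ToyUninterruptibleLockGuard {
public:
    explicit ToyUninterruptibleLockGuard(ToyLocker& locker) : _locker(locker) {
        ++_locker._uninterruptibleGuards;
    }
    ~ToyUninterruptibleLockGuard() {
        assert(_locker._uninterruptibleGuards > 0);
        --_locker._uninterruptibleGuards;
    }
    ToyUninterruptibleLockGuard(const ToyUninterruptibleLockGuard&) = delete;
    ToyUninterruptibleLockGuard& operator=(const ToyUninterruptibleLockGuard&) = delete;

private:
    ToyLocker& _locker;
};
```

In this model, a guard on its own or a timeout on its own passes the check; only the combination trips it. That combination is the case the ticket wants to rule out, and it is why the remaining guard usages in transaction code paths have to be removed before the check can be added.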



 Comments   
Comment by Daniel Ernst [ 14/Jan/20 ]

Based on the stack traces, the invariant is being triggered in the above test cases because TransactionParticipant uses an UninterruptibleLockGuard in commitUnpreparedTransactions. This UninterruptibleLockGuard will have to be removed in order to add the invariant.

Making things a little more difficult, these test cases fail only intermittently. This may be in part because other test failures are masking them, but general nondeterminism is also a factor (refine_collection_shard_key_crud_ops.js sometimes passes even with the invariant).

Comment by Dianna Hohensee (Inactive) [ 02/Jan/20 ]

The locking code has changed significantly, but this is the code that had the relevant behavior when this ticket was filed and seems like a good starting point for investigation.

Comment by Eric Milkie [ 09/Dec/19 ]

We should enumerate what this work is still blocked on.

Comment by Xiangyu Yao (Inactive) [ 21/Feb/19 ]

Putting this back in the backlog: it is something we aspire to do, but there are still cases that break the rule.

Comment by Dianna Hohensee (Inactive) [ 07/Feb/19 ]

milkie, there has been no redistribution. We'd have to request it.

Comment by Eric Milkie [ 07/Feb/19 ]

Flagging this for rescheduling, as the parent ticket was closed with its remaining work distributed to other tickets.

Comment by Eric Milkie [ 14/Jan/19 ]

Bumping this to Storage 2020Q1 quick wins, as the parent ticket has now been scheduled in February.

Generated at Thu Feb 08 04:38:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.