[SERVER-33575] Remove UninterruptibleLockGuards in query code to allow interruptible lock acquisition Created: 01/Mar/18  Updated: 06/Dec/22  Resolved: 07/Feb/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.7.3
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Backlog - Query Team (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-34951 LockerImpl should invariant against a... Backlog
Assigned Teams:
Query
Participants:

 Description   

Since SERVER-32638, global and database locks can be interrupted when an OperationContext is killed and throw a DBException with an Interrupted error code. This includes all AutoGet helpers.

The following places in the query code have temporary UninterruptibleLockGuard s to prevent crashing due to inadequate exception handling:

src/mongo/db/commands/mr.cpp:368
src/mongo/db/commands/mr.cpp:1017
src/mongo/db/commands/mr.cpp:1405
src/mongo/db/commands/mr.cpp:1739
src/mongo/db/pipeline/document_source_cursor.cpp:268
src/mongo/db/query/find.cpp:265
src/mongo/db/query/query_yield.cpp:86
src/mongo/db/ttl.cpp:137



 Comments   
Comment by Samyukta Lanka [ 07/Feb/19 ]

tess.avitabile, none of the remaining UninterruptibleLockGuards conflict with prepare because they don't take an X or S lock. Although there is work in SERVER-38478 to make sure that the use of restoreLockState in query_yield.cpp will only restore IX or IS locks.

Comment by Tess Avitabile (Inactive) [ 07/Feb/19 ]

samy.lanka, can you please check whether the remaining UninterruptibleLockGuards are problematic for prepare?

Comment by Craig Homa [ 07/Feb/19 ]

There is still some remaining work from the Query team here but if this comes up again they will address it in separate and more narrowly scoped tickets.

Comment by David Storch [ 06/Feb/19 ]

The situation has changed since this ticket was filed:

  • The UniterruptibleLockGuards in mr.cpp were removed by justin.seyster's work under SERVER-38480.
  • The UniterruptibleLockGuard in document_source_cursor.cpp was removed by my work in SERVER-37451.
  • There is still an UniterruptibleLockGuard in find.cpp on the getMore path. I suspect that this is still necessary because of how OP_GET_MORE on an awaitData cursor is implemented, although with some more work it could be removed.
  • The UniterruptibleLockGuard in query_yield.cpp may be difficult to remove safely. This requires more investigation.
  • The UniterruptibleLockGuard in ttl.cpp still exists, but should probably fall outside the domain of this ticket since it's outside of the query module.

I'm marking this ticket to be re-triaged by the query team.

Comment by Asya Kamsky [ 11/May/18 ]

Query will investigate.

Comment by Dianna Hohensee (Inactive) [ 11/May/18 ]

I've run into an issue with an UninterruptibleLockGuard here in document_source_cursor.cpp (ran into it via an aggregation). SERVER-33244 is adding a configurable max lock acquisition timeout override for transaction operations, so transactions cannot deadlock with one another. I had an invariant to ensure UninterruptibleLockGuard isn't used when a max lock acquisition timeout is active for transaction ops: they are contradictory. I've created SERVER-34951 to add that check back, and will submit my code without it for v4.0, which is safe from deadlocks because transactions only do CRUD ops that take non-conflicting IS and IX locks. In v4.2 we will have a problem if we want to add ops to transactions that take exclusive locks.

Generated at Thu Feb 08 04:33:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.