[SERVER-53180] Log each operation that holds a lock for an extended period of time without yielding Created: 02/Dec/20  Updated: 18/Apr/23  Resolved: 18/Apr/23

Status: Closed
Project: Core Server
Component/s: Diagnostics, Logging
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Dmitry Agranat Assignee: Backlog - Storage Execution Team
Resolution: Won't Do Votes: 8
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File screenshot-1.png    
Issue Links:
Depends
Duplicate
duplicates SERVER-45496 Log operations that do not yield Closed
Related
related to SERVER-51806 bulk key insertion phase of index bui... Closed
related to SERVER-37479 report amount of time that a locker b... Closed
related to SERVER-47699 Change yield type used by range delet... Closed
Assigned Teams:
Storage Execution
Sprint: Execution Team 2021-11-01, Execution Team 2021-11-15, Execution Team 2021-11-29, Execution Team 2021-12-13, Execution Team 2021-12-27, Execution Team 2022-01-10, Execution Team 2022-01-24, Execution Team 2022-02-07, Execution Team 2022-02-21, Execution Team 2022-10-17, Execution Team 2022-10-31
Participants:
Case:

 Description   

Anything that holds a lock for an extended period of time without yielding is a potential performance problem. The threshold for "extended" might default to 1 second, but it should be configurable, both for debugging purposes and in case the default creates too much noise in the logs.

The unlock code should know how long the lock has been held and which connection held it, which gives us more information to tie the event back to the offending operation.



 Comments   
Comment by Connie Chen [ 18/Apr/23 ]

The project is being closed as "won't do", so these tickets are being closed as well.

Comment by Connie Chen [ 18/Apr/23 ]

Closing this as won't do; in the long run, we plan to remove locks rather than log them.

Comment by Dan Larkin-York [ 13/Dec/22 ]

connie.chen@mongodb.com to open a project.

Comment by Gregory Noma [ 17/Oct/22 ]

In terms of the work described in this ticket as written, the naive approach is to simply look at the clock when a lock is acquired and again when it is released. However, uncontended locking and unlocking is a very hot path, and this does add a bit of overhead to those operations.
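As a rough illustration of the naive approach, the sketch below reads the clock on acquire and again on release, and reports any hold that meets a configurable threshold along with the connection that held the lock. It is written in Python for brevity and is not server code: the `TimedLock` class, its `threshold_secs`, `clock`, and `report` parameters, and the message format are all hypothetical names chosen for this example.

```python
import threading
import time


class TimedLock:
    """Hypothetical wrapper that times each hold and reports long ones."""

    def __init__(self, name, threshold_secs=1.0, clock=time.monotonic, report=print):
        self._lock = threading.Lock()
        self._name = name
        self._threshold_secs = threshold_secs  # configurable "extended period"
        self._clock = clock                    # injectable so tests can fake time
        self._report = report                  # stand-in for the server log
        self._acquired_at = None
        self._conn_id = None

    def acquire(self, conn_id):
        self._lock.acquire()
        # First clock read: this is the extra cost on the uncontended path.
        self._acquired_at = self._clock()
        self._conn_id = conn_id

    def release(self):
        # Second clock read, taken while the lock is still held.
        held = self._clock() - self._acquired_at
        conn_id = self._conn_id
        self._acquired_at = None
        self._conn_id = None
        self._lock.release()
        # Report after unlocking so logging does not extend the hold.
        if held >= self._threshold_secs:
            self._report(f"lock {self._name} held for {held:.3f}s by conn {conn_id}")
        return held
```

Every acquire and release pays for a clock read here, including the uncontended case, which is the overhead concern raised above.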

Taking a step back to the motivation of this ticket, it seems that the ask here is to be able to identify an operation that prevented some other operation from acquiring a common resource. I'm wondering if we can do something more general to improve observability into the server when there is contention for resources.

For instance, if a thread knows what thread it is waiting on, the waiting thread can report what it is blocked on rather than the other way around. It is okay for the waiter to have some extra overhead, and this may help limit the overhead for the fast-path uncontended case.
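A minimal sketch of this waiter-side idea, again in Python and under stated assumptions: the `ReportingLock` class, the `op_desc` and `warn_after_secs` parameters, and the message format are hypothetical and only illustrate the shape of the approach. The uncontended path does a non-blocking acquire with no clock reads; only a thread that fails to get the lock starts timing and periodically reports who it is waiting on.

```python
import threading
import time


class ReportingLock:
    """Hypothetical lock where the waiter, not the holder, does the reporting."""

    def __init__(self, name, report=print):
        self._lock = threading.Lock()
        self._name = name
        self._report = report  # stand-in for the server log
        self._holder = None    # written only by the thread holding the lock

    def acquire(self, op_desc, warn_after_secs=1.0):
        # Fast path: an uncontended acquire reads no clock at all.
        if not self._lock.acquire(blocking=False):
            # Slow path: only the blocked thread pays for timing and logging.
            start = time.monotonic()
            while not self._lock.acquire(timeout=warn_after_secs):
                waited = time.monotonic() - start
                # Best-effort read of the holder; it may change concurrently.
                self._report(
                    f"{op_desc} has waited {waited:.1f}s for lock "
                    f"{self._name} held by {self._holder}"
                )
        self._holder = op_desc

    def release(self):
        self._holder = None
        self._lock.release()
```

The holder's only extra work is storing and clearing a description of itself, so the cost shifts almost entirely onto the thread that is already blocked.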
