[SERVER-53180] Log each operation that holds a lock for an extended period of time without yielding Created: 02/Dec/20 Updated: 18/Apr/23 Resolved: 18/Apr/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics, Logging |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Dmitry Agranat | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Do | Votes: | 8 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Storage Execution |
| Sprint: | Execution Team 2021-11-01, Execution Team 2021-11-15, Execution Team 2021-11-29, Execution Team 2021-12-13, Execution Team 2021-12-27, Execution Team 2022-01-10, Execution Team 2022-01-24, Execution Team 2022-02-07, Execution Team 2022-02-21, Execution Team 2022-10-17, Execution Team 2022-10-31 |
| Case: | (copied to CRM) |
| Description |
|
Anything that holds a lock for an extended period of time without yielding is a potential performance problem. The threshold for "extended" might be 1 second, but it should be configurable, both for debugging and in case the default creates too much noise in the logs. The unlock code should know how long the lock was held and on which connection, so that each log entry can be tied back to the offending operation. |
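A minimal C++ sketch of the behavior described above, assuming a plain `std::mutex` wrapper rather than MongoDB's actual lock manager: the holder records when it acquired the lock and on which connection, and `unlock()` reports any hold at or above a configurable threshold. `TimedMutex`, `LongHoldReport`, and the injectable clock and reporting callbacks are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical log payload: which connection held the lock, and for how long.
struct LongHoldReport {
    std::int64_t connectionId;
    std::chrono::milliseconds heldFor;
};

// Hypothetical sketch, not MongoDB's lock manager API.
class TimedMutex {
public:
    // threshold: the configurable "extended period" (for example 1 second).
    // now: clock source, injectable so that time can be controlled in tests.
    // report: invoked on unlock when the hold reached the threshold.
    TimedMutex(std::chrono::milliseconds threshold,
               std::function<Clock::time_point()> now,
               std::function<void(const LongHoldReport&)> report)
        : threshold_(threshold), now_(std::move(now)), report_(std::move(report)) {}

    void lock(std::int64_t connectionId) {
        mutex_.lock();
        // Only the current holder writes these fields, so no extra synchronization.
        holderConnectionId_ = connectionId;
        acquiredAt_ = now_();
    }

    void unlock() {
        // Read the holder's bookkeeping before releasing the mutex.
        const auto heldFor =
            std::chrono::duration_cast<std::chrono::milliseconds>(now_() - acquiredAt_);
        const std::int64_t connectionId = holderConnectionId_;
        mutex_.unlock();
        // Report outside the critical section so logging does not extend the hold.
        if (heldFor >= threshold_) {
            report_({connectionId, heldFor});
        }
    }

private:
    std::mutex mutex_;
    std::chrono::milliseconds threshold_;
    std::function<Clock::time_point()> now_;
    std::function<void(const LongHoldReport&)> report_;
    std::int64_t holderConnectionId_ = -1;
    Clock::time_point acquiredAt_{};
};
```

Note that this version reads the clock on every acquisition and release, including uncontended ones; that per-operation cost on the hot path is exactly the overhead raised in the 17/Oct/22 comment.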
| Comments |
| Comment by Connie Chen [ 18/Apr/23 ] |
|
The project is being closed as "Won't Do", so these tickets are being closed as well. |
| Comment by Connie Chen [ 18/Apr/23 ] |
|
Closing this as "Won't Do": in the long run, we plan to remove locks rather than log them. |
| Comment by Dan Larkin-York [ 13/Dec/22 ] |
|
connie.chen@mongodb.com to open a project. |
| Comment by Gregory Noma [ 17/Oct/22 ] |
|
In terms of the work described in this ticket as written, the naive approach is to read the clock when a lock is acquired and again when it is released. However, uncontended locking and unlocking is a very hot path, and this adds some overhead to those operations.

Taking a step back to the motivation of this ticket, the ask seems to be the ability to identify an operation that prevented some other operation from acquiring a common resource. I'm wondering whether we can do something more general to improve observability into the server when there is contention for resources. For instance, if a thread knows which thread it is waiting on, the waiting thread can report what it is blocked on, rather than the other way around. It is acceptable for the waiter to incur some extra overhead, and this may help limit the overhead on the fast path for the uncontended case. |
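The waiter-side idea in this comment can be sketched in C++ as well. In this hypothetical `WaiterReportingMutex` (again a `std::mutex` wrapper, not the server's lock manager), the uncontended path only stores the holder's connection id and never reads the clock; a thread whose `try_lock` fails reads who holds the lock, times its own wait, and reports both ids. The `waiters()` counter, `ContentionReport`, and the callback-based reporting are assumptions added for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical report emitted by a blocked waiter about its own wait.
struct ContentionReport {
    std::int64_t waiterConnectionId;
    std::int64_t blockerConnectionId;  // best effort: holder seen when the wait began
    std::chrono::milliseconds waitedFor;
};

// Hypothetical sketch: holders only publish their identity; all timing and
// reporting happen on the waiter's slow path.
class WaiterReportingMutex {
public:
    static constexpr std::int64_t kNoHolder = -1;

    WaiterReportingMutex(std::chrono::milliseconds threshold,
                         std::function<Clock::time_point()> now,
                         std::function<void(const ContentionReport&)> report)
        : threshold_(threshold), now_(std::move(now)), report_(std::move(report)) {}

    void lock(std::int64_t connectionId) {
        // Fast path: uncontended acquisition costs one extra store, no clock read.
        if (mutex_.try_lock()) {
            holder_.store(connectionId, std::memory_order_relaxed);
            return;
        }
        // Slow path: learn who we are blocked on, then time our own wait.
        const std::int64_t blocker = holder_.load(std::memory_order_relaxed);
        const Clock::time_point start = now_();
        waiters_.fetch_add(1);
        mutex_.lock();
        waiters_.fetch_sub(1);
        holder_.store(connectionId, std::memory_order_relaxed);
        const auto waited =
            std::chrono::duration_cast<std::chrono::milliseconds>(now_() - start);
        // This runs while holding the lock; a real system might defer the report.
        if (waited >= threshold_) {
            report_({connectionId, blocker, waited});
        }
    }

    void unlock() {
        holder_.store(kNoHolder, std::memory_order_relaxed);
        mutex_.unlock();
    }

    // Number of threads currently blocked in lock(), for diagnostics.
    int waiters() const { return waiters_.load(); }

private:
    std::mutex mutex_;
    std::atomic<std::int64_t> holder_{kNoHolder};
    std::atomic<int> waiters_{0};
    std::chrono::milliseconds threshold_;
    std::function<Clock::time_point()> now_;
    std::function<void(const ContentionReport&)> report_;
};
```

Because the holder can change while the waiter is blocked, the reported blocker is best effort: it identifies who held the lock when the wait began, not necessarily who released it. `std::mutex::try_lock` may also fail spuriously, which only sends an uncontended caller down the slower, timed path.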