[SERVER-59673] Investigate better solutions for fixing the deadlock issue in profiling Created: 30/Aug/21 Updated: 29/Oct/23 Resolved: 02/Dec/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.3.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Wenbin Zhu | Assignee: | Jordi Olivares Provencio |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | techdebt |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Execution Team 2022-05-02, Execution Team 2022-05-16, Execution Team 2022-10-03, Execution Team 2022-10-17, Execution Team 2022-10-31, Execution Team 2022-11-14, Execution Team 2022-12-12, Execution Team 2022-11-28 |
| Participants: |
| Linked BF Score: | 145 |
| Description |
|
|
| Comments |
| Comment by Githook User [ 02/Dec/22 ] |
|
Author: {'name': 'Jordi Olivares Provencio', 'email': 'jordi.olivares-provencio@mongodb.com', 'username': 'jordiolivares'}Message: |
| Comment by Jordi Olivares Provencio [ 21/Sep/22 ] |
|
One option that is quite simple to implement is to define a list of collections that we know are unaffected by replica set state changes and to skip the RSTL lock for them. Inside AutoGetCollection we can easily check for membership. Initially the list would contain only the system.profile collection, but it could easily be expanded in the future. This option also serves as documentation for which collections are used in replication. |
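A minimal sketch of the membership check described in this comment, assuming a fixed list keyed by collection name. Every identifier here (`kCollectionsExemptFromRstl`, `isExemptFromRstl`, `RstlPolicy`, `rstlPolicyFor`) is hypothetical and is not taken from the server code base; the real check would sit inside AutoGetCollection and operate on the server's own namespace types.

```cpp
#include <array>
#include <string_view>

// Hypothetical names throughout; this is not the MongoDB server API.

// Collections assumed to be unaffected by replica set state changes.
// Only system.profile is listed, matching the initial proposal.
constexpr std::array<std::string_view, 1> kCollectionsExemptFromRstl = {
    "system.profile",
};

// Returns true when the given collection name is on the exempt list.
constexpr bool isExemptFromRstl(std::string_view collectionName) {
    for (std::string_view exempt : kCollectionsExemptFromRstl) {
        if (collectionName == exempt) {
            return true;
        }
    }
    return false;
}

enum class RstlPolicy { kAcquire, kSkip };

// Stand-in for the decision a collection-acquisition helper would make
// before taking its locks.
constexpr RstlPolicy rstlPolicyFor(std::string_view collectionName) {
    return isExemptFromRstl(collectionName) ? RstlPolicy::kSkip
                                            : RstlPolicy::kAcquire;
}
```

With this shape, expanding the list later is a one-line change, and the array itself records which collections are treated as outside replication.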
| Comment by Samyukta Lanka [ 20/Sep/21 ] |
|
Even if we can't remove UninterruptibleLockGuard in the near future, we might want to audit its current uses to see if a similar bug is possible with other operations. |
| Comment by Wenbin Zhu [ 20/Sep/21 ] |
|
Probably not part of this ticket, but maybe also consider filing another ticket to completely remove UninterruptibleLockGuard? |
| Comment by Andy Schwerin [ 14/Sep/21 ] |
|
We don't know ahead of time which operations we will record the profile
| Comment by Louis Williams [ 14/Sep/21 ] |
|
In this case, operations should know that they need to profile ahead of time, and they can acquire all the locks that they need up-front. There should be no consequence of holding a lock on the profile collection for an extended period of time, because there should be no conflicting operations on the profile collection, especially after |
| Comment by Andy Schwerin [ 14/Sep/21 ] |
|
I agree that pattern exists, but I don't think this quite matches the pattern, louis.williams. The cleanup tasks here actually require a completely separate set of locks from the "do something". The "cleanup task" is writing the profile result, which is intentionally considered separate work from the operation itself. |
| Comment by Gregory Noma [ 14/Sep/21 ] |
|
louis.williams just spitballing: maybe something like registering a callback when taking a lock which will run before the lock is released in those scenarios? |
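One way to picture the callback idea in this comment is a guard that stores callbacks at acquisition time and runs them in its destructor while the lock is still held. This is only a sketch built on `std::mutex`; the class `LockWithReleaseCallbacks` and its method `onBeforeRelease` are hypothetical and do not correspond to the server's lock manager.

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical guard, not a MongoDB class: holds a mutex and runs registered
// callbacks just before the mutex is released.
class LockWithReleaseCallbacks {
public:
    explicit LockWithReleaseCallbacks(std::mutex& m) : _lock(m) {}

    LockWithReleaseCallbacks(const LockWithReleaseCallbacks&) = delete;
    LockWithReleaseCallbacks& operator=(const LockWithReleaseCallbacks&) = delete;

    // Register cleanup work that needs the lock to still be held.
    void onBeforeRelease(std::function<void()> callback) {
        _callbacks.push_back(std::move(callback));
    }

    ~LockWithReleaseCallbacks() {
        // Run callbacks in reverse registration order. The destructor body
        // executes before any member is destroyed, so _lock still owns the
        // mutex here; the mutex is released only when _lock is destroyed.
        for (auto it = _callbacks.rbegin(); it != _callbacks.rend(); ++it) {
            try {
                (*it)();
            } catch (...) {
                // A destructor must not throw; this sketch drops the error.
            }
        }
    }

private:
    std::unique_lock<std::mutex> _lock;               // declared first, destroyed last
    std::vector<std::function<void()>> _callbacks;
};
```

The cleanup then never has to reacquire the lock, so it cannot block or be interrupted at that point. It does not by itself cover the case raised elsewhere in the thread, where the cleanup needs a different set of locks from the main operation.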
| Comment by Louis Williams [ 14/Sep/21 ] |
|
This is just one example of a common, yet problematic pattern that we have throughout the server:
The current solutions are: 1) block until the lock is available, ignoring interrupts (i.e. UninterruptibleLockGuard), or 2) skip the cleanup, which could leak resources. In the case of this ticket, that would mean not profiling anything. We need this pattern more often:
|