[SERVER-39934] CurOp::completeAndLogOperation should not hang waiting for global lock Created: 01/Mar/19 Updated: 29/Oct/23 Resolved: 21/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Concurrency |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.9, 4.1.10 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Russotto | Assignee: | Sulabh Mahajan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: |
v4.0
|
||||||||||||||||
| Sprint: | Storage Engines 2019-03-11, Storage Engines 2019-03-25 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 32 | ||||||||||||||||
| Story Points: | 8 | ||||||||||||||||
| Description |
|
When logging a command (either slow or when forced logging is enabled) CurOp::completeAndLogOperation attempts to take a global lock to obtain storage statistics. If something else has a Global X lock (or an enqueued Global X lock), this lock acquisition will stall behind that operation. This introduces an undesirable dependency on the global lock for otherwise lock-free operations such as $currentOp. We should give this acquisition a very short deadline and elide the storage stats when it is not available. |
| Comments |
| Comment by Githook User [ 03/Apr/19 ] |
|
Author: {'email': 'sulabh.mahajan@mongodb.com', 'name': 'Sulabh Mahajan', 'username': 'sulabhM'}Message: (cherry picked from commit ba3894493a94ed3c18458f391ff181d57475f010)
(cherry picked from commit 86b6aca9fa1940e85bba87261d1494ef2c208a4a)
(cherry picked from commit 82161eec79fc74652dc07b1c83fe500dc4f95e79)
(cherry picked from commit 3b4c6a689a3fdaa923d427ae112ea599513ef8ce)
(cherry picked from commit 6a9a5855048df1f4796a4032276d01318c398691)
The find command will always have the storage stats because the (cherry picked from commit 6125b5fb078a316854f0299b96b7d16eacb944de)
(cherry picked from commit bacb6b67706a2c057fcd0f76a38f416b225aa69a)
(cherry picked from commit 371197e4bab715a83272a4472e118ee5c5cbbf7c)
(cherry picked from commit 23ca771f76f85638f23bca2a4a6ac196a81fdc21) |
| Comment by Sulabh Mahajan [ 25/Mar/19 ] |
|
bruce.lucas, this will impact slow operation logging and profiler. No impact to serverStatus. |
| Comment by Bruce Lucas (Inactive) [ 25/Mar/19 ] |
|
sulabh.mahajan, can you confirm that this impacts only currentOp and slow operation logging, and not serverStatus? I want to make sure we aren't doing anything that might create frequent schema changes in serverStatus, which would have a bad effect on ftdc compression efficiency. |
| Comment by Githook User [ 21/Mar/19 ] |
|
Author: {'email': 'sulabh.mahajan@mongodb.com', 'name': 'Sulabh Mahajan', 'username': 'sulabhM'}Message: |
| Comment by Eric Milkie [ 05/Mar/19 ] |
|
Yeah, that tracker sounds better to use. |
| Comment by Matthew Russotto [ 05/Mar/19 ] |
|
I believe lockerInfo won't tell you about locks which have been released, so you'll end up missing the storage engine stats more often than necessary. There's also the GlobalAcqusitionLockTracker, which tells if a global lock has ever been acquired in X, IX, or S (even if subsequently released); you could add IS to that and use it. |
| Comment by Eric Milkie [ 05/Mar/19 ] |
|
matthew.russotto, I don't think we should use the proposed solution for the problem as presented. I believe if we remove the global lock acquisition for otherwise lock-free operations, that would solve your issue, correct? |
| Comment by Bruce Lucas (Inactive) [ 04/Mar/19 ] |
|
Can you clarify what user-visible content (e.g. log lines, currentOp output, serverStatus, FTDC output) is impacted, and what exactly "storage stats" refers to (ideally in terms of the affected user-visible artifacts)? |
| Comment by Kelsey Schubert [ 04/Mar/19 ] |
|
It's not ideal, especially since when things are slow that's generally when you most want diagnostics. But, I'm not sure I have a better idea. It sounds like this is our standard way of resolving this issue until PM-1242 lands. bruce.lucas, feel free to chime in if I'm missing anything. |
| Comment by Judah Schvimer [ 04/Mar/19 ] |
|
kelsey.schubert, is this change acceptable? |