[SERVER-47446] Measure cpu time taken by operations Created: 09/Apr/20 Updated: 29/Oct/23 Resolved: 14/Oct/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | New Feature | Priority: | Critical - P2 |
| Reporter: | Mira Carey | Assignee: | Amirsaman Memaripour |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Service arch 2020-10-19 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
It would be valuable to measure the CPU time consumed by an operation, both while it is running and rolled up by user (to measure resource consumption by a particular user). We could potentially expose this via currentOp, via server status metrics, or perhaps in a custom agg pipeline (if by user). On Linux, pthread_getcpuclockid() offers a relatively easy way to access the CPU time consumed by a particular thread. In turn, we could capture the time at opCtx creation, store the delta on each call to opCtx->checkForInterrupt, and flush the final value into the user-level rollup at opCtx destruction. |
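The capture, delta, and flush steps described above could be sketched roughly as follows. This is a minimal illustration, not the server's implementation: `OperationCpuTimer`, `UserCpuRollup`, and `threadCpuNanos` are hypothetical names standing in for the opCtx and the per-user rollup, and the sketch assumes an operation stays on a single thread for its whole lifetime. Only `pthread_getcpuclockid()` and `clock_gettime()` are real POSIX calls.

```cpp
#include <pthread.h>
#include <time.h>

#include <cstdint>
#include <mutex>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-user rollup of CPU nanoseconds (illustrative only).
class UserCpuRollup {
public:
    void add(const std::string& user, std::int64_t nanos) {
        std::lock_guard<std::mutex> lk(_mutex);
        _totals[user] += nanos;
    }

    std::int64_t get(const std::string& user) const {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _totals.find(user);
        return it == _totals.end() ? 0 : it->second;
    }

private:
    mutable std::mutex _mutex;
    std::unordered_map<std::string, std::int64_t> _totals;
};

// Returns the CPU time consumed so far by the calling thread, in nanoseconds.
inline std::int64_t threadCpuNanos() {
    clockid_t cid;
    if (pthread_getcpuclockid(pthread_self(), &cid) != 0)
        throw std::runtime_error("pthread_getcpuclockid failed");
    timespec ts;
    if (clock_gettime(cid, &ts) != 0)
        throw std::runtime_error("clock_gettime failed");
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical stand-in for the opCtx: captures the thread CPU clock at
// creation, refreshes the delta on each interrupt check, and flushes the
// final value into the per-user rollup at destruction.
class OperationCpuTimer {
public:
    OperationCpuTimer(UserCpuRollup& rollup, std::string user)
        : _rollup(rollup), _user(std::move(user)), _start(threadCpuNanos()) {}

    ~OperationCpuTimer() {
        try {
            checkForInterrupt();           // take one last reading
            _rollup.add(_user, _elapsed);  // flush into the user-level rollup
        } catch (...) {
            // Never let a clock failure escape a destructor.
        }
    }

    // Assumed to be called on the same thread that created the timer.
    void checkForInterrupt() {
        _elapsed = threadCpuNanos() - _start;
    }

    std::int64_t cpuNanos() const {
        return _elapsed;
    }

private:
    UserCpuRollup& _rollup;
    std::string _user;
    std::int64_t _start;
    std::int64_t _elapsed = 0;
};
```

A caller would construct one `OperationCpuTimer` per operation and read `cpuNanos()` while the operation runs, for example when building currentOp-style output. On older glibc versions the sketch may need to be built with `-pthread`. Because the thread CPU clock only advances while the thread is scheduled, time spent blocked is excluded from the reading.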
| Comments |
| Comment by Githook User [ 14/Oct/20 ] |
|
Author: {'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'} Message: |
| Comment by Bruce Lucas (Inactive) [ 14/Apr/20 ] |
|
Yes to the first sentence. We have gotten better about accounting for wait time (e.g. we have storage wait time now from WT), but by no means all of it is accounted for, so this would help us know whether it was CPU time or some unaccounted-for wait time. Regarding your second suggestion, sounds useful for currentOp, but maybe less so for slow query logging. |
| Comment by Mira Carey [ 13/Apr/20 ] |
To break out when operations are slow due to blocking vs. because they're actually doing work? I wonder if a breakdown of CPU cycles used in the recent past might be a nice addition if that's a specific use case.
| Comment by Bruce Lucas (Inactive) [ 13/Apr/20 ] |
|
I think it could be helpful to have this in logged slow operations. |