[SERVER-60984] Report time in recipient critical section on serverStatus' shardingStatistics Created: 26/Oct/21 Updated: 29/Oct/23 Resolved: 05/Nov/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.2.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Jordi Serra Torrens | Assignee: | Jordi Serra Torrens |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Participants: | |
| Comments |
| Comment by Githook User [ 05/Nov/21 ] |
|
Author: {'name': 'Jordi Serra Torrens', 'email': 'jordi.serra-torrens@mongodb.com', 'username': 'jordist'}Message: |
| Comment by Bruce Lucas (Inactive) [ 27/Oct/21 ] |
|
Yes, I think that could be beneficial, and might be worth doing while working in that area for this ticket. It might also be useful to take a look at the other phases of chunk migration and similarly see if log events with attr.durationMillis would make sense in those cases. |
| Comment by Jordi Serra Torrens [ 27/Oct/21 ] |
I believe they aren't nowadays. If you believe there's a benefit to it, I'll definitely file a ticket to add this log for both the donor's critical section and the recipient's (the one this ticket concerns).
| Comment by Bruce Lucas (Inactive) [ 27/Oct/21 ] |
|
jordi.serra-torrens, thanks for the clarification. Actually, anything that is in serverStatus is recorded in FTDC, and the usual way to include something in FTDC is via serverStatus, so your original phrasing isn't inaccurate, and the same considerations still apply. However, since you clarified that the intent is to model it on the existing donor critical section timing, which records the total cumulative time spent in critical sections, that is fine: it doesn't have the issues I mentioned, which would arise if the number recorded were just the most recent critical section time. By the way, are these critical section events also logged with an attr.durationMillis? Having this information in the log files in such a standard format would make it easier to diagnose problems that could be a result of the critical section. |
| Comment by Jordi Serra Torrens [ 27/Oct/21 ] |
|
bruce.lucas, thanks for pointing this out. I believe I didn't really mean FTDC. What I meant is to report a new metric reflecting the "time in recipient critical section", similar to how we report the donor's critical section nowadays: through serverStatus.shardingStatistics. |
| Comment by Bruce Lucas (Inactive) [ 26/Oct/21 ] |
|
FTDC contains metrics that are sampled at one-second intervals. What does it mean to record this in FTDC? Presumably each sample would report the duration of the most recent critical section? Generally speaking, timings of operations are better reported in the log with an attr.durationMillis field. This allows for the case where multiple such events occur within one second, and also avoids recording in FTDC at each second a metric that may reflect something that happened a long time in the past, which could be misleading. |
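
The sampling concern discussed in this thread can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Event` type, the `sample` and `per_interval` helpers, and the event timings are all hypothetical and are not taken from the server code or from FTDC itself. It compares a cumulative counter (the approach the ticket adopts, modeled on the donor metric) with a "most recent duration" gauge, both read once per second.

```python
from dataclasses import dataclass


@dataclass
class Event:
    end_ms: int       # hypothetical time at which a critical section ended
    duration_ms: int  # hypothetical time spent in that critical section


def sample(events, sample_times_ms):
    """Return (cumulative_total, most_recent_duration) at each sample time."""
    events = sorted(events, key=lambda e: e.end_ms)
    readings = []
    total = 0
    last = 0
    i = 0
    for t in sample_times_ms:
        # Fold in every event that finished at or before this sample.
        while i < len(events) and events[i].end_ms <= t:
            total += events[i].duration_ms
            last = events[i].duration_ms
            i += 1
        readings.append((total, last))
    return readings


def per_interval(cumulative):
    """Difference consecutive cumulative samples to get time per interval."""
    return [b - a for a, b in zip([0] + cumulative[:-1], cumulative)]


# Two short critical sections inside the first second, one long one later.
events = [Event(200, 40), Event(700, 25), Event(2500, 300)]
readings = sample(events, [1000, 2000, 3000, 4000])

cumulative = [c for c, _ in readings]
most_recent = [m for _, m in readings]

print(per_interval(cumulative))  # time attributed to each one-second interval
print(most_recent)               # gauge readings, including stale repeats
```

In this sketch, differencing the cumulative samples accounts for both events in the first second and shows zero for idle intervals, while the gauge never reports the 40 ms event and keeps repeating old values in later samples.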