[SERVER-34722] Add new server status metrics about oplog application Created: 27/Apr/18 Updated: 29/Oct/23 Resolved: 11/Oct/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics, Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.1, 4.2.7 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Judah Schvimer |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | former-quick-wins |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.2 |
| Sprint: | Repl 2019-08-12, Repl 2019-08-26, Repl 2019-09-09, Repl 2019-09-23, Repl 2019-10-07, Repl 2019-10-21 |
| Participants: |
| Description |
|
Ideas for improvements: |
| Comments |
| Comment by Githook User [ 23/Apr/20 ] |
|
Author: {'name': 'Judah Schvimer', 'email': 'judah.schvimer@10gen.com', 'username': 'judahschvimer'} Message: (cherry picked from commit 9b3801e457c4952e36f2a13d45387d647c301e03) |
| Comment by Githook User [ 11/Oct/19 ] |
|
Author: {'username': 'judahschvimer', 'email': 'judah.schvimer@10gen.com', 'name': 'Judah Schvimer'} Message: |
| Comment by Kelsey Schubert [ 25/Sep/19 ] |
|
I think that's sufficient. Thanks! |
| Comment by Judah Schvimer [ 11/Sep/19 ] |
kelsey.schubert, How is this different from "metrics.repl.network.getmores.num"?
I think this refers to the "appliedOpTime" field in replSetGetStatus. Do we want it in serverStatus too, or is replSetGetStatus sufficient? |
| Comment by Bruce Lucas (Inactive) [ 28/Jan/19 ] |
|
Average parallelism for each batch, updated at the end of each batch, could be useful: the sum of the times individual worker threads spend applying ops, divided by the total time for the batch. |
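The ratio described in this comment can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the server's implementation; the function name `average_parallelism` and the shape of its inputs (a list of per-worker busy times and one wall-clock duration, both in seconds) are assumptions.

```python
def average_parallelism(worker_busy_seconds, batch_wall_seconds):
    """Sum of per-worker apply time divided by the total time for the batch.

    Hypothetical helper: both argument names are illustrative assumptions.
    """
    if batch_wall_seconds <= 0:
        raise ValueError("batch_wall_seconds must be positive")
    return sum(worker_busy_seconds) / batch_wall_seconds


# Made-up batch: four worker threads, 2.0 s of wall-clock time for the batch.
busy = [2.0, 1.0, 0.5, 0.5]
print(average_parallelism(busy, 2.0))  # 2.0
```

A value close to the number of worker threads would suggest the workers were busy for most of the batch; a value near 1 would suggest the batch was applied almost serially.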
| Comment by Judah Schvimer [ 28/Jan/19 ] |
|
Another metric to consider is how well we are using parallelism in secondary oplog application. I'm not sure of the best way to capture this, but one option is to check whether each worker thread on a secondary applies a similar number of ops, or works for a similar amount of time, per batch. Idle worker threads mean we're not being efficient with our parallelism. |
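One possible way to quantify "similar number of ops or similar amount of time" is a max-to-mean ratio over the per-worker values, plus a count of workers that did nothing. The sketch below is an assumption about how such a check could look, not something the ticket specifies; `worker_imbalance` and `idle_workers` are hypothetical names.

```python
def worker_imbalance(per_worker_values):
    """Return max/mean of a per-worker quantity (op count or busy time).

    1.0 means the work was spread evenly; larger values mean more skew.
    Hypothetical helper written for illustration only.
    """
    if not per_worker_values:
        raise ValueError("need at least one worker")
    mean = sum(per_worker_values) / len(per_worker_values)
    if mean == 0:
        return 1.0  # no work at all in this batch: treat as balanced
    return max(per_worker_values) / mean


def idle_workers(per_worker_values):
    """Count workers that applied nothing in the batch."""
    return sum(1 for value in per_worker_values if value == 0)


# Made-up batch: two workers applied 400 ops each, two were idle.
ops_per_worker = [400, 400, 0, 0]
print(worker_imbalance(ops_per_worker))  # 2.0
print(idle_workers(ops_per_worker))      # 2
```

The same functions work for per-worker busy time instead of op counts, since only the relative spread matters.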
| Comment by Judah Schvimer [ 02/Nov/18 ] |
|
Two things to add:
|
| Comment by Kelsey Schubert [ 01/Nov/18 ] |
|
I think I'm most interested in:
Less sure about:
Already have (unless I'm misunderstanding):
|
| Comment by Bruce Lucas (Inactive) [ 29/Oct/18 ] |
|
Those metrics sound useful. |
| Comment by Judah Schvimer [ 27/Apr/18 ] |
|
CC kelsey.schubert and bruce.lucas: please add any other metrics you think would be helpful, and suggest the best format for them. |