[SERVER-33285] Add "thread time waiting for response from replication sync source" stats Created: 13/Feb/18 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Diagnostics, Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Dmitry Agranat | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 3 |
| Labels: | SWDI, former-quick-wins | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Assigned Teams: | Replication |
| Participants: | |
| Description |
|
When replicated write ops to a secondary increase, serverStatus reports how long reader threads waited for the global lock (Global timeAcquiringMicros r), but it does not report how that wait affected the thread reading the oplog to replicate to another secondary (in the case of chained replication). |
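For reference, the existing counter can be read from a serverStatus document at the path locks.Global.timeAcquiringMicros.r. The following is a minimal sketch, not the server's own tooling: the helper name and the sample document are hypothetical, and the numbers are made up for illustration.

```python
from collections.abc import Mapping
from typing import Any, Optional


def global_read_lock_wait_micros(server_status: Mapping) -> Optional[int]:
    """Return locks.Global.timeAcquiringMicros.r, or None if any level is missing."""
    node: Any = server_status
    for key in ("locks", "Global", "timeAcquiringMicros", "r"):
        # Stop early if the document does not have the expected nesting.
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return int(node)


# Hypothetical, trimmed-down serverStatus output; values are illustrative only.
sample_status = {
    "locks": {"Global": {"timeAcquiringMicros": {"r": 1250000, "w": 300}}}
}

print(global_read_lock_wait_micros(sample_status))
```

This value only describes lock waits on the node that reports it; it says nothing about how long a downstream node waited for a response, which is the gap this ticket describes.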
| Comments |
| Comment by Alyson Cabral (Inactive) [ 04/Apr/18 ] | ||
|
Got it, so this seems completely unrelated to Global timeAcquiringMicros r. This is about providing a measurement of sync source speed (or health) relative to the requesting server. spencer tess.avitabile Let's discuss in the next product/repl triage meeting to determine if this is the best way to measure relative sync source perf. Let's discuss how hard this is by itself. I could also see this being a part of a larger project making chaining better/smarter. | ||
| Comment by Geert Bosch [ 04/Apr/18 ] | ||
|
The time acquiring the global lock is just one component of the total time it takes a node to process a request. In addition, you'd really want to measure the operation time on the node issuing the request, as that also includes network latency. For example, suppose you have a local sync source that returns batches in 50ms and one halfway across the world that returns them in 10ms. It would still be faster to read from the local node in this case. | ||
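The arithmetic behind this example can be sketched as follows. The sketch reads the 50ms and 10ms figures as processing time on the sync source; that reading is an interpretation of the comment. The round-trip times (1ms local, 150ms remote) and all names are hypothetical and chosen only to illustrate the point.

```python
def observed_batch_ms(source_processing_ms: float, network_round_trip_ms: float) -> float:
    """Time the requesting node waits for one batch: source processing plus network."""
    return source_processing_ms + network_round_trip_ms


# Processing times come from the example above; round-trip times are assumed.
candidates = {
    "local": observed_batch_ms(source_processing_ms=50, network_round_trip_ms=1),
    "remote": observed_batch_ms(source_processing_ms=10, network_round_trip_ms=150),
}

# Pick the sync source with the lowest time as seen by the requesting node.
fastest = min(candidates, key=candidates.get)
print(candidates, fastest)
```

Under these assumed latencies the local source is faster end to end even though it is slower to process each request, which is why the measurement belongs on the node issuing the request.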
| Comment by Asya Kamsky [ 04/Apr/18 ] | ||
|
Isn't this already available on the secondary where this getMore is running?
Here is a secondary that's slow to service getMores due to high global timeAcquiringMicros... right? | ||
| Comment by Alyson Cabral (Inactive) [ 04/Apr/18 ] | ||
|
Or will that essentially become the value we are asking for here? | ||
| Comment by Alyson Cabral (Inactive) [ 04/Apr/18 ] | ||
|
With the addition of snapshot reads during oplog application on secondaries, is the Global timeAcquiringMicros r serverStatus field going away? geert.bosch | ||
| Comment by Gregory McKeon (Inactive) [ 26/Mar/18 ] | ||
|
ping alyson.cabral |