[SERVER-40856] Log replication progress in hang analyzer Created: 26/Apr/19 Updated: 06/Dec/22
| Status: | Open |
| Project: | Core Server |
| Component/s: | Replication, Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.1 Desired |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Backlog - Server Tooling and Methods (STM) (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | move-sdp-candidate, tig-hanganalyzer, tig-qwin-eligible |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Server Tooling & Methods |
| Description |
Some replication hangs are caused by nodes that have stopped replicating rather than by an actual deadlock. It would be helpful if the hang analyzer called replSetGetStatus on every node in the cluster while the processes are still alive. If a replSetGetStatus call hangs because of a deadlock on the ReplicationCoordinator mutex, the replication progress is probably not important anyway, so it is fine to just kill that command. |
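A minimal sketch of the proposed behavior, in Python. It assumes a hypothetical `run_command(host, command_name)` callable that sends a command to one node and returns the reply document; how that is wired to a real client (for example pymongo's `MongoClient(host).admin.command("replSetGetStatus")`) is an assumption, as is the function name `collect_repl_status`. All nodes are queried concurrently under one shared deadline, and any node whose call has not returned is reported as timed out so the rest of the hang analysis can continue. Note that this only abandons the hung call on the client side; it does not kill the operation on the server.

```python
import concurrent.futures
from typing import Any, Callable, Dict, List

# Hypothetical signature: run_command(host, command_name) -> reply document.
RunCommand = Callable[[str, str], Dict[str, Any]]


def collect_repl_status(hosts: List[str], run_command: RunCommand,
                        timeout_secs: float) -> Dict[str, Dict[str, Any]]:
    """Run replSetGetStatus on every host concurrently, giving up after timeout_secs."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(hosts)))
    futures = {host: executor.submit(run_command, host, "replSetGetStatus")
               for host in hosts}
    # One shared deadline for all nodes, so several hung nodes cost
    # timeout_secs in total rather than timeout_secs each.
    concurrent.futures.wait(list(futures.values()), timeout=timeout_secs)
    # Do not block on hung workers: they are abandoned, not killed server-side.
    executor.shutdown(wait=False)

    results: Dict[str, Dict[str, Any]] = {}
    for host, future in futures.items():
        if not future.done():
            # Likely stuck, e.g. on the ReplicationCoordinator mutex.
            results[host] = {"ok": 0, "error": "timed out (possible deadlock)"}
        elif future.exception() is not None:
            results[host] = {"ok": 0, "error": repr(future.exception())}
        else:
            results[host] = future.result()
    return results
```

Using a single deadline for the whole cluster keeps the extra time the hang analyzer spends bounded by `timeout_secs`, regardless of how many nodes are stuck.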
| Comments |
| Comment by Lingzhi Deng [ 28/Jan/22 ] |
I think this is still nice to have when diagnosing BFs, but it is not pressing.