- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
Replication
Right now we approximate replication coordinator mutex contention by looking at the ping time, since ping ends up taking that mutex. We should add a metric that tracks this directly. We should be able to time how long it takes to acquire the mutex. I'm not sure whether doing so would add enough load on the mutex to make the situation worse for customers.
At the very least, we could add a command that just takes the mutex and releases it, so that we could time how long that command takes. We wouldn't have to call the command as part of collecting FTDC; we could instead call it manually on clusters where we suspect mutex contention.
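A rough sketch of what such a probe could look like, using only the C++ standard library. This is not the server's actual code: `probeMutexWaitMicros` is a hypothetical name, and a plain `std::mutex` stands in for the replication coordinator mutex. It times only the wait to acquire the lock and releases it immediately to keep the added hold time small.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical helper (not an existing server function): acquires `m`,
// records how long the acquisition took, then releases `m` right away.
// Returns the wait time in microseconds.
std::int64_t probeMutexWaitMicros(std::mutex& m) {
    using Clock = std::chrono::steady_clock;  // monotonic, safe for intervals

    const auto start = Clock::now();
    std::unique_lock<std::mutex> lk(m);       // blocks while the mutex is contended
    const auto acquired = Clock::now();
    lk.unlock();                              // release immediately

    return std::chrono::duration_cast<std::chrono::microseconds>(acquired - start)
        .count();
}
```

Since the probe itself takes the mutex, running it on demand rather than on every FTDC sample would limit the extra load it puts on the mutex.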