[SERVER-39727] Allow internal server components to track "real" global lock acquisitions
Created: 21/Feb/19  Updated: 29/Oct/23  Resolved: 05/Mar/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.1.9

Type: New Feature
Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive)
Assignee: Daniel Gottlieb (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-39673 Commit Flow Control mechanism to mast... Closed
Backwards Compatibility: Fully Compatible
Sprint: Storage NYC 2019-02-25, Storage NYC 2019-03-11
Participants:

 Description   

All locks with the same resource type are grouped together for lock stats reporting. Thus "the" global lock is merged with the PBWM and RSTL.

The result of this ticket will be to count "real" global lock acquisitions separately (similar to how the oplog is carved out). However, server status will continue to return the same results, obtained by summing the two counters.
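
A minimal sketch of the idea in Python, not the committed implementation: it keeps one set of counters for "real" global lock acquisitions and one for the other locks that share the global resource type (PBWM, RSTL), and sums them for the server status view so the reported numbers do not change. The class and method names (LockStatCounters, GlobalLockStats, record_acquisition, server_status_view) are hypothetical. The three fields mirror the acquireCount, acquireWaitCount and timeAcquiringMicros stats mentioned in the comments.

```python
from dataclasses import dataclass


@dataclass
class LockStatCounters:
    """One bucket of lock stats (hypothetical, for illustration only)."""
    acquire_count: int = 0
    acquire_wait_count: int = 0
    time_acquiring_micros: int = 0

    def __add__(self, other: "LockStatCounters") -> "LockStatCounters":
        # Field-by-field sum, used to build the combined view.
        return LockStatCounters(
            self.acquire_count + other.acquire_count,
            self.acquire_wait_count + other.acquire_wait_count,
            self.time_acquiring_micros + other.time_acquiring_micros,
        )


class GlobalLockStats:
    """Tracks "real" global lock stats apart from PBWM/RSTL stats."""

    def __init__(self) -> None:
        self.real_global = LockStatCounters()
        self.other_global_type = LockStatCounters()  # PBWM and RSTL

    def record_acquisition(self, is_real_global: bool, waited_micros: int = 0) -> None:
        bucket = self.real_global if is_real_global else self.other_global_type
        bucket.acquire_count += 1
        if waited_micros > 0:
            # Assumption: a wait is counted only when the acquisition blocked.
            bucket.acquire_wait_count += 1
            bucket.time_acquiring_micros += waited_micros

    def server_status_view(self) -> LockStatCounters:
        # Server status keeps reporting the sum of both buckets.
        return self.real_global + self.other_global_type


stats = GlobalLockStats()
stats.record_acquisition(is_real_global=True)
stats.record_acquisition(is_real_global=False, waited_micros=150)
stats.record_acquisition(is_real_global=False)
combined = stats.server_status_view()
```

Internal components would read stats.real_global directly, while anything consuming server status would only ever see the combined counters.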



 Comments   
Comment by Githook User [ 05/Mar/19 ]

Author: Daniel Gottlieb (dgottlieb) <daniel.gottlieb@mongodb.com>

Message: SERVER-39727: Split out "real" global lock acquisitions. Leave serverStatus results unchanged.
Branch: master
https://github.com/mongodb/mongo/commit/bf895a8f816b9bae68eb22f349b45e1d1d5f2da7

Comment by Kelsey Schubert [ 22/Feb/19 ]

Thanks for the clarification!

Comment by Daniel Gottlieb (Inactive) [ 22/Feb/19 ]

That makes sense, but as discussed offline, the RSTL is not related to flow control. It stands for replication state transition lock and is documented here.

Comment by Kelsey Schubert [ 21/Feb/19 ]

RSTL is a new thing in 4.2, and so the end result of the metric change between 4.0 and 4.2 is what I care about from a supportability perspective. If we don't change the server status results to break out RSTL, my understanding is that there'll be a significant difference in the meaning of the metric from 4.0 to 4.2. If we do make the change, I would expect it to be much easier to follow previous diagnostic procedures. Does that make sense?

Comment by Daniel Gottlieb (Inactive) [ 21/Feb/19 ]

kelsey.schubert, can you clarify what the downstream impact is? This ticket is not changing server status results.

Comment by Daniel Gottlieb (Inactive) [ 21/Feb/19 ]

I'm not sure that summing them in the metrics is the right thing to do.

I think that perspective is valid. From a code standpoint, I believe reporting them separately is slightly easier. This proposal, however, is the lowest-friction way to get at the existing data, because it does not propose an API change. It should be relatively easy to change from one way to the other if other stakeholders feel that's best.

I'm also concerned that the new RSTL lock seems to be acquired at a non-trivial rate. This may be changing the meaning and diagnostic significance of the globalLock metrics, so it might be better to remove it from globalLock as well.

Others and I share this concern. I can't say whether the attention to detail now being given to splitting out the locks was also given to the diagnostic potential of lock stats when the RSTL was added (even if, semantically, it just split off some responsibility that the existing global lock was being used for).

I assume also we will be separately tracking acquireWaitCount and timeAcquiringMicros, and not just acquireCount?

Correct.

Comment by Bruce Lucas (Inactive) [ 21/Feb/19 ]

I'm not sure that summing them in the metrics is the right thing to do. In SERVER-33792 we contemplated removing PBWM from globalLock and reporting it separately. I'm also concerned that the new RSTL lock seems to be acquired at a non-trivial rate. This may be changing the meaning and diagnostic significance of the globalLock metrics, so it might be better to remove it from globalLock as well.

I assume also we will be separately tracking acquireWaitCount and timeAcquiringMicros, and not just acquireCount?

Generated at Thu Feb 08 04:52:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.