[SERVER-6507] mongostat lock% reports a number higher than (100% times number of cores) Created: 18/Jul/12  Updated: 11/Jul/16  Resolved: 02/Aug/12

Status: Closed
Project: Core Server
Component/s: Tools
Affects Version/s: 2.2.0-rc0
Fix Version/s: 2.2.0-rc1

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

From a user with an 8 core machine:

insert  query update delete getmore command flushes mapped  vsize    res non-mapped faults    locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn    set repl       time 
     6   1104   1133      0     679    1110       0  6.97g    15g  4.61g      8.03g      0 test:1158.6%          0      21|0    0|19    14m    44m    50 RS  PRI   07:09:33 
     6   1187   1168      0     685    1213       0  6.97g    15g  4.63g      8.03g      0  test:542.0%          0       0|0     0|0    15m    46m    50 RS  PRI   07:09:34 
    15   1225   1248      0     663    1264       0  6.97g    15g  4.66g      8.03g      0  test:568.3%          0       0|0     0|0    16m    48m    50 RS  PRI   07:09:35 
     9   1006   1008      0     533    1016       0  6.97g    15g  4.65g      8.03g      0 test:1215.0%          0       0|0     0|0    13m    39m    50 RS  PRI   07:09:36 
     8   1232   1214      0     681    1213       0  6.97g    15g  4.69g      8.03g      0  test:716.0%          0       9|0     0|1    15m    47m    50 RS  PRI   07:09:37 
    13   1004   1026      0     543    1010       0  6.97g    15g   4.7g      8.03g      0  test:455.7%          0       0|0    0|40    13m    40m    50 RS  PRI   07:09:38 
     3    779    775      0     564     801       0  6.97g    15g   4.7g      8.03g      0 test:2367.7%          0      20|0     0|1    10m    29m    50 RS  PRI  

What is this stat measuring?



 Comments   
Comment by auto [ 02/Aug/12 ]

Author:

{u'date': u'2012-08-02T12:02:56-07:00', u'email': u'eliot@10gen.com', u'name': u'Eliot Horowitz'}

Message: SERVER-6507 - lock reporting fixing

Comment by Dwight Merriman [ 27/Jul/12 ]

investigating.
aside: not sure "# of cores" necessarily matters the # of threads inside a read lock could be higher than that for a few reasons?" if about # of connecitons, that sounds wrong.

Comment by Ian Whalen (Inactive) [ 27/Jul/12 ]

http://codereview.10gen.com/8186055/

Comment by Eric Milkie [ 26/Jul/12 ]

Dwight's working on this.

Comment by Eric Milkie [ 25/Jul/12 ]

I found a minor bug when recording time during the exclusive lock phase of journal commit. Also, recursive locks might be counting double and I'll fix that as well.
But the biggest problem turned out to be that the thread scheduler was defeating our attempts to record lock time. We are stopping the lock time stopwatch after we unlock the qlock mutex. Since we signal condvars as part of qlock unlock(), it is possible that you can get a chain of lock/unlocks for a given DB lock, without any of their lock times being recorded until after they are all done. Thus all the threads are overlapping their lock times.
Dwight is looking into changing this behavior so that we stop the lock time stopwatch within the qlock mutex.

Generated at Thu Feb 08 03:11:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.