[SERVER-7107] Queries/getmores generate write lock stats in system.profile Created: 21/Sep/12  Updated: 06/Dec/22  Resolved: 22/Aug/19

Status: Closed
Project: Core Server
Component/s: Concurrency, Internal Code
Affects Version/s: 2.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Scott Hernandez (Inactive) Assignee: Backlog - Query Team (Inactive)
Resolution: Done Votes: 3
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-7829 Indexes are not maintained in oplog/p... Closed
Assigned Teams:
Query
Operating System: ALL
Participants:

 Description   

After investigating this more, and discussing this with others, here is what I've found:

  • The entries (lock time specifically) in the profile collection include the time taken to write to the system.profile collection, in addition to the orig. op
  • We do not separate the user operation from the system logging/profiling.
  • This is confusing for read-only operations like queries and getmores, since they never need a write lock
  • It would be good if we can get trace or separate levels of reporting for the various stages of the whole operation so it is clear what the cost of the user operation would be without profiling

Originally I had expected, as did the users who noticed this, that "lockStats.timeAcquiringMicros.w" should be zero since it was a read-only operation.

{
	"ts" : ISODate("2012-09-21T18:50:05.301Z"),
	"op" : "query",
	"ns" : "l.c",
	"query" : {
		
	},
	"cursorid" : NumberLong("168104220735197301"),
	"ntoreturn" : 0,
	"ntoskip" : 0,
	"nscanned" : 102,
	"keyUpdates" : 0,
	"numYield" : 0,
	"lockStats" : {
		"timeLockedMicros" : {
			"r" : NumberLong(73),
			"w" : NumberLong(0)
		},
		"timeAcquiringMicros" : {
			"r" : NumberLong(3),
			"w" : NumberLong(2)
		}
	},
	"nreturned" : 101,
	"responseLength" : 17815,
	"millis" : 0,
	"client" : "127.0.0.1",
	"user" : ""
}
{
	"ts" : ISODate("2012-09-21T18:50:05.425Z"),
	"op" : "getmore",
	"ns" : "l.c",
	"cursorid" : NumberLong("168104220735197301"),
	"ntoreturn" : 0,
	"keyUpdates" : 0,
	"numYield" : 0,
	"lockStats" : {
		"timeLockedMicros" : {
			"r" : NumberLong(8781),
			"w" : NumberLong(0)
		},
		"timeAcquiringMicros" : {
			"r" : NumberLong(11),
			"w" : NumberLong(5)
		}
	},
	"nreturned" : 865,
	"responseLength" : 150008,
	"millis" : 8,
	"client" : "127.0.0.1",
	"user" : ""
}



 Comments   
Comment by David Storch [ 16/Aug/19 ]

I cannot reproduce this trivially by running a query with profiling level 2 enabled on the master branch. Looking at the code, it appears that we might be gathering and serializing the lock stats before we ever acquire the locks needed to write to the profile collection. I'm flagging for re-triage. We may want to spend some time investigating to confirm whether this issue has been fixed.

Comment by Asya Kamsky [ 18/Jul/16 ]

Still happening in 3.3.9

db.system.profile.find({op:{$ne:"insert"}},{op:1,"locks.Collection.acquireCount.w":1})
{ "op" : "query", "locks" : { "Collection" : { "acquireCount" : { "w" : NumberLong(1) } } } }
{ "op" : "getmore", "locks" : { "Collection" : { "acquireCount" : { "w" : NumberLong(2) } } } }
{ "op" : "query", "locks" : { "Collection" : { "acquireCount" : { "w" : NumberLong(3) } } } }

More details for a single one:

{
	"op" : "getmore",
	"ns" : "dun.companies",
	"query" : {
		"getMore" : NumberLong("937999518245"),
		"collection" : "companies",
		"batchSize" : NumberLong(0)
	},
	"cursorid" : 937999518245,
	"keysExamined" : 555,
	"docsExamined" : 493,
	"cursorExhausted" : true,
	"numYield" : 24,
	"locks" : {
		"Global" : {
			"acquireCount" : {
				"r" : NumberLong(118),
				"w" : NumberLong(2)
			}
		},
		"Database" : {
			"acquireCount" : {
				"r" : NumberLong(58),
				"w" : NumberLong(2)
			}
		},
		"Collection" : {
			"acquireCount" : {
				"r" : NumberLong(58),
				"w" : NumberLong(2)
			}
		},
		"Metadata" : {
			"acquireCount" : {
				"W" : NumberLong(2)
			}
		}
	},
	"nreturned" : 493,
	"responseLength" : 28076,
	"millis" : 1007,
	"planSummary" : "IXSCAN { parent: 1.0, _id: 1.0 }", ...

Comment by Andy Schwerin [ 08/Mar/13 ]

scotthernandez, please refine the description, then unassign and move to fixVersion = Needs Triage.

Comment by Scott Hernandez (Inactive) [ 16/Nov/12 ]

I turned on profiling at level 2 for all these tests, and the ones above.

I did a little more testing and found that in consecutive runs that the lock time is 0, but the acquire time is somewhere between 3-6micros.

To further testing I times a few inserts (after priming the extent with a write or two) and the lock time is much greater than the lock acquire time by many times. I also tested on a capped collection and it resulted in the same behavior of the lock acquisition being much much lower than the write lock time.

I haven't looked at the code but if the lock times include the global lock that may be a possible explanation.

Comment by Scott Hernandez (Inactive) [ 21/Sep/12 ]

No journaling on osx.

> db.serverStatus()
{
	"host" : "lamest.local",
	"version" : "2.2.0",
	"process" : "mongod",
	"pid" : 90962,
...
	"mem" : {
		"bits" : 64,
		"resident" : 29,
		"virtual" : 2674,
		"supported" : true,
		"mapped" : 208
	},
	"opcounters" : {
		"insert" : 0,
		"query" : 620,
		"update" : 0,
		"delete" : 0,
		"getmore" : 4,
		"command" : 53
	},
	"ok" : 1
}
> db.adminCommand("getCmdLineOpts")
{
	"argv" : [
		"temp/mongodb-osx-x86_64-2.2.0/bin/mongod",
		"--dbpath",
		"/Volumes/256c4ssd/temp/dbs/test1/",
		"--nojournal",
		"--rest"
	],
	"parsed" : {
		"dbpath" : "/Volumes/256c4ssd/temp/dbs/test1/",
		"nojournal" : true,
		"rest" : true
	},
	"ok" : 1
}

Generated at Thu Feb 08 03:13:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.