[SERVER-13339] One constructor for Client::Context does not properly set the curop Created: 25/Mar/14  Updated: 03/Jan/18  Resolved: 07/Oct/15

Status: Closed
Project: Core Server
Component/s: Diagnostics, Internal Code
Affects Version/s: 2.4.9, 2.6.0-rc2
Fix Version/s: 3.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Linda Qin Assignee: Kaloian Manassiev
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:
  1. Start a replica set.
  2. Insert a document to the primary.
  3. Run db.adminCommand( {top:1}

    ) on the primary and a secondary.

Sprint: Sharding A (10/09/15)
Participants:
Case:

 Description   

The two-argument constructor for Client::Context does not call enter() on the curop object, so the current op is never properly updated. This causes some weird behavior, such as the entries with empty field names in the output of the "top" command. I suspect (though have not confirmed) that this also has an impact on the profiler and the output of the inprog command (db.currentOp()).

This problem was exacerbated in 2.6 by the introduction of more callers of this 2-argument form of the Client::Context constructor.

PREVIOUS DESCRIPTION:

In 2.4, if a node is/was a secondary, there is a sub-document with empty field name in the output of top command as below. It has records for "writeLock" and "total".

> db.adminCommand({top:1})
{
	"totals" : {
		"note" : "all times in microseconds",
		"" : {
			"total" : {
				"time" : 27583,
				"count" : 1001
			},
			"readLock" : {
				"time" : 0,
				"count" : 0
			},
			"writeLock" : {
				"time" : 27583,
				"count" : 1001
			},
			"queries" : {
				"time" : 0,
				"count" : 0
			},
			"getmore" : {
				"time" : 0,
				"count" : 0
			},
			"insert" : {
				"time" : 0,
				"count" : 0
			},
			"update" : {
				"time" : 0,
				"count" : 0
			},
			"remove" : {
				"time" : 0,
				"count" : 0
			},
			"commands" : {
				"time" : 0,
				"count" : 0
			}
		},
......
}

In 2.6, it shows up on all the nodes in the replica set, with records for "readLock", "writeLock" (only for secondaries) and "total".

> db.adminCommand({top:1})
{
	"totals" : {
		"note" : "all times in microseconds",
		"" : {
			"total" : {
				"time" : 1353,
				"count" : 752
			},
			"readLock" : {
				"time" : 1277,
				"count" : 751
			},
			"writeLock" : {
				"time" : 76,
				"count" : 1
			},
...
}



 Comments   
Comment by Githook User [ 07/Oct/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-13339 Remove unused constructor of OldClientContext
Branch: master
https://github.com/mongodb/mongo/commit/7b02fba9dc8852c09afb92477fd306077527da0c

Comment by Kaloian Manassiev [ 05/Oct/15 ]

In 3.0 and 3.2, we tightened the rules for when CurOp is being entered and instantiated, so this bug should not be present anymore. In 3.2 this constructor is no longer used, so I'll remove it.

Comment by Githook User [ 20/Feb/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-13339 Cleanup some SConscript and includes
Branch: master
https://github.com/mongodb/mongo/commit/3211addd3426161d7ee1f5cc8d191418a013018d

Comment by Githook User [ 13/Feb/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-13339 Cleanup in preparation for removing unused constructor
Branch: master
https://github.com/mongodb/mongo/commit/f5fc5c0ab87b4f0909387af9819fe87e5ed4758c

Comment by Spencer Brody (Inactive) [ 04/Apr/14 ]

An alternative patch that may be slightly safer is to not call of of _finishInit from that constructor, but just add a call to _client_curOp->enter(this), which is the part of _finishInit that sets the namespace to the CurOp object properly. There is already another form of the Context constructor that does just that: https://github.com/mongodb/mongo/blob/master/src/mongo/db/client.cpp#L347

MCI run with this smaller change: https://mci.10gen.com/ui/version/533ef3d53ff1221570000004_0

Edit: The first patch seems to have seriously broken things, and the second seems to have broken profiling, at least. It seems like neither of these approaches are correct. Someone with a better understanding of the implications of changes to the Client::Context constructors should take a look.

Comment by Spencer Brody (Inactive) [ 04/Apr/14 ]

Okay, it looks like the data that goes into the object returned by "top" comes from the UsageMap maintained in top.cpp. Entries get added into that map from Top::record, which is called by CurOp::recordGlobalTime, which is called in the destructor for Client::Context. The namespace added to the map is set in CurOp::recordGlobalTime and comes from the _ns stored in the CurOp object.

The problem seems to be coming from the fact that there is one constructor of Client::Context that doesn't call _finishInit and thus doesn't call _client->_curOp->enter(this), which is what sets the _ns field in the CurOp object. This is the two-argument form of the Context contructor, that takes a string ns and a Database* db. Anyone who uses this form of Client::Context will result in adding entries to the Top::UsageMap with an empty string as the namespace, as the underlying CurOp object never gets initialized to the namespace of the Context, but the Context's destructor still calls _client->_curOp->recordGlobalTime.

The weird thing is that this behavior seems to be intentional? This comment seems to suggest that this form of the Context constructor not calling _finishInit is intentional, but it's not clear why.

The main caller of that form of the Context constructor that I was seeing is from the cursor management code in collection_cursor_cache.cpp, though there are other callers as well. It seems like that was introduced in https://github.com/mongodb/mongo/commit/5548dbb7a13936911b1b482ddd417cf5879e01b4.

Adding _finishInit() to the end of the two-argument Client::Context constructor seems to fix the problem, but I'm worried what other effects it may have, especially as I don't understand why it was omitted in the first place. I'm running a patch that does that through MCI right now: https://mci.10gen.com/ui/version/533ef1923ff1221570000003_0

Generated at Thu Feb 08 03:31:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.