[SERVER-28310] Mongos count command should report errors in errmsg field, not cause field
Created: 23/Feb/16  Updated: 05/Apr/17  Resolved: 22/Mar/17

Status: Closed
Project: Core Server
Component/s: Querying, Sharding
Affects Version/s: 3.4.2
Fix Version/s: 3.5.5

Type: Improvement
Priority: Major - P3
Reporter: Igor Wiedler
Assignee: David Storch
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Query 2017-03-27
Participants:

 Description   

Specifying a query hint that points to a non-existent index produces a rather generic error message:

Fatal error: Uncaught exception 'MongoDB\Driver\Exception\ConnectionException' with message 'failed on : shard0000' in app/vendor/composer/mongodb/mongodb/src/Operation/Count.php:111
Stack trace:
#0 app/vendor/composer/mongodb/mongodb/src/Operation/Count.php(111): MongoDB\Driver\Server->executeCommand('keywords', Object(MongoDB\Driver\Command), Object(MongoDB\Driver\ReadPreference))
#1 app/vendor/composer/mongodb/mongodb/src/Collection.php(233): MongoDB\Operation\Count->execute(Object(MongoDB\Driver\Server))
#2 app/scratch08.php(20): MongoDB\Collection->count(Array, Array)
#3 {main}
  thrown in app/vendor/composer/mongodb/mongodb/src/Operation/Count.php on line 111

It would be awesome if the error message hinted at what went wrong.

Reproduction case:

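<?php
// Composer autoloader; this path is an assumption, adjust it for your project layout.
require 'vendor/autoload.php';

// The address must point at a mongos; mongod reports the detailed error.
$manager = new MongoDB\Driver\Manager("mongodb://localhost:27017");

$col = new \MongoDB\Collection($manager, 'keywords', 'keywords');

// Count with a hint naming an index that does not exist.
$res = $col->count(
    [
        '_id' => [
            '$in' => []
        ]
    ],
    [
        'hint' => [
            '_not_exists' => 1
        ]
    ]
);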



 Comments   
Comment by Githook User [ 22/Mar/17 ]

Author: David Storch (dstorch) <david.storch@10gen.com>

Message: SERVER-28310 fix mongos count command error reporting to include details in errmsg field
Branch: master
https://github.com/mongodb/mongo/commit/bc8db13a5be79fa99e2e2f39fb5e00692828f7b3

Comment by Jeremy Mikola [ 21/Mar/17 ]

"Jeremy, can you please confirm that this plan makes sense from the driver team's point of view?"

That all sounds good to me.

Comment by David Storch [ 20/Mar/17 ]

jmikola, I looked into it, and the cause field is certainly not a generic way in which we report errors on sharded clusters. This appears to be long-standing one-off behavior for the count command. It appears to be present in all currently supported versions. In fact, as best I could tell, it was introduced in version 2.1.2 as part of SERVER-5797 in commit 4a9a29437.

It sounds like the correct course of action here is to change mongos to report count command error details in the errmsg field, not in the cause field. Based on your previous response, I infer that this change should have no bad consequences for existing drivers, old and new. Jeremy, can you please confirm that this plan makes sense from the driver team's point of view?

Comment by Jeremy Mikola [ 17/Mar/17 ]

Repro steps look correct. I believe the issue is that drivers have no history of consulting the cause field when constructing an error. Your reply is actually the first I've heard of this field. AFAIK, drivers currently only check the ok, code, and errmsg fields.

Assuming this is a generic situation for sharded clusters and cause is a field only populated by mongos, it might make sense for the server to have some specification for merging error messages into errmsg (similar to how drivers have logic to merge potentially multiple write errors into a single message when executing a bulk write). Is it possible for there to be multiple causes of a shard error (e.g. hint did not exist on multiple shards), or does mongos construct this response immediately after the first failure? In the latter case, I suppose there would be nothing to merge and the improvement would just be to integrate cause.errmsg into the top-level errmsg field.
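
As a rough illustration of that last option, the sketch below contrasts what a client sees when it reads only ok, code, and errmsg with what it would see if cause.errmsg were folded into the top-level errmsg. The helper names driverMessage and foldCauseIntoErrmsg and the " :: caused by :: " separator are assumptions made for illustration; they are not taken from any driver or from mongos.

```javascript
// Hypothetical helper: the message a client surfaces when it consults
// only the ok and errmsg fields and never looks at cause.
function driverMessage(reply) {
    if (reply.ok) {
        return null;
    }
    return reply.errmsg;
}

// Hypothetical helper: returns a copy of the reply with cause.errmsg
// appended to the top-level errmsg. The input object is left untouched.
function foldCauseIntoErrmsg(reply) {
    if (reply.ok || !reply.cause || !reply.cause.errmsg) {
        return reply;
    }
    return Object.assign({}, reply, {
        // The separator is an arbitrary choice for this sketch.
        errmsg: reply.errmsg + " :: caused by :: " + reply.cause.errmsg,
    });
}

// Reply shaped like the mongos count failure in this ticket,
// trimmed to the fields that matter here.
const reply = {
    ok: 0,
    code: 2,
    errmsg: "failed on : shard0000",
    cause: { ok: 0, code: 2, errmsg: "planner returned error: bad hint" },
};

console.log(driverMessage(reply));
console.log(driverMessage(foldCauseIntoErrmsg(reply)));
```

With a single cause there is nothing to merge, so the fold is a plain concatenation; a reply carrying several per-shard causes would need a real merging rule.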

Comment by David Storch [ 17/Mar/17 ]

jmikola igorwrg,

I tried reproducing this issue as follows:

  1. Create a 2-shard cluster.
  2. Shard collection c by _id.
  3. Insert documents with {_id: 1} and {_id: -1} and split/move chunks so that these documents reside on separate shards.
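
The setup steps above could be scripted in the mongo shell roughly as follows. This is a sketch under assumptions: the database name test is taken from the namespace in the error output, while the shard name shard0001, the choice of shard0000 as the primary shard, and the function name setUpRepro are hypothetical. The shell's sh helper and a database handle are passed in explicitly so the function has no hidden dependencies.

```javascript
// Hypothetical helper scripting the setup steps for a 2-shard cluster.
// Assumes shards named "shard0000" and "shard0001", with "shard0000"
// as the primary shard for the "test" database.
function setUpRepro(sh, db) {
    // Shard collection c by _id.
    sh.enableSharding("test");
    sh.shardCollection("test.c", { _id: 1 });

    // Insert one document on each side of the intended split point.
    db.c.insert({ _id: 1 });
    db.c.insert({ _id: -1 });

    // Split at _id: 0, then move the chunk holding {_id: 1} to the
    // other shard so the two documents reside on separate shards.
    sh.splitAt("test.c", { _id: 0 });
    sh.moveChunk("test.c", { _id: 1 }, "shard0001");
}
```

In a shell connected to mongos, this would be invoked as setUpRepro(sh, db.getSiblingDB("test")).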

Then I ran the following count command using the mongo shell:

mongos> db.c.find({_id: {$in: []}}).hint({bad: 1}).count()
2017-03-17T12:05:48.679-0400 E QUERY    [thread1] Error: count failed: {
	"shards" : {
 
	},
	"cause" : {
		"ok" : 0,
		"errmsg" : "error processing query: ns=test.cTree: _id $in [ ]\nSort: {}\nProj: {}\n planner returned error: bad hint",
		"code" : 2,
		"codeName" : "BadValue",
		"operationTime" : Timestamp(1489766740, 1)
	},
	"code" : 2,
	"ok" : 0,
	"errmsg" : "failed on : shard0000",
	"logicalTime" : {
		"clusterTime" : Timestamp(1489766740, 1),
		"signature" : BinData(0,"hPmxr1lerWAHG+4z0jUTNL+VLcQ=")
	}
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DBQuery.prototype.count@src/mongo/shell/query.js:383:11
@(shell):1:1

As you can see, the cause section of the error output reports that the planner returned an error due to a bad hint. Are my repro steps incorrect, or is this in fact an issue with error reporting in the driver?

Best,
Dave

Comment by Jeremy Mikola [ 14/Mar/17 ]

Moved this over from the PHPC project, as this appears to be outside the control of PHP or libmongoc. When issuing a query whose hint is a nonexistent index, mongos yields a cryptic error of "failed on : shard0000". Contrast this to the error returned by mongod:

error processing query: ns=test.fooTree: _id $in [ ]
Sort: {}
Proj: {}
 planner returned error: bad hint

I'm not sure what version of the server was used in the original report, but I just reproduced this on 3.4.2 and confirm that the messages are the same.
