[SERVER-4491] count() on a find() returns scanned count instead of result count Created: 14/Dec/11  Updated: 07/Apr/23  Resolved: 19/Dec/11

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.0.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Frédéric G. MARAND Assignee: Aaron Staple
Resolution: Duplicate Votes: 0
Labels: count, find
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 11.04, Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41


Attachments: PNG File countnull.png    
Issue Links:
Duplicate
duplicates SERVER-4529 fast count mode should not be used wh... Closed
Related
is related to SERVER-4493 v0 indexes store empty arrays as null... Closed
Operating System: Linux
Participants:

 Description   

On a largeish collection, results from count() applied to a query appear to return the number of scanned objects instead of the number of objects returned by the find():

db.activity.findOne(

{'datacache.user.id': null}

)
null

> db.activity.find(

{'datacache.user.id': null}

)
(No result)

> db.activity.find(

{'datacache.user.id': null}

).count()
816

> db.activity.count(

{'datacache.user.id': null}

)
816

> db.activity.find(

{'datacache.user.id': null}

).explain()
{
"cursor" : "BtreeCursor datacache_user_id_1",
"nscanned" : 816,
"nscannedObjects" : 816,
"n" : 0,
"millis" : 44,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" :

{ "datacache.user.id" : [ [ null, null ] ] }

}

Searching for values besides null in that collection is inconclusive: result count always matches scanned count in such cases, they only differ for null.



 Comments   
Comment by Aaron Staple [ 19/Dec/11 ]

Hi Julien,

The root cause here is SERVER-4529. We have a 'fast' count mode that has been in the code for a while, and is being called inappropriately in the example you provided.

If you need to run this query on your production system, I'd suggest doing a normal find and counting the results rather than running the count command for this specific query. If you run into performance issues doing that, I can make some alternative suggestions that would be faster.

I am going to close out this ticket as a duplicate of SERVER-4529. You can still comment and reopen if you have any additional concerns/issues.

Comment by Aaron Staple [ 19/Dec/11 ]

Hi Julien,

I think I'm figuring out what the main issue is, but also it looks like your document output is messed up. It is malformed json because there are some commas missing. Did you paste the output verbatim from the shell, or did you modify it to remove confidential information? If you did modify the document, could you instead modify it by replacing fields with dummy information (like 'XXXXXXXX' or something ) instead of removing them?

Thanks,
Aaron

Comment by Julien Dubreuil [ 19/Dec/11 ]

Hi Aaron,

Effectively, I should have done a db.activity.getIndexes() to see the structure of my indexes. You were right, only names are differents, keys patterns are the same.

Here is the output of the db.activity.getIndexes():

> db.activity.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"ns" : "activity.activity",
		"name" : "_id_"
	},
	{
		"v" : 1,
		"key" : {
			"created" : -1
		},
		"ns" : "activity.activity",
		"name" : "created_-1"
	},
	{
		"v" : 1,
		"key" : {
			"uids" : 1
		},
		"ns" : "activity.activity",
		"name" : "uids_1"
	},
	{
		"v" : 1,
		"key" : {
			"datacache.user.id" : 1
		},
		"ns" : "activity.activity",
		"name" : "datacache_user_id_1"
	},
	{
		"v" : 1,
		"key" : {
			"nids" : 1
		},
		"ns" : "activity.activity",
		"name" : "nids_1"
	},
	{
		"v" : 1,
		"key" : {
			"datacache.node.id" : 1
		},
		"ns" : "activity.activity",
		"name" : "datacache_node_id_1"
	},
	{
		"v" : 1,
		"key" : {
			"cids" : 1
		},
		"ns" : "activity.activity",
		"name" : "cids_1"
	},
	{
		"v" : 1,
		"key" : {
			"datacache.comment.id" : 1
		},
		"ns" : "activity.activity",
		"name" : "datacache_comment_id_1"
	},
	{
		"v" : 1,
		"key" : {
			"gids" : 1
		},
		"ns" : "activity.activity",
		"name" : "gids_1"
	},
	{
		"v" : 1,
		"key" : {
			"aurid" : 1
		},
		"ns" : "activity.activity",
		"name" : "aurid_1"
	},
	{
		"v" : 1,
		"key" : {
			"sid" : 1
		},
		"ns" : "activity.activity",
		"name" : "sid_1"
	},
	{
		"v" : 1,
		"key" : {
			"datacache.fbstatus.id" : 1
		},
		"ns" : "activity.activity",
		"name" : "datacache_fbstatus_id_1"
	}
]

And here is the output of your requested command :

> db.activity.find().min({'datacache.user.id': null}).sort({'datacache.user.id': 1})[0] 
{
    "_id": ObjectId("4ea5f3326e9f88116300001f"),
    "authors_uid": {
        "1173746": 1173746
    },
    "created": 1319498546,
    "datacache": {
        "node": [{
            "authors": null,
            "build_mode": "profil_N05",
            "comment": "2",
            "created": 1319444273,
            "uid": "1173746",
            "format" : 2,
            "id" : 579896,
            "nid" : "579896",
            "uid" : "1173746"
        }],
        "user": [{
            "uid": "11737",
            "name": "Kate",
            "level": 17,
            "account_type": 6,
            "id": 11737
        }, {
            "id": 6,
            "name": "Antoine",
            "level": 17,
            "account_type": 5,
            "uid": "6"
        }]
    },
    "flaggeur": "6",
    "flaggued": "1173746",
    "hidden": false,
    "is_active": 1,
    "nid": 579896,
    "type": "flagreconode",
    "uids": [6]
}

Comment by Aaron Staple [ 16/Dec/11 ]

Hi Julien,

I'm not that familiar with the php driver, but what you are basically looking at here are the index names - which are generated based on the index key pattern but are actually pretty arbitrary. My guess is that php and the shell are generating slightly different names but the same key pattern. To see some more detailed information about indexes you can run db.activity.getIndexes(). Would be great if you can send the output of getIndexes().

Also, to try and diagnose the situation further, could you please run the following and send me the output:

db.activity.find().min(

{'datacache.user.id':null}

).sort(

{'datacache.user.id':1}

)[0]

Thanks,
Aaron

Comment by Julien Dubreuil [ 16/Dec/11 ]

Hi Aaron

Yeah, that's true, but I thought I had made a mistake when a did db.activity.ensureIndex(

{'datacache.user.id' : 1}

), because it doesn't looks like ot others index.

> db.activity.stats()
{
	"ns" : "activity.activity",
	"count" : 946288,
	"size" : 1856841436,
	"avgObjSize" : 1962.23711597315,
	"storageSize" : 2127925248,
	"numExtents" : 25,
	"nindexes" : 13,
	"lastExtentSize" : 362020864,
	"paddingFactor" : 1.5699999999999983,
	"flags" : 0,
	"totalIndexSize" : 297843504,
	"indexSizes" : {
		"_id_" : 30954336,
		"created_-1" : 24021088,
		"uids_1" : 23833040,
		"nids_1" : 16335648,
		"datacache_node_id_1" : 36603952,
		"cids_1" : 16335648,
		"datacache_comment_id_1" : 36718416,
		"gids_1" : 22884624,
		"aurid_1" : 16384704,
		"sid_1" : 16515520,
		"datacache_fbstatus_id_1" : 16760800,
		"datacache_user_id_1" : 16335648,
		"datacache.user.id_1" : 24160080
	},
	"ok" : 1
}

Maybe I missed some things, but why when I created an index using the php drivers and Mongo shell I don't have the same result.

With PHP

 
  $collection = ActivityNamespace::getCollection();
  $response = $collection->ensureIndex(array('datacache.user.id' => 1 ));

> db.activity.stats()
{
	"ns" : "activity.activity",
	"count" : 946288,
	"size" : 1856841436,
	"avgObjSize" : 1962.23711597315,
	"storageSize" : 2127925248,
	"numExtents" : 25,
	"nindexes" : 2,
	"lastExtentSize" : 362020864,
	"paddingFactor" : 1.5699999999999983,
	"flags" : 0,
	"totalIndexSize" : 55114416,
	"indexSizes" : {
		"_id_" : 30954336,
		"datacache_user_id_1" : 24160080
	},
	"ok" : 1
}
 

With Mongo shell

> db.activity.ensureIndex({'datacache.user.id' : 1})
> db.activity.stats()
{
	"ns" : "activity.activity",
	"count" : 946288,
	"size" : 1856841436,
	"avgObjSize" : 1962.23711597315,
	"storageSize" : 2127925248,
	"numExtents" : 25,
	"nindexes" : 2,
	"lastExtentSize" : 362020864,
	"paddingFactor" : 1.5699999999999983,
	"flags" : 0,
	"totalIndexSize" : 55114416,
	"indexSizes" : {
		"_id_" : 30954336,
		"datacache.user.id_1" : 24160080
	},
	"ok" : 1
}

What are the differences between datacache_user_id_1 and datacache.user.id_1 ?

Anyway, I did it again with the Mongo shell and the problem is still there :

> db.activity.ensureIndex({'datacache.user.id' : 1})
> db.activity.stats()
{
	"ns" : "activity.activity",
	"count" : 946288,
	"size" : 1856841436,
	"avgObjSize" : 1962.23711597315,
	"storageSize" : 2127925248,
	"numExtents" : 25,
	"nindexes" : 12,
	"lastExtentSize" : 362020864,
	"paddingFactor" : 1.5699999999999983,
	"flags" : 0,
	"totalIndexSize" : 253807568,
	"indexSizes" : {
		"_id_" : 30954336,
		"created_-1" : 23816688,
		"uids_1" : 23816688,
		"nids_1" : 16204832,
		"datacache_node_id_1" : 23644992,
		"cids_1" : 16204832,
		"datacache_comment_id_1" : 23268896,
		"gids_1" : 22729280,
		"aurid_1" : 16253888,
		"sid_1" : 16376528,
		"datacache_fbstatus_id_1" : 16376528,
		"datacache.user.id_1" : 24160080
	},
	"ok" : 1
}
> db.activity.count({'datacache.user.id': null})
816
> db.activity.findOne({'datacache.user.id': null})
null

Comment by Aaron Staple [ 16/Dec/11 ]

Hi Julien,

It looks like there is an error where you are recreating your index. You ran:

db.activity.ensureIndex(

{'datacache_user_id' : 1}

)

but it needs to be:

db.activity.ensureIndex(

{'datacache.user.id' : 1}

)

If you rebuild a v1 index as I described and still see this problem, we can further diagnose the situation.

Comment by Julien Dubreuil [ 15/Dec/11 ]

Hi Aaron

I'm working with Frederic on our project based on MongoDB

Here is a copy of my indexes (ready to discuss with you about optimization

> db.activity.stats()
{
  "ns" : "activity.activity",
  "count" : 953941,
  "size" : 1869070588,
  "avgObjSize" : 1959.3146620178816,
  "storageSize" : 2127925248,
  "numExtents" : 25,
  "nindexes" : 12,
  "lastExtentSize" : 362020864,
  "paddingFactor" : 1,
  "flags" : 1,
  "totalIndexSize" : 284925424,
  "indexSizes" : {
    "_id_" : 30962512,
    "created_-1" : 24012912,
    "uids_1" : 24012912,
    "datacache_user_id_1" : 34690768,
    "nids_1" : 16335648,
    "datacache_node_id_1" : 33104624,
    "cids_1" : 16335648,
    "datacache_comment_id_1" : 32957456,
    "gids_1" : 22892800,
    "aurid_1" : 16376528,
    "sid_1" : 16507344,
    "datacache_fbstatus_id_1" : 16736272
  },
  "ok" : 1
}

We always used a Mongo 2.0 for our project, so indexes were not created on an earlier version.

As you requested, I've dropped indexes, tested it, recreated it and then the problem has gone.

> db.activity.count({'datacache.user.id': null})
816
 
> db.activity.dropIndex({'datacache_user_id' : 1})
{ "nIndexesWas" : 12, "ok" : 1 }
 
> db.activity.count({'datacache.user.id': null})
0
 
> db.activity.ensureIndex({'datacache_user_id' : 1}
 
> db.activity.count({'datacache.user.id': null})
0

It should be noted, after the restoration of the datas, I did a big update on to the structure of every documents of the collection in order to change the non indexable sub-document structure.

So, what I've done :
1) I've dumped a database.
2) I've restored it on my local machine without indexes.
3) Next, I've recreated them.
4) I've done my update
5) Try to count no changed documents.

Maybe it's not reproducible and not really a bug, but one thing's sure, my update is in cause

Thanks,

Julien

Comment by Aaron Staple [ 14/Dec/11 ]

Hi Frederic - I created SERVER-4493. I think this may be the issue you're seeing, please let me know if you have any issues once you regenerate your index as a v1 index.

Comment by Aaron Staple [ 14/Dec/11 ]

Hi Frederic,

I think this is an issue with v0 indexes combined with some new multikey index matching behavior in mongo 2.0. Did you originally create your index with an earlier version of mongo? I am going to file a separate ticket for what I think is the root cause, I just want to confirm that it's actually the cause of your problem.

Aaron

Comment by Aaron Staple [ 14/Dec/11 ]

Hi Frederic,

If you recreate the index, does the issue start happening again?
And also, can you send the output of db.activity.getIndexes() please?

Thanks,
Aaron

Comment by Frédéric G. MARAND [ 14/Dec/11 ]

This looks like an issue with indexes: I removed the "datacache_user_id_1" index and reran the queries:

db.activity.find(

{'datacache.user.id': null}

)
> (no result)

> db.activity.findOne(

{'datacache.user.id': null}

)
null

> db.activity.find(

{'datacache.user.id': null}

).count()
0

db.activity.find(

{'datacache.user.id': null}

).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 952765,
"nscannedObjects" : 952765,
"n" : 0,
"millis" : 7532,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}

Generated at Thu Feb 08 03:06:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.