[SERVER-1285] Mongo cursor returns fewer entities than it has. Created: 23/Jun/10  Updated: 30/Mar/12  Resolved: 17/Sep/11

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 1.4.3
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Pavel Pipkin Assignee: Eliot Horowitz (Inactive)
Resolution: Cannot Reproduce Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Operating System: FreeBSD
Participants:

 Description   

When I try to fetch many large entities from a cursor, it returns fewer than it has.
When I fetch fewer large entities, or small entities, the cursor works correctly.
The more memory each object occupies, the fewer objects the cursor returns.
Maybe the cursor has a memory limit, and if the total size of the entities exceeds that limit, the cursor does not return all of them.

Mongo version 1.4.3

Examples:
> var cursor = db.test_coll.find().skip(0).limit(400).toArray();
> cursor.length;
289

But I have more than 289 entities:
> var cursor = db.test_coll.find().skip(0).limit(400);
> cursor.count(true);
400
> var cursor = db.test_coll.find().skip(400).limit(100).toArray();
> cursor.length;
100
> var cursor = db.test_coll.find().skip(400).limit(400).toArray();
> cursor.length;
283

If I run the query without a limit, the cursor returns even fewer entities:
> var cursor = db.test_coll.find().toArray();
> cursor.length;
71
> var cursor = db.test_coll.find();
> cursor.count(true);
835

I have this problem in both the mongo console and the PHP API:
$data = $mongo_collection_handler->find(array())->skip(0)->limit(400);
echo $data->count(true); // prints 835
$counter = 0;
foreach ($data as $value) {
    $counter++;
}
echo $counter; // prints 289
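
The check used throughout this report (compare the count the server reports with the number of documents the cursor actually yields) can be sketched as a small helper. This is only an illustrative sketch: `iteratedCount`, `compareCounts` and `arrayCursor` are hypothetical names that are not part of MongoDB, and `arrayCursor` is an in-memory stand-in so the sketch runs without a server. The only assumption about a real cursor is that it exposes `hasNext()` and `next()`, as the mongo shell cursor does.

```javascript
// Count documents by actually walking a cursor-like object.
// Assumes only hasNext()/next(), which the mongo shell cursor provides.
function iteratedCount(cursor) {
  var n = 0;
  while (cursor.hasNext()) {
    cursor.next();
    n++;
  }
  return n;
}

// Compare an expected count (e.g. from cursor.count(true)) with the
// number of documents the cursor really returned.
function compareCounts(expected, cursor) {
  var actual = iteratedCount(cursor);
  return { expected: expected, actual: actual, missing: expected - actual };
}

// Hypothetical stand-in cursor over an in-memory array, used here only
// so the sketch can be run without a MongoDB server.
function arrayCursor(docs) {
  var i = 0;
  return {
    hasNext: function () { return i < docs.length; },
    next: function () { return docs[i++]; }
  };
}
```

In the mongo shell this might be called as `compareCounts(db.test_coll.find().skip(0).limit(400).count(true), db.test_coll.find().skip(0).limit(400))`; a non-zero `missing` value would correspond to the shortfall described above.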



 Comments   
Comment by Benedikt Waldvogel [ 19/Mar/12 ]

After a longer debugging session I found the bug in the code. Surprisingly, the bug was already known and fixed recently: SERVER-4680

Comment by Benedikt Waldvogel [ 19/Sep/11 ]

I can still reproduce the problem with mongodb 2.0.0.

I've written a small Java program that lets me (and hopefully you) easily reproduce the problem:
https://gist.github.com/1226902#file_multiple_id_query_test.java
For me the output looks like:
https://gist.github.com/1226902#file_output.log
And the collection stats:
https://gist.github.com/1226902#file_collection_stats.js

Comment by Eliot Horowitz (Inactive) [ 17/Sep/11 ]

After the repair - please let me know if you see it again.

Comment by Jan Sparud [ 07/Sep/11 ]

It's being prepared for production use. I'll see if we can do a repair on it.

Comment by Eliot Horowitz (Inactive) [ 06/Sep/11 ]

Is this a production db?
If you could run a repair on it, that would be very helpful for determining whether it's a data issue or something else.

Comment by Jan Sparud [ 06/Sep/11 ]

We are using journaling.

Comment by Jan Sparud [ 06/Sep/11 ]

Thanks for the quick response.

Yes, running the same queries multiple times leads to the same results. Note that I get MORE results when asking for 9900 entries than when asking for 10000 entries.

> db.instance.find({_id: {$gt: 2000120855}}).sort({_id:1}).limit(10000).itcount()
9818
> db.instance.find({_id: {$gt: 2000120855}}).sort({_id:1}).limit(10000).itcount()
9818
> db.instance.find({_id: {$gt: 2000120855}}).sort({_id:1}).limit(9900).itcount()
9900
> db.instance.find({_id: {$gt: 2000120855}}).sort({_id:1}).limit(9900).itcount()
9900

I think we are using journaling, will check. We haven't tried upgrading to 1.8.3 yet, but will do that.
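
The comparison across limits above could be automated with a small sweep. This is a minimal sketch, not something from the thread: `findShortLimits` and its `runQuery` callback are hypothetical names, and the result is only meaningful when at least `limit` matching documents exist (as is the case here, with about 2 billion matches).

```javascript
// runQuery is a hypothetical callback: given a limit, it returns how many
// documents the cursor actually yielded for that limit.
// Returns the limits for which fewer documents than requested came back.
function findShortLimits(runQuery, limits) {
  var short = [];
  for (var i = 0; i < limits.length; i++) {
    var limit = limits[i];
    var returned = runQuery(limit);
    if (returned < limit) {
      short.push({ limit: limit, returned: returned });
    }
  }
  return short;
}
```

In the mongo shell the callback might be `function (n) { return db.instance.find({_id: {$gt: 2000120855}}).sort({_id: 1}).limit(n).itcount(); }`, called with limits such as `[9900, 10000]`.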

Comment by Eliot Horowitz (Inactive) [ 06/Sep/11 ]

It's impossible to say without more information.
Can you run the same query multiple times?
Can you try 1.8.3?
Also, are you running with journaling, and if not, was there an unclean shutdown?

Comment by Jan Sparud [ 06/Sep/11 ]

I have the same issue. The database has about 2 billion instances with an _id higher than 2000120855.

> db.instance.find({_id: {$gt: 2000120855}}).limit(10000).itcount()
6209
> db.instance.find({_id: {$gt: 2000120855}}).sort({_id:1}).limit(10000).itcount()
9818
> db.version()
1.8.1

These results are consistent, i.e. the queries return the same results if run repeatedly.

Is this fixed between 1.8.1 and 1.8.3?

Comment by Eliot Horowitz (Inactive) [ 02/Sep/11 ]

There was an issue with 1.4.3 that might be related.
Can you try 1.8.3?

Comment by Benedikt Waldvogel [ 17/Mar/11 ]

I've copy/pasted the log of mongos -v here: https://gist.github.com/874740
I see nothing special.

Actually I've used the Java driver to count the number of items since the shell becomes very slow with such big queries.

collection.find(query-with-4000-ids).limit(4000).size();
// returns 4000

int count = 0;
for (DBObject o : collection.find(query-with-4000-ids)) {
    count++;
}
// count is 4000

count = 0;
for (DBObject o : collection.find(query-with-4000-ids).limit(4000)) {
    count++;
}
// count is 3956

The query looks like:

{ "_id" : { "$in" : [ 554795360, 554795363, ... ] } }

I double checked that the list of IDs doesn't contain duplicates.
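
A duplicate check of this kind can be sketched as follows. This is an illustration only, not the code actually used for the check: `findDuplicateIds` is a hypothetical helper, and it assumes the IDs are numbers (or strings) whose string forms are distinct.

```javascript
// Return every ID that occurs more than once in the list, in the order
// the repeats are encountered. An empty result means no duplicates.
// Assumes IDs are numbers or strings with distinct string forms.
function findDuplicateIds(ids) {
  var seen = {};
  var dups = [];
  for (var i = 0; i < ids.length; i++) {
    var key = String(ids[i]);
    if (seen[key] === true) {
      dups.push(ids[i]);
    } else {
      seen[key] = true;
    }
  }
  return dups;
}
```

If the array passed in the `$in` clause yields an empty result here, duplicates in the ID list can be ruled out as the reason for the lower count.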

Comment by Eliot Horowitz (Inactive) [ 17/Mar/11 ]

Can you increase verbosity on the mongos and try again?
And send the full query?

Also try:

db.coll.find( query-with-10000-ids ).limit(10000).itcount()

Comment by Benedikt Waldvogel [ 17/Mar/11 ]

Still fails on 1.8.0. The log files contain nothing.

Comment by Benedikt Waldvogel [ 17/Mar/11 ]

I'm able to reproduce this problem on mongo 1.6.5 and mongo-java-driver 2.4/2.5:

My setup has two shards.

> db.coll.find( query-with-10000-ids ).length() returns 10000
> db.coll.find( query-with-10000-ids ).limit(10000) returns ~5000
> db.coll.find( query-with-5000-ids ).limit(5000) returns ~2500

There's also a thread in the mongo user group that discussed the same issue: http://groups.google.com/group/mongodb-user/browse_thread/thread/c80f62b62650eb1a
I'll try it on 1.8.0 in a few minutes after the upgrade is done.

Comment by Eliot Horowitz (Inactive) [ 17/Feb/11 ]

Can you try with 1.7.6? A number of issues have been fixed.

Comment by Eliot Horowitz (Inactive) [ 26/Sep/10 ]

Can you send the output of db.printShardingStatus()?

Comment by Mike Richmond [ 21/Sep/10 ]

I'm seeing similar behavior in version:
db version v1.7.1-pre-, pdfile version 4.5
Mon Sep 20 18:14:39 git version: 8d53011001891a36d5f7abf6b5c2117bda5be889

The following is from a mongo console connected to a mongos process and sharded database. Also, there is an index on the "date" field.

> var query = { date: { '$gte': 1284793200000, '$lte': 1284793300000 } }
> db.log.count( query )
2501
> db.log.find( query ).length()
2473
> db.log.find( query ).limit(5000).length()
2344

Generated at Thu Feb 08 02:56:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.