|
Glad this worked out! I'm going to mark this ticket as resolved, and please let us know if you encounter more issues.
|
|
Hi Sam,
creating index by
{company_id: 1, ct: -1}
and enforcing (by hint) mongo to use it solves our issue for now. Btw, querying/sorting is working pretty fast regardless absence of "ci" in index (there are 79049215092 documents in database).
|
|
The reason that mongodb is forced to do a scanAndOrder is because you are performing essentially a range query on the "ci" field, which is the first field in the index. The returned documents for each individual ObjectId in your "ci" $in query will be sorted by the "ct" field (since you have a compound index), without having to do an in-memory sort. However, since there are four of these groups of documents (one for each value in the "$in" array), which can span any range of values, the only way to sort the union of the four returned lists is to do an in-memory sort (hence the scanAndOrder:true).
There are a few ways to go about solving this, and they depend to some degree on your data size and application. Changing around your indexes could help - for instance, by adding a
{ company_id : 1, ct : -1 }
index and forcing its use, but this could be too slow if you have a lot of data (its performance will also be dependent on what your shard key is). Another issue with this approach is it will increase your nscanned value.
One of our developers wrote an excellent blog post that actually discusses a very similar situation to yours. I would recommend reading http://emptysquare.net/blog/optimizing-mongodb-compound-indexes/ and seeing if it helps answer your question. Please let us know if there is more help we can give.
|
|
Yes, you're correct. If specify sort as .sort(
{ci: -1}
) explain will be pretty the same (with "scanAndOrder" : true). Unfortunately, no magic here 
mongos> db.messages.find(qry, fields=["_id", "ti", "is_starred"]).sort({ct: -1}).skip(60000).limit(1100).hint({ci:1, ct: -1}).explain()
|
Wed Nov 21 09:58:27 uncaught exception: error: {
|
"$err" : "could not initialize cursor across all shards because : too much data for sort() with no index. add an index or specify a smaller limit @ bill1shard3/shard3bill1a.ninternal.com:27018,shard3bill1b.ninternal.com:27018",
|
"code" : 14827
|
}
|
So, what kind of conclusion will you give? Looks like, in order to do sort by "ct" I HAVE TO have separate index by "ct".
Thanks
|
|
The syntax using the list is indeed correct in pymongo. However, in the mongo shell, you were using the same list syntax where you should be using a dict. Does it help if you change the command you are running in the mongo shell to use a dict instead of a list?
|
|
Sam, actually I've tried to specify sort in this way (using dict). And I believe that syntax with list is correct:
23:50:08@DIRIGENT-BILL1# ./bin/python /tmp/tt.py
|
Traceback (most recent call last):
|
File "/tmp/tt.py", line 24, in <module>
|
db["messages"].find(sq, fields=["_id", "ti", "is_starred"]).sort({"ct", -1}).skip(40000).limit(1100).hint([("ci", 1), ("ct", -1)]).explain()
|
File "/mnt/nimble/eggs/pymongo-2.1.1-py2.7-linux-x86_64.egg/pymongo/cursor.py", line 443, in sort
|
keys = helpers._index_list(key_or_list, direction)
|
File "/mnt/nimble/eggs/pymongo-2.1.1-py2.7-linux-x86_64.egg/pymongo/helpers.py", line 45, in _index_list
|
raise TypeError("if no direction is specified, "
|
TypeError: if no direction is specified, key_or_list must be an instance of list
|
also, if specify sort as
explain will be the same as above.
|
|
I reproduced your situation, and I believe I have found the problem. My apologies for not catching it earlier.
In your shell example, you are calling sort() using the list [("ct", -1)] as the sort parameter. This is not the correct syntax - if you run the same query using
{ ct : -1 }
as the sort parameter, my guess is you will get the same output as from pymongo. The confusing part is that this is indeed the correct syntax for pymongo.
Let me know if this resolves the issue.
|
|
Using the mongosniff utility (http://www.mongodb.org/display/DOCS/mongosniff), you should be able to see the json that is sent from pymongo to the database - it looks like in your previous output you attached the mongosniff of the return trip, rather than the message from pymongo. Was there any other output when you ran it?
We will try again to reproduce the bug. In the meantime, could you run the query again without the "company_id" field to see if the issue still shows up?
|
|
Hi Sam, here are some information I was able to get.
there is a string in Cursor.__send_message:
response = db.connection._send_message_with_response(message, **kwargs)
|
Information from pdb:
(Pdb) l
|
608 db = self.__collection.database
|
609 kwargs = {"_must_use_master": self.__must_use_master}
|
610 kwargs["read_preference"] = self.__read_preference
|
611 if self.__connection_id is not None:
|
612 kwargs["_connection_to_use"] = self.__connection_id
|
613 -> kwargs.update(self.__kwargs)
|
614
|
615 response = db.connection._send_message_with_response(message,
|
616 **kwargs)
|
617
|
618 if isinstance(response, tuple):
|
(Pdb) message
|
(846930886, "\t\x01\x00\x00\xc6#{2\x00\x00\x00\x00\xd4\x07\x00\x00\x00\x00\x00\x00nimble.messages\x00@\x9c\x00\x00\xb4\xfb\xff\xff\xb7\x00\x00\x00\x03$query\x00l\x00\x00\x00\x03ci\x00K\x00\x00\x00\x04$in\x00A\x00\x00\x00\x070\x00M\xec\x86wY\xe2U\xda\xf9E\xed\x06\x071\x00M\xec\x86wY\xe2U\xda\xf9E\xed\x07\x072\x00M\xec\x86wY\xe2U\xda\xf9E\xed\t\x073\x00M\xec\x86wY\xe2U\xda\xf9E\xed\x08\x00\x00\x07company_id\x00L!\x18\xadT9\x7f'\x1b\x00\x00\x00\x00\x03$orderby\x00\r\x00\x00\x00\x10ct\x00\xff\xff\xff\xff\x00\x08$explain\x00\x01\x03$hint\x00\x15\x00\x00\x00\x10ci\x00\x01\x00\x00\x00\x10ct\x00\xff\xff\xff\xff\x00\x00&\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x10is_starred\x00\x01\x00\x00\x00\x10ti\x00\x01\x00\x00\x00\x00", 183)
|
(Pdb) kwargs
|
{'_must_use_master': False, 'read_preference': 0}
|
I believe, that message here is some BSON-encoded string.
I was unable to find place in code to track $hint, order of fields, query and print them out. Could you provide some more exact places to look?
Thanks.
|
|
I certainly think checking on the messages that pymongo is sending through the socket is a good idea - could you post those when you have them? Importantly, the $hint field in the query and the order of the actual fields in the query should both be able to affect index usage.
|
|
As idea: I can dive (with pdb) in pymongo internals and look for some details. E.g. some values in Cursor/Connection methods, messages pymongo is sending through socket to mongodb.
If you think it can help, please, provide me any further instructions.
|
|
Sam, unfortunately it does not help
|
In [37]: db["messages"].find(sq, fields=["_id", "ti", "is_starred"]).sort([("ct", -1)]).skip(40000).limit(1100).hint([("ci", 1), ("ct", -1)]).explain()
|
Out[37]:
|
{u'clusteredType': u'ParallelSort',
|
u'millisAvg': 386,
|
u'millisTotal': 773,
|
u'n': 2200,
|
u'nChunkSkips': 0,
|
u'nYields': 19,
|
u'nscanned': 82265,
|
u'nscannedObjects': 82265,
|
u'numQueries': 2,
|
u'numShards': 2,
|
u'shards': {u'bill1shard3/shard3bill1a.ninternal.com:27018,shard3bill1b.ninternal.com:27018': [{u'cursor': u'BtreeCursor ci_1_ct_-1 multi',
|
u'indexBounds': {u'ci': [[ObjectId('4dec867759e255daf945ed06'),
|
ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'),
|
ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'),
|
ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'),
|
ObjectId('4dec867759e255daf945ed09')]],
|
u'ct': [[{u'$maxElement': 1}, {u'$minElement': 1}]]},
|
u'indexOnly': False,
|
u'isMultiKey': False,
|
u'millis': 485,
|
u'n': 1100,
|
u'nChunkSkips': 0,
|
u'nYields': 4,
|
u'nscanned': 59124,
|
u'nscannedObjects': 59124,
|
u'scanAndOrder': True}],
|
u'repl1/db.live.repla.ninternal.com:27018,db.live.replb.ninternal.com:27018': [{u'cursor': u'BtreeCursor ci_1_ct_-1 multi',
|
u'indexBounds': {u'ci': [[ObjectId('4dec867759e255daf945ed06'),
|
ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'),
|
ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'),
|
ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'),
|
ObjectId('4dec867759e255daf945ed09')]],
|
u'ct': [[{u'$maxElement': 1}, {u'$minElement': 1}]]},
|
u'indexOnly': False,
|
u'isMultiKey': False,
|
u'millis': 288,
|
u'n': 1100,
|
u'nChunkSkips': 0,
|
u'nYields': 15,
|
u'nscanned': 23141,
|
u'nscannedObjects': 23141,
|
u'scanAndOrder': True}]}}
|
|
In [38]: sq
|
Out[38]: SON([('ci', {'$in': [ObjectId('4dec867759e255daf945ed06'), ObjectId('4dec867759e255daf945ed07'), ObjectId('4dec867759e255daf945ed09'), ObjectId('4dec867759e255daf945ed08')]}), ('company_id', ObjectId('4c2118ad54397f271b000000'))])
|
|
|
It's possible that since python dictionaries are unordered, the order of the fields in your query could be affecting the index use.
Could you run the pymongo query again, using a SON object (http://api.mongodb.org/python/1.7/api/pymongo/son.html) as the query? Something akin to
query = SON([('ci', {'$in': ['...']}), ('company_id', 123)])
|
should do it.
|
|
Okay, here is explain with completely same query:
In [5]: db["messages"].find({"ci": {"$in": [ObjectId("4dec867759e255daf945ed06"), ObjectId("4dec867759e255daf945ed07"), ObjectId("4dec867759e255daf945ed09"), ObjectId("4dec867759e255daf945ed08")]}, "company_id": ObjectId("4c2118ad54397f271b000000")}, fields=["_id", "ti", "is_starred"]).sort([("ct", -1)]).skip(40000).limit(1100).hint([("ci", 1), ("ct", -1)]).explain()
|
Out[5]:
|
{u'clusteredType': u'ParallelSort',
|
u'millisAvg': 413,
|
u'millisTotal': 826,
|
u'n': 2200,
|
u'nChunkSkips': 0,
|
u'nYields': 31,
|
u'nscanned': 82052,
|
u'nscannedObjects': 82052,
|
u'numQueries': 2,
|
u'numShards': 2,
|
u'shards': {u'bill1shard3/shard3bill1a.ninternal.com:27018,shard3bill1b.ninternal.com:27018': [{u'cursor': u'BtreeCursor ci_1_ct_-1 multi',
|
u'indexBounds': {u'ci': [[ObjectId('4dec867759e255daf945ed06'),
|
ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'),
|
ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'),
|
ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'),
|
ObjectId('4dec867759e255daf945ed09')]],
|
u'ct': [[{u'$maxElement': 1}, {u'$minElement': 1}]]},
|
u'indexOnly': False,
|
u'isMultiKey': False,
|
u'millis': 578,
|
u'n': 1100,
|
u'nChunkSkips': 0,
|
u'nYields': 11,
|
u'nscanned': 58911,
|
u'nscannedObjects': 58911,
|
u'scanAndOrder': True}],
|
u'repl1/db.live.repla.ninternal.com:27018,db.live.replb.ninternal.com:27018': [{u'cursor': u'BtreeCursor ci_1_ct_-1 multi',
|
u'indexBounds': {u'ci': [[ObjectId('4dec867759e255daf945ed06'),
|
ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'),
|
ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'),
|
ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'),
|
ObjectId('4dec867759e255daf945ed09')]],
|
u'ct': [[{u'$maxElement': 1}, {u'$minElement': 1}]]},
|
u'indexOnly': False,
|
u'isMultiKey': False,
|
u'millis': 248,
|
u'n': 1100,
|
u'nChunkSkips': 0,
|
u'nYields': 20,
|
u'nscanned': 23141,
|
u'nscannedObjects': 23141,
|
u'scanAndOrder': True}]}}
|
Also, I ran mongosniff again (during querying in mongos VS querying through pymongo). And, yes, ONE and ONLY ONE difference is:
in explain for request from pymongo.
Mongosniff result for query from mongos:
{
|
clusteredType: "ParallelSort",
|
shards: {
|
bill1shard3 / shard3bill1a.ninternal.com: 27018,
|
shard3bill1b.ninternal.com: 27018: [{
|
cursor: "BtreeCursor ci_1_ct_-1 multi",
|
nscanned: 1100,
|
nscannedObjects: 1100,
|
n: 1100,
|
millis: 8,
|
nYields: 0,
|
nChunkSkips: 0,
|
isMultiKey: false,
|
indexOnly: false,
|
indexBounds: {
|
ci: [
|
[ObjectId('4dec867759e255daf945ed06'), ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'), ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'), ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'), ObjectId('4dec867759e255daf945ed09')]
|
],
|
ct: [
|
[{
|
$maxElement: 1
|
}, {
|
$minElement: 1
|
}]
|
]
|
}
|
}],
|
repl1 / db.live.repla.ninternal.com: 27018,
|
db.live.replb.ninternal.com: 27018: [{
|
cursor: "BtreeCursor ci_1_ct_-1 multi",
|
nscanned: 1100,
|
nscannedObjects: 1100,
|
n: 1100,
|
millis: 5,
|
nYields: 0,
|
nChunkSkips: 0,
|
isMultiKey: false,
|
indexOnly: false,
|
indexBounds: {
|
ci: [
|
[ObjectId('4dec867759e255daf945ed06'), ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'), ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'), ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'), ObjectId('4dec867759e255daf945ed09')]
|
],
|
ct: [
|
[{
|
$maxElement: 1
|
}, {
|
$minElement: 1
|
}]
|
]
|
}
|
}]
|
},
|
n: 2200,
|
nChunkSkips: 0,
|
nYields: 0,
|
nscanned: 2200,
|
nscannedObjects: 2200,
|
millisTotal: 13,
|
millisAvg: 6,
|
numQueries: 2,
|
numShards: 2
|
}
|
Mongosniff result for pymongo:
{
|
clusteredType: "ParallelSort",
|
shards: {
|
bill1shard3 / shard3bill1a.ninternal.com: 27018,
|
shard3bill1b.ninternal.com: 27018: [{
|
cursor: "BtreeCursor ci_1_ct_-1 multi",
|
nscanned: 58911,
|
nscannedObjects: 58911,
|
n: 1100,
|
scanAndOrder: true,
|
millis: 485,
|
nYields: 8,
|
nChunkSkips: 0,
|
isMultiKey: false,
|
indexOnly: false,
|
indexBounds: {
|
ci: [
|
[ObjectId('4dec867759e255daf945ed06'), ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'), ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'), ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'), ObjectId('4dec867759e255daf945ed09')]
|
],
|
ct: [
|
[{
|
$maxElement: 1
|
}, {
|
$minElement: 1
|
}]
|
]
|
}
|
}],
|
repl1 / db.live.repla.ninternal.com: 27018,
|
db.live.replb.ninternal.com: 27018: [{
|
cursor: "BtreeCursor ci_1_ct_-1 multi",
|
nscanned: 23141,
|
nscannedObjects: 23141,
|
n: 1100,
|
scanAndOrder: true,
|
millis: 233,
|
nYields: 16,
|
nChunkSkips: 0,
|
isMultiKey: false,
|
indexOnly: false,
|
indexBounds: {
|
ci: [
|
[ObjectId('4dec867759e255daf945ed06'), ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'), ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'), ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'), ObjectId('4dec867759e255daf945ed09')]
|
],
|
ct: [
|
[{
|
$maxElement: 1
|
}, {
|
$minElement: 1
|
}]
|
]
|
}
|
}]
|
},
|
n: 2200,
|
nChunkSkips: 0,
|
nYields: 24,
|
nscanned: 82052,
|
nscannedObjects: 82052,
|
millisTotal: 718,
|
millisAvg: 359,
|
numQueries: 2,
|
numShards: 2
|
}
|
|
|
This is certainly strange. However, we are not convinced that it is due to pymongo, since as the explain() output shows, the index hints are reaching the shards correctly (the query is certainly using the index). Pymongo is correctly sending the query to the database, and any discrepancies in the outcomes of the query after that point should not be due to pymongo.
I do notice that your pymongo explain() output includes the "scanAndOrder : true" field, indicating that the query was not able to use an index when sorting. This is a little worrying, and we will take a closer look at what might have caused it.
I notice that in the shell query you are limiting the return fields to "_id", "ti", and "is_starred", whereas you are not in the pymongo query. Could you try the query again in pymongo, limiting the return fields to those fields, and skipping 40000 as you are in the shell query (rather than 30000)? Only if the queries are the exact same can we be sure that the discrepancy is due to the difference in drivers.
|
|
I'm passing this most recent output along to a member of the pymongo team - I should be back to you shortly.
|
|
Hi Sam,
We've found new veeery interesting moment.
Here is querying/explain in mongo shell (you can see that hit is perfect - nscanned==nscannedObjects==n)
mongos> qry = {
|
... ... "ci" : {
|
... ... "$in" : [
|
... ... ObjectId("4dec867759e255daf945ed06"),
|
... ... ObjectId("4dec867759e255daf945ed07"),
|
... ... ObjectId("4dec867759e255daf945ed09"),
|
... ... ObjectId("4dec867759e255daf945ed08")
|
... ... ]
|
... ... },
|
... ... "company_id" : ObjectId("4c2118ad54397f271b000000")
|
... ... }
|
{
|
"ci" : {
|
"$in" : [
|
ObjectId("4dec867759e255daf945ed06"),
|
ObjectId("4dec867759e255daf945ed07"),
|
ObjectId("4dec867759e255daf945ed09"),
|
ObjectId("4dec867759e255daf945ed08")
|
]
|
},
|
"company_id" : ObjectId("4c2118ad54397f271b000000")
|
}
|
mongos> db.messages.find(qry, fields=["_id", "ti", "is_starred"]).sort([('ct', -1)]).skip(40000).limit(1100).hint({ci:1, ct: -1}).explain()
|
{
|
"clusteredType" : "ParallelSort",
|
"shards" : {
|
"bill1shard3/shard3bill1a.ninternal.com:27018,shard3bill1b.ninternal.com:27018" : [
|
{
|
"cursor" : "BtreeCursor ci_1_ct_-1 multi",
|
"nscanned" : 1100,
|
"nscannedObjects" : 1100,
|
"n" : 1100,
|
"millis" : 11,
|
"nYields" : 2,
|
"nChunkSkips" : 0,
|
"isMultiKey" : false,
|
"indexOnly" : false,
|
"indexBounds" : {
|
"ci" : [
|
[
|
ObjectId("4dec867759e255daf945ed06"),
|
ObjectId("4dec867759e255daf945ed06")
|
],
|
[
|
ObjectId("4dec867759e255daf945ed07"),
|
ObjectId("4dec867759e255daf945ed07")
|
],
|
[
|
ObjectId("4dec867759e255daf945ed08"),
|
ObjectId("4dec867759e255daf945ed08")
|
],
|
[
|
ObjectId("4dec867759e255daf945ed09"),
|
ObjectId("4dec867759e255daf945ed09")
|
]
|
],
|
"ct" : [
|
[
|
{
|
"$maxElement" : 1
|
},
|
{
|
"$minElement" : 1
|
}
|
]
|
]
|
}
|
}
|
],
|
"repl1/db.live.repla.ninternal.com:27018,db.live.replb.ninternal.com:27018" : [
|
{
|
"cursor" : "BtreeCursor ci_1_ct_-1 multi",
|
"nscanned" : 1100,
|
"nscannedObjects" : 1100,
|
"n" : 1100,
|
"millis" : 6,
|
"nYields" : 0,
|
"nChunkSkips" : 0,
|
"isMultiKey" : false,
|
"indexOnly" : false,
|
"indexBounds" : {
|
"ci" : [
|
[
|
ObjectId("4dec867759e255daf945ed06"),
|
ObjectId("4dec867759e255daf945ed06")
|
],
|
[
|
ObjectId("4dec867759e255daf945ed07"),
|
ObjectId("4dec867759e255daf945ed07")
|
],
|
[
|
ObjectId("4dec867759e255daf945ed08"),
|
ObjectId("4dec867759e255daf945ed08")
|
],
|
[
|
ObjectId("4dec867759e255daf945ed09"),
|
ObjectId("4dec867759e255daf945ed09")
|
]
|
],
|
"ct" : [
|
[
|
{
|
"$maxElement" : 1
|
},
|
{
|
"$minElement" : 1
|
}
|
]
|
]
|
}
|
}
|
]
|
},
|
"n" : 2200,
|
"nChunkSkips" : 0,
|
"nYields" : 2,
|
"nscanned" : 2200,
|
"nscannedObjects" : 2200,
|
"millisTotal" : 17,
|
"millisAvg" : 8,
|
"numQueries" : 2,
|
"numShards" : 2
|
}
|
The same query/explain through pymongo:
db["messages"].find({"ci": {"$in": [ObjectId("4dec867759e255daf945ed06"), ObjectId("4dec867759e255daf945ed07"), ObjectId("4dec867759e255daf945ed09"), ObjectId("4dec867759e255daf945ed08")]}, "company_id": ObjectId("4c2118ad54397f271b000000")}).sort([("ct", -1)]).skip(30000).limit(1100).hint([("ci", 1), ("ct", -1)]).explain()
|
Out[14]:
|
{u'clusteredType': u'ParallelSort',
|
u'millisAvg': 538,
|
u'millisTotal': 1077,
|
u'n': 2200,
|
u'nChunkSkips': 0,
|
u'nYields': 11,
|
u'nscanned': 81871,
|
u'nscannedObjects': 81871,
|
u'numQueries': 2,
|
u'numShards': 2,
|
u'shards': {u'bill1shard3/shard3bill1a.ninternal.com:27018,shard3bill1b.ninternal.com:27018': [{u'cursor': u'BtreeCursor ci_1_ct_-1 multi',
|
u'indexBounds': {u'ci': [[ObjectId('4dec867759e255daf945ed06'),
|
ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'),
|
ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'),
|
ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'),
|
ObjectId('4dec867759e255daf945ed09')]],
|
u'ct': [[{u'$maxElement': 1}, {u'$minElement': 1}]]},
|
u'indexOnly': False,
|
u'isMultiKey': False,
|
u'millis': 856,
|
u'n': 1100,
|
u'nChunkSkips': 0,
|
u'nYields': 3,
|
u'nscanned': 58730,
|
u'nscannedObjects': 58730,
|
u'scanAndOrder': True}],
|
u'repl1/db.live.repla.ninternal.com:27018,db.live.replb.ninternal.com:27018': [{u'cursor': u'BtreeCursor ci_1_ct_-1 multi',
|
u'indexBounds': {u'ci': [[ObjectId('4dec867759e255daf945ed06'),
|
ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'),
|
ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'),
|
ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'),
|
ObjectId('4dec867759e255daf945ed09')]],
|
u'ct': [[{u'$maxElement': 1}, {u'$minElement': 1}]]},
|
u'indexOnly': False,
|
u'isMultiKey': False,
|
u'millis': 221,
|
u'n': 1100,
|
u'nChunkSkips': 0,
|
u'nYields': 8,
|
u'nscanned': 23141,
|
u'nscannedObjects': 23141,
|
u'scanAndOrder': True}]}}
|
So, pymongo is a bad guy?
|
|
Sam, thanks for answer.
Data size is veeery huge and extending index with new field will cost us a lot of RAM, but we'll see.
|
|
The explain() output shows that the index is indeed being used. However, the number of objects that the query is checking ("nscanned" in the output) is far greater than the number returned ("n"). This indicates that your index is inefficient for the query, and may be causing the issues with sort.
I notice that you are including the "company_id" field in the query. Are you able to create an index on all three fields? An index on "ci", "company_id", and "ct" - in that order - should vastly speed up the query and could solve your sort() problem.
Given the size of your data, is taking these steps feasible?
|
|
here is the call with explain():
In [16]: db["messages"].find({"ci": {"$in": [ObjectId("4dec867759e255daf945ed06"), ObjectId("4dec867759e255daf945ed07"), ObjectId("4dec867759e255daf945ed09"), ObjectId("4dec867759e255daf945ed08")]}, "company_id": ObjectId("4c2118ad54397f271b000000")}, timeout=False).hint([("ci", 1), ("ct", -1)]).sort("ct", -1).skip(30001).limit(1000).explain()
|
Out[16]:
|
{u'clusteredType': u'ParallelSort',
|
u'millisAvg': 385,
|
u'millisTotal': 770,
|
u'n': 2000,
|
u'nChunkSkips': 0,
|
u'nYields': 27,
|
u'nscanned': 80464,
|
u'nscannedObjects': 80464,
|
u'numQueries': 2,
|
u'numShards': 2,
|
u'shards': {u'bill1shard3/shard3bill1a.ninternal.com:27018,shard3bill1b.ninternal.com:27018': [{u'cursor': u'BtreeCursor ci_1_ct_-1 multi',
|
u'indexBounds': {u'ci': [[ObjectId('4dec867759e255daf945ed06'),
|
ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'),
|
ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'),
|
ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'),
|
ObjectId('4dec867759e255daf945ed09')]],
|
u'ct': [[{u'$maxElement': 1}, {u'$minElement': 1}]]},
|
u'indexOnly': False,
|
u'isMultiKey': False,
|
u'millis': 492,
|
u'n': 1000,
|
u'nChunkSkips': 0,
|
u'nYields': 12,
|
u'nscanned': 57323,
|
u'nscannedObjects': 57323,
|
u'scanAndOrder': True}],
|
u'repl1/db.live.repla.ninternal.com:27018,db.live.replb.ninternal.com:27018': [{u'cursor': u'BtreeCursor ci_1_ct_-1 multi',
|
u'indexBounds': {u'ci': [[ObjectId('4dec867759e255daf945ed06'),
|
ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'),
|
ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'),
|
ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'),
|
ObjectId('4dec867759e255daf945ed09')]],
|
u'ct': [[{u'$maxElement': 1}, {u'$minElement': 1}]]},
|
u'indexOnly': False,
|
u'isMultiKey': False,
|
u'millis': 278,
|
u'n': 1000,
|
u'nChunkSkips': 0,
|
u'nYields': 15,
|
u'nscanned': 23141,
|
u'nscannedObjects': 23141,
|
u'scanAndOrder': True}]}}
|
here mongosniff info:
127.0.0.1:27017 <<-- 127.0.0.1:39688 1223 bytes id:12a6dc82 312925314 - 2044897763
|
reply n:1 cursorId: 0
|
{
|
clusteredType: "ParallelSort",
|
shards: {
|
bill1shard3 / shard3bill1a.ninternal.com: 27018,
|
shard3bill1b.ninternal.com: 27018: [{
|
cursor: "BtreeCursor ci_1_ct_-1 multi",
|
nscanned: 57323,
|
nscannedObjects: 57323,
|
n: 1000,
|
scanAndOrder: true,
|
millis: 489,
|
nYields: 5,
|
nChunkSkips: 0,
|
isMultiKey: false,
|
indexOnly: false,
|
indexBounds: {
|
ci: [
|
[ObjectId('4dec867759e255daf945ed06'), ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'), ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'), ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'), ObjectId('4dec867759e255daf945ed09')]
|
],
|
ct: [
|
[{
|
$maxElement: 1
|
}, {
|
$minElement: 1
|
}]
|
]
|
}
|
}],
|
repl1 / db.live.repla.ninternal.com: 27018,
|
db.live.replb.ninternal.com: 27018: [{
|
cursor: "BtreeCursor ci_1_ct_-1 multi",
|
nscanned: 23141,
|
nscannedObjects: 23141,
|
n: 1000,
|
scanAndOrder: true,
|
millis: 233,
|
nYields: 16,
|
nChunkSkips: 0,
|
isMultiKey: false,
|
indexOnly: false,
|
indexBounds: {
|
ci: [
|
[ObjectId('4dec867759e255daf945ed06'), ObjectId('4dec867759e255daf945ed06')],
|
[ObjectId('4dec867759e255daf945ed07'), ObjectId('4dec867759e255daf945ed07')],
|
[ObjectId('4dec867759e255daf945ed08'), ObjectId('4dec867759e255daf945ed08')],
|
[ObjectId('4dec867759e255daf945ed09'), ObjectId('4dec867759e255daf945ed09')]
|
],
|
ct: [
|
[{
|
$maxElement: 1
|
}, {
|
$minElement: 1
|
}]
|
]
|
}
|
}]
|
},
|
n: 2000,
|
nChunkSkips: 0,
|
nYields: 21,
|
nscanned: 80464,
|
nscannedObjects: 80464,
|
millisTotal: 722,
|
millisAvg: 361,
|
numQueries: 2,
|
numShards: 2
|
}
|
|
|
Additionally, could you post the output of running mongosniff on the mongos's network interface while you run the query? This will show if the $hint field in the query is correctly set.
|
|
Absolutely, thank you for your patience. I was unable to reproduce your bug, but I forwarded this ticket to a member of the pymongo team and he should be getting back to me shortly with a response.
Could you post the output of the same query (possibly with a smaller limit), with an explain() tacked on to the end? This will give us information about index usage.
Here is an example of explain() usage: http://api.mongodb.org/python/2.0/tutorial.html#indexing
|
|
Sam, any updates on this issue?
Thanks.
|
|
Sam, the same issue with pymongo2.3:
In [9]: pymongo.version
|
Out[9]: '2.3'
|
|
In [10]: c = db["messages"].find({"ci": {"$in": [ObjectId("4dec867759e255daf945ed06"), ObjectId("4dec867759e255daf945ed07"), ObjectId("4dec867759e255daf945ed09"), ObjectId("4dec867759e255daf945ed08")]}, "company_id": ObjectId("4c2118ad54397f271b000000")}).hint([("ci", 1), ("ct", -1)])
|
|
In [11]: c.skip(60000).limit(1100).sort([("ct", -1)]).next()
|
---------------------------------------------------------------------------
|
OperationFailure Traceback (most recent call last)
|
/mnt/nimble/<ipython-input-11-a5cdb8f9f8cc> in <module>()
|
----> 1 c.skip(60000).limit(1100).sort([("ct", -1)]).next()
|
|
/mnt/nimble/eggs/pymongo-2.3-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in next(self)
|
776 raise StopIteration
|
777 db = self.__collection.database
|
--> 778 if len(self.__data) or self._refresh():
|
779 if self.__manipulate:
|
780 return db._fix_outgoing(self.__data.popleft(),
|
|
/mnt/nimble/eggs/pymongo-2.3-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in _refresh(self)
|
727 self.__skip, ntoreturn,
|
728 self.__query_spec(), self.__fields,
|
--> 729 self.__uuid_subtype))
|
730 if not self.__id:
|
731 self.__killed = True
|
|
/mnt/nimble/eggs/pymongo-2.3-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in __send_message(self, message)
|
684 self.__as_class,
|
685 self.__tz_aware,
|
--> 686 self.__uuid_subtype)
|
687 except AutoReconnect:
|
688 # Don't send kill cursors to another server after a "not master"
|
|
|
/mnt/nimble/eggs/pymongo-2.3-py2.7-linux-x86_64.egg/pymongo/helpers.pyc in _unpack_response(response, cursor_id, as_class, tz_aware, uuid_subtype)
|
102 raise AutoReconnect("master has changed")
|
103 raise OperationFailure("database error: %s" %
|
--> 104 error_object["$err"])
|
105
|
106 result = {}
|
|
OperationFailure: database error: too much data for sort() with no index. add an index or specify a smaller limit
|
|
In [12]: c.skip(50000).limit(1100).sort([("ct", -1)]).next()
|
---------------------------------------------------------------------------
|
OperationFailure Traceback (most recent call last)
|
/mnt/nimble/<ipython-input-12-7c86df48473a> in <module>()
|
----> 1 c.skip(50000).limit(1100).sort([("ct", -1)]).next()
|
|
/mnt/nimble/eggs/pymongo-2.3-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in next(self)
|
776 raise StopIteration
|
777 db = self.__collection.database
|
--> 778 if len(self.__data) or self._refresh():
|
779 if self.__manipulate:
|
780 return db._fix_outgoing(self.__data.popleft(),
|
|
/mnt/nimble/eggs/pymongo-2.3-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in _refresh(self)
|
727 self.__skip, ntoreturn,
|
728 self.__query_spec(), self.__fields,
|
--> 729 self.__uuid_subtype))
|
730 if not self.__id:
|
731 self.__killed = True
|
|
/mnt/nimble/eggs/pymongo-2.3-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in __send_message(self, message)
|
684 self.__as_class,
|
685 self.__tz_aware,
|
--> 686 self.__uuid_subtype)
|
687 except AutoReconnect:
|
688 # Don't send kill cursors to another server after a "not master"
|
|
|
/mnt/nimble/eggs/pymongo-2.3-py2.7-linux-x86_64.egg/pymongo/helpers.pyc in _unpack_response(response, cursor_id, as_class, tz_aware, uuid_subtype)
|
102 raise AutoReconnect("master has changed")
|
103 raise OperationFailure("database error: %s" %
|
--> 104 error_object["$err"])
|
105
|
106 result = {}
|
|
OperationFailure: database error: too much data for sort() with no index. add an index or specify a smaller limit
|
|
In [13]: c.skip(40000).limit(1100).sort([("ct", -1)]).next()
|
Out[13]:
|
{u'_id': ObjectId('5f007ddaca8bfd35c341b45b'),
|
u'attachments': [],
|
u'body': u'0000990b70cb317608d0c7fab58d043850d1',
|
u'cc_reprs': [],
|
u'ci': ObjectId('4dec867759e255daf945ed08'),
|
u'company_id': ObjectId('4c2118ad54397f271b000000'),
|
u'conversation_id': ObjectId('4f6782f3fcf7b752aa000007'),
|
u'credential_index': 3,
|
u'ct': 1332186495,
|
u'hs': [5049581948831632428L, 5195809348592625697L],
|
u'is_deleted': False,
|
u'is_read': True,
|
u'is_starred': False,
|
u'nimble_score': 0,
|
u'other_flags': 0,
|
u'owner_id': ObjectId('4c21188654397f270f000001'),
|
u'receiver_reprs': [[u'Richard Young', u'richard@nimble.com']],
|
u'sender_repr': [u'todd@nimble.com', u'todd@nimble.com'],
|
u'should_be_synced': False,
|
u'source': 4,
|
u'subject_id': ObjectId('4f6782f4fcf7b752aa00001f'),
|
u'summary_id': ObjectId('4f6782f4fcf7b752aa00001c'),
|
u'ti': u'9f4112b88bc4aa92a7c33c4c64367f1b',
|
u'tstamp': 1332183499}
|
|
In [14]: c.count()
|
Out[14]: 78858
|
|
In [15]: db["messages"].index_information()
|
Out[15]:
|
{u'_id_': {u'key': [(u'_id', 1)], u'v': 1},
|
u'_id_1_is_deleted_1_source_1': {u'background': True,
|
u'key': [(u'_id', 1), (u'is_deleted', 1), (u'source', 1)],
|
u'v': 1},
|
u'ci_1_ct_-1': {u'background': True,
|
u'key': [(u'ci', 1.0), (u'ct', -1.0)],
|
u'v': 1},
|
u'company_id_1_conversation_id_1': {u'background': True,
|
u'key': [(u'company_id', 1.0), (u'conversation_id', 1.0)],
|
u'v': 1},
|
u'company_id_1_hs_1': {u'background': True,
|
u'key': [(u'company_id', 1.0), (u'hs', 1.0)],
|
u'v': 1},
|
u'company_id_1_source_1_ct_-1': {u'background': True,
|
u'key': [(u'company_id', 1.0), (u'source', 1.0), (u'ct', -1.0)],
|
u'v': 1},
|
u'company_id_1_ti_1': {u'background': True,
|
u'key': [(u'company_id', 1.0), (u'ti', 1.0)],
|
u'v': 1},
|
u'owner_id_1_company_id_1_is_deleted_1_conversation_id_1': {u'background': True,
|
u'key': [(u'owner_id', 1),
|
(u'company_id', 1),
|
(u'is_deleted', 1),
|
(u'conversation_id', 1)],
|
u'v': 1},
|
u'owner_id_1_company_id_1_is_deleted_1_credential_index_1_source_1_tstamp_-1': {u'key': [(u'owner_id',
|
1),
|
(u'company_id', 1),
|
(u'is_deleted', 1),
|
(u'credential_index', 1),
|
(u'source', 1),
|
(u'tstamp', -1)],
|
u'v': 1},
|
u'owner_id_1_company_id_1_is_deleted_1_tstamp_-1': {u'key': [(u'owner_id', 1),
|
(u'company_id', 1),
|
(u'is_deleted', 1),
|
(u'tstamp', -1)],
|
u'v': 1}}
|
|
In [16]:
|
|
|
Sam, I'm still interesting in this issue Too busy with current work, I'll provide results with pymongo 2.3 later today or tomorrow.
|
|
I'm going to forward your question to one of the members of our pymongo team so that they can determine the best approach to take with this ticket. I have one last question - does the problem persist if you use pymongo 2.3 instead of 2.1?
|
|
I'll leave here some explanations regarding pymongo:
In [56]: c = db["messages"].find({"ci": {"$in": [ObjectId("4dec867759e255daf945ed06"), ObjectId("4dec867759e255daf945ed07"), ObjectId("4dec867759e255daf945ed09"), ObjectId("4dec867759e255daf945ed08")]}, "company_id": ObjectId("4c2118ad54397f271b000000")}).hint([("ci", 1), ("ct", -1)])
|
|
In [57]: c.skip(60000).limit(1100).sort([("ct", -1)]).next()---------------------------------------------------------------------------
|
OperationFailure Traceback (most recent call last)
|
/mnt/nimble/<ipython-input-57-a5cdb8f9f8cc> in <module>()
|
----> 1 c.skip(60000).limit(1100).sort([("ct", -1)]).next()
|
|
/mnt/nimble/eggs/pymongo-2.1.1-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in next(self)
|
701 raise StopIteration
|
702 db = self.__collection.database
|
--> 703 if len(self.__data) or self._refresh():
|
704 if self.__manipulate:
|
705 return db._fix_outgoing(self.__data.pop(0), self.__collection)
|
|
/mnt/nimble/eggs/pymongo-2.1.1-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in _refresh(self)
|
664 self.__skip, ntoreturn,
|
665 self.__query_spec(), self.__fields,
|
--> 666 self.__uuid_subtype))
|
667 if not self.__id:
|
668 self.__killed = True
|
|
/mnt/nimble/eggs/pymongo-2.1.1-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in __send_message(self, message)
|
626 response = helpers._unpack_response(response, self.__id,
|
627 self.__as_class,
|
--> 628 self.__tz_aware)
|
629 except AutoReconnect:
|
630 db.connection.disconnect()
|
|
/mnt/nimble/eggs/pymongo-2.1.1-py2.7-linux-x86_64.egg/pymongo/helpers.pyc in _unpack_response(response, cursor_id, as_class, tz_aware)
|
99 raise AutoReconnect("master has changed")
|
100 raise OperationFailure("database error: %s" %
|
--> 101 error_object["$err"])
|
102
|
103 result = {}
|
|
OperationFailure: database error: too much data for sort() with no index. add an index or specify a smaller limit
|
|
In [58]: c.skip(50000).limit(1100).sort([("ct", -1)]).next()
|
---------------------------------------------------------------------------
|
OperationFailure Traceback (most recent call last)
|
/mnt/nimble/<ipython-input-58-7c86df48473a> in <module>()
|
----> 1 c.skip(50000).limit(1100).sort([("ct", -1)]).next()
|
|
/mnt/nimble/eggs/pymongo-2.1.1-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in next(self)
|
701 raise StopIteration
|
702 db = self.__collection.database
|
--> 703 if len(self.__data) or self._refresh():
|
704 if self.__manipulate:
|
705 return db._fix_outgoing(self.__data.pop(0), self.__collection)
|
|
/mnt/nimble/eggs/pymongo-2.1.1-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in _refresh(self)
|
664 self.__skip, ntoreturn,
|
665 self.__query_spec(), self.__fields,
|
--> 666 self.__uuid_subtype))
|
667 if not self.__id:
|
668 self.__killed = True
|
|
/mnt/nimble/eggs/pymongo-2.1.1-py2.7-linux-x86_64.egg/pymongo/cursor.pyc in __send_message(self, message)
|
626 response = helpers._unpack_response(response, self.__id,
|
627 self.__as_class,
|
--> 628 self.__tz_aware)
|
629 except AutoReconnect:
|
630 db.connection.disconnect()
|
|
/mnt/nimble/eggs/pymongo-2.1.1-py2.7-linux-x86_64.egg/pymongo/helpers.pyc in _unpack_response(response, cursor_id, as_class, tz_aware)
|
99 raise AutoReconnect("master has changed")
|
100 raise OperationFailure("database error: %s" %
|
--> 101 error_object["$err"])
|
102
|
103 result = {}
|
|
OperationFailure: database error: too much data for sort() with no index. add an index or specify a smaller limit
|
|
In [59]: c.skip(40000).limit(1100).sort([("ct", -1)]).next()
|
Out[59]:
|
{u'_id': ObjectId('6ba8554d212f82dcdb0e9206'),
|
u'attachments': [],
|
u'body': u'0000bd6aef2d2f275e7ed039da78707ff8f6',
|
u'cc_reprs': [],
|
u'ci': ObjectId('4dec867759e255daf945ed08'),
|
u'company_id': ObjectId('4c2118ad54397f271b000000'),
|
u'conversation_id': ObjectId('4f677fa0fcf7b739d5000028'),
|
u'credential_index': 3,
|
u'ct': 1332186497,
|
u'hs': [6758928705901825592L, 5049581948831632428L],
|
u'is_deleted': False,
|
u'is_read': True,
|
u'is_starred': False,
|
u'nimble_score': 0,
|
u'other_flags': 0,
|
u'owner_id': ObjectId('4c21188654397f270f000001'),
|
u'receiver_reprs': [[u'todd@nimble.com', u'todd@nimble.com']],
|
u'sender_repr': [u'CSH Consulting, Inc.', u'gary@ecsh.net'],
|
u'should_be_synced': False,
|
u'source': 4,
|
u'subject_id': ObjectId('4ee7a7e93bc9d05af1000018'),
|
u'summary_id': ObjectId('4f677fa0fcf7b739d5000029'),
|
u'ti': u'c2a1bc6c3e496624901478220b63e0c7',
|
u'tstamp': 1332182621}
|
|
In [60]: c.count()
|
|
Out[60]: 78860
|
|
In [61]:
|
|
In [61]: db["messages"].index_information()
|
Out[61]:
|
{u'_id_': {u'key': [(u'_id', 1)], u'v': 1},
|
u'_id_1_is_deleted_1_source_1': {u'background': True,
|
u'key': [(u'_id', 1), (u'is_deleted', 1), (u'source', 1)],
|
u'v': 1},
|
u'ci_1_ct_-1': {u'background': True,
|
u'key': [(u'ci', 1.0), (u'ct', -1.0)],
|
u'v': 1},
|
u'company_id_1_conversation_id_1': {u'background': True,
|
u'key': [(u'company_id', 1.0), (u'conversation_id', 1.0)],
|
u'v': 1},
|
u'company_id_1_hs_1': {u'background': True,
|
u'key': [(u'company_id', 1.0), (u'hs', 1.0)],
|
u'v': 1},
|
u'company_id_1_source_1_ct_-1': {u'background': True,
|
u'key': [(u'company_id', 1.0), (u'source', 1.0), (u'ct', -1.0)],
|
u'v': 1},
|
u'company_id_1_ti_1': {u'background': True,
|
u'key': [(u'company_id', 1.0), (u'ti', 1.0)],
|
u'v': 1},
|
u'owner_id_1_company_id_1_is_deleted_1_conversation_id_1': {u'background': True,
|
u'key': [(u'owner_id', 1),
|
(u'company_id', 1),
|
(u'is_deleted', 1),
|
(u'conversation_id', 1)],
|
u'v': 1},
|
u'owner_id_1_company_id_1_is_deleted_1_credential_index_1_source_1_tstamp_-1': {u'key': [(u'owner_id',
|
1),
|
(u'company_id', 1),
|
(u'is_deleted', 1),
|
(u'credential_index', 1),
|
(u'source', 1),
|
(u'tstamp', -1)],
|
u'v': 1},
|
u'owner_id_1_company_id_1_is_deleted_1_tstamp_-1': {u'key': [(u'owner_id', 1),
|
(u'company_id', 1),
|
(u'is_deleted', 1),
|
(u'tstamp', -1)],
|
u'v': 1}}
|
So, as you can see - after reaching some level of `skip` mongo (through pymongo) raises databases error.
Here is the same (almost) query, but executed in mongos shell:
mongos> qry
|
{
|
"ci" : {
|
"$in" : [
|
ObjectId("4dec867759e255daf945ed06"),
|
ObjectId("4dec867759e255daf945ed07"),
|
ObjectId("4dec867759e255daf945ed09"),
|
ObjectId("4dec867759e255daf945ed08")
|
]
|
},
|
"company_id" : ObjectId("4c2118ad54397f271b000000")
|
}
|
mongos> c = db.messages.find(qry, fields=["_id", "ti", "is_starred"]).sort([('ct', -1)]).skip(60000).limit(1100).hint({ci:1, ct: -1})
|
{ "_id" : ObjectId("edbfe3bf54be2930aa7414f9") }
|
{ "_id" : ObjectId("754ee11ca436c3a303f41498") }
|
{ "_id" : ObjectId("9eafcd3451ad606a820472e9") }
|
{ "_id" : ObjectId("1e1c8bb4744b597a05db5c52") }
|
{ "_id" : ObjectId("6dc925a5152a0a58de5c5437") }
|
{ "_id" : ObjectId("27d1517c57f6353711a26fa8") }
|
{ "_id" : ObjectId("3cb7211fe0013f24a3e8cea3") }
|
{ "_id" : ObjectId("01f956c3c8ceac2d8cfabd69") }
|
{ "_id" : ObjectId("7d858b400bd1e7bd1c62bc27") }
|
{ "_id" : ObjectId("1f3525b5241ff2ad90e792a9") }
|
{ "_id" : ObjectId("11dcd18b3e0873a5ea7d609e") }
|
{ "_id" : ObjectId("146d6909fce2a822543ebce2") }
|
{ "_id" : ObjectId("1d2913d26cbc86731e85ec2d") }
|
{ "_id" : ObjectId("210519487261a0f364dd0ce5") }
|
{ "_id" : ObjectId("24296514b10917cc63a7c20e") }
|
{ "_id" : ObjectId("258696caba933ec9307210ac") }
|
{ "_id" : ObjectId("277cdf94b7f3dd5df107dcd7") }
|
{ "_id" : ObjectId("2a692d4667ee22a73c116d9d") }
|
{ "_id" : ObjectId("2bf00c1076b331a0135bf507") }
|
{ "_id" : ObjectId("2eb778019e3ffc4810885429") }
|
has more
|
mongos>
|
|
|
Hi Sam,
looks like, this is the issue related to pymongo. the same query (with sort, skip, limit) executed directly from mongo shell works correctly.
Should I change type of this ticket or create new? (and close this one)
|
|
Does an explain() on the query show that the index is being used? The docs indicate that your syntax may be incorrect for the "hint" call.
|
|
Sam, there is an compound index on collection:
{
"v" : 1,
"key" :
{ "ci" : 1, "ct" : -1 }
,
"ns" : "nimble.messages",
"name" : "ci_1_ct_-1",
"background" : true
}
not only by "ci"
and in my query, I'm doing clear hint by this index:
hint([("ci", 1), ("ct", -1)]))
Because of this, I don't see any obstacles, why mongo is unable to do sort by that index. Limit 5020 is coming from application. It can be shrunk just to check if it will help. But I don't think that it's value is too big, I'd say that this limit is very small in compare with other queries.
|
|
The sort() clause of a query is unable to use a different index than the actual query did when it does the sorting. In your case, the query is using the index on "ci" for the query, and then cannot use an index to sort based on the "ct" field. Where is the 5020 limit coming from? Can it be shrunk?
|
|
Hi Sam,
1. stats() for collection
mongos> db.messages.stats()
|
{
|
"sharded" : true,
|
"flags" : 1,
|
"ns" : "nimble.messages",
|
"count" : 105978751,
|
"numExtents" : 161,
|
"size" : 78016705056,
|
"storageSize" : 91925388976,
|
"totalIndexSize" : 38120469184,
|
"indexSizes" : {
|
"_id_" : 5618874240,
|
"ci_1_ct_-1" : 4971882832,
|
"company_id_1_conversation_id_1" : 5224488528,
|
"company_id_1_hs_1" : 14596342992,
|
"company_id_1_ti_1" : 7708880592
|
},
|
"avgObjSize" : 736.1542226139276,
|
"nindexes" : 5,
|
"nchunks" : 2155,
|
"shards" : {
|
"bill1shard1" : {
|
"ns" : "nimble.messages",
|
"count" : 26963750,
|
"size" : 19786179752,
|
"avgObjSize" : 733.8066757034908,
|
"storageSize" : 23136948144,
|
"numExtents" : 40,
|
"nindexes" : 5,
|
"lastExtentSize" : 2146426864,
|
"paddingFactor" : 1.5999999998514498,
|
"flags" : 1,
|
"totalIndexSize" : 9723782208,
|
"indexSizes" : {
|
"_id_" : 1415592640,
|
"company_id_1_conversation_id_1" : 1343913648,
|
"company_id_1_ti_1" : 1970595872,
|
"ci_1_ct_-1" : 1281931392,
|
"company_id_1_hs_1" : 3711748656
|
},
|
"ok" : 1
|
},
|
"bill1shard2" : {
|
"ns" : "nimble.messages",
|
"count" : 26839712,
|
"size" : 20306007752,
|
"avgObjSize" : 756.5657840143739,
|
"storageSize" : 24242671520,
|
"numExtents" : 40,
|
"nindexes" : 5,
|
"lastExtentSize" : 2146426864,
|
"paddingFactor" : 1.0099999998114662,
|
"flags" : 1,
|
"totalIndexSize" : 9868137664,
|
"indexSizes" : {
|
"_id_" : 1383657184,
|
"company_id_1_conversation_id_1" : 1371205136,
|
"company_id_1_ti_1" : 1985190032,
|
"ci_1_ct_-1" : 1292249504,
|
"company_id_1_hs_1" : 3835835808
|
},
|
"ok" : 1
|
},
|
"bill1shard3" : {
|
"ns" : "nimble.messages",
|
"count" : 26307389,
|
"size" : 19215313392,
|
"avgObjSize" : 730.4150705339857,
|
"storageSize" : 21895733168,
|
"numExtents" : 40,
|
"nindexes" : 5,
|
"lastExtentSize" : 2146426864,
|
"paddingFactor" : 1.3899999998946373,
|
"flags" : 1,
|
"totalIndexSize" : 9409096144,
|
"indexSizes" : {
|
"_id_" : 1424954160,
|
"company_id_1_conversation_id_1" : 1266691328,
|
"company_id_1_ti_1" : 1899088576,
|
"ci_1_ct_-1" : 1210841072,
|
"company_id_1_hs_1" : 3607521008
|
},
|
"ok" : 1
|
},
|
"repl1" : {
|
"ns" : "nimble.messages",
|
"count" : 25867900,
|
"size" : 18709204160,
|
"avgObjSize" : 723.2594899469999,
|
"storageSize" : 22650036144,
|
"numExtents" : 41,
|
"nindexes" : 5,
|
"lastExtentSize" : 2146426864,
|
"paddingFactor" : 1.1899999998891428,
|
"flags" : 1,
|
"totalIndexSize" : 9119453168,
|
"indexSizes" : {
|
"_id_" : 1394670256,
|
"company_id_1_conversation_id_1" : 1242678416,
|
"company_id_1_ti_1" : 1854006112,
|
"ci_1_ct_-1" : 1186860864,
|
"company_id_1_hs_1" : 3441237520
|
},
|
"ok" : 1
|
}
|
},
|
"ok" : 1
|
}
|
2. getIndexes() on collection:
mongos> db.messages.getIndexes()
|
[
|
{
|
"v" : 1,
|
"key" : {
|
"_id" : 1
|
},
|
"ns" : "nimble.messages",
|
"name" : "_id_"
|
},
|
{
|
"v" : 1,
|
"key" : {
|
"company_id" : 1,
|
"conversation_id" : 1
|
},
|
"ns" : "nimble.messages",
|
"name" : "company_id_1_conversation_id_1",
|
"background" : true
|
},
|
{
|
"v" : 1,
|
"key" : {
|
"company_id" : 1,
|
"ti" : 1
|
},
|
"ns" : "nimble.messages",
|
"name" : "company_id_1_ti_1",
|
"background" : true
|
},
|
{
|
"v" : 1,
|
"key" : {
|
"ci" : 1,
|
"ct" : -1
|
},
|
"ns" : "nimble.messages",
|
"name" : "ci_1_ct_-1",
|
"background" : true
|
},
|
{
|
"v" : 1,
|
"key" : {
|
"company_id" : 1,
|
"hs" : 1
|
},
|
"ns" : "nimble.messages",
|
"name" : "company_id_1_hs_1",
|
"background" : true
|
}
|
]
|
3. document example:
mongos> db.messages.find().next()
|
{
|
"_id" : ObjectId("0df22a1974f652624f9e8d9b"),
|
"attachments" : [ ],
|
"body" : "00004eaa3984a554dbf63da3cc14830671ac",
|
"cc_reprs" : [ ],
|
"ci" : ObjectId("4e689aa5a20c5645d1453749"),
|
"company_id" : ObjectId("4e689a99a697c834e9000055"),
|
"conversation_id" : ObjectId("4f2fe92929aec97b83000260"),
|
"credential_index" : 0,
|
"ct" : 1323915861,
|
"hs" : [
|
NumberLong("-997593486542788066"),
|
NumberLong("-301847444995728855")
|
],
|
"is_deleted" : false,
|
"is_read" : true,
|
"is_starred" : false,
|
"nimble_score" : 0,
|
"other_flags" : 0,
|
"owner_id" : ObjectId("4e689a99a697c834e9000054"),
|
"receiver_reprs" : [
|
[
|
"John Doe",
|
"john@john.doe"
|
]
|
],
|
"sender_repr" : [
|
"Jane Doe",
|
"jane@jane.doe"
|
],
|
"should_be_synced" : false,
|
"source" : 4,
|
"subject_id" : ObjectId("4f2fe92929aec97b83000264"),
|
"summary_id" : ObjectId("4f2fe92929aec97b83000261"),
|
"ti" : "e0ab6b524450a3ca93502c53185dbd5a",
|
"tstamp" : 1323915861
|
}
|
4. query (in some pseudo-python style) looks like:
cursor = db.messages.find({"ci": {"$in": [ObjectId(123..), ObjectId(123...), ....]},
|
"company_id": ObjectId(123....)}, fields=["_id", "ti", "is_starred").
|
hint([("ci", 1), ("ct", -1)]))
|
cursor.sort("ct", -1)
|
cursor.limit(5020)
|
and on attempt to iterate of that sorted cursor we got error described above
|
|
Hello,
Could you post the fields that are in the index being used, as well as the field(s) you are sorting by?
|
Generated at Thu Feb 08 03:14:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.