[SERVER-645] tailable cursor with expression involving _id doesn't return rows
Created: 17/Feb/10  Updated: 12/Jul/16  Resolved: 10/Mar/10

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 1.3.4

Type: Bug Priority: Major - P3
Reporter: Roger Binns Assignee: Michael Dirolf
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

pymongo-1.4-py2.6-linux-x86_64 Ubuntu 9.10


Attachments: File tail.py    
Participants:

 Description   

If you have a tailable cursor and the search spec involves "_id", then new rows are not returned as they should be. The attached code, lightly modified from the pymongo test code, demonstrates the problem.

If you comment out the line doing $gt on _id, everything works fine. If you include it, a row that should be returned is not. assertEqual then re-runs the query, showing that the new row does match the search spec and that it was the tailable cursor that failed to return it.



 Comments   
Comment by Michael Dirolf [ 10/Mar/10 ]

Yes, the "alive" property of Cursor instances, which Mathias added yesterday, should do the trick.
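A minimal sketch of how the `alive` check could drive a `tail -f`-style reader, in Python since the thread uses pymongo. It assumes only a cursor that is iterable and has an `alive` attribute, as pymongo's Cursor does; `tail`, `open_cursor`, `handle`, `poll_interval` and `max_polls` are hypothetical names. Following Aaron's comments, a dead cursor (for example one whose initial query matched nothing) is re-opened from the last `_id` seen, and the loop sleeps between polls because the cursor does not block. With current pymongo, `open_cursor` might be something like `lambda last_id: coll.find({} if last_id is None else {"_id": {"$gt": last_id}}, cursor_type=CursorType.TAILABLE)`; that wiring is an assumption and is not exercised here.

```python
import time
from typing import Any, Callable, Dict, Optional


def tail(open_cursor: Callable[[Optional[Any]], Any],
         handle: Callable[[Dict[str, Any]], None],
         poll_interval: float = 1.0,
         max_polls: Optional[int] = None,
         sleep: Callable[[float], None] = time.sleep) -> Optional[Any]:
    """Follow a capped collection like `tail -f` by polling a tailable cursor.

    open_cursor(last_id) must return an iterable cursor with an `alive`
    attribute; last_id is None on the first call (hypothetical contract).
    Returns the last _id handled, which matters when max_polls ends the loop.
    """
    last_id = None
    cursor = open_cursor(last_id)
    polls = 0
    while max_polls is None or polls < max_polls:
        # Drain whatever the cursor has right now; a live tailable cursor
        # stops iterating at the end but can yield more on a later pass.
        for doc in cursor:
            handle(doc)
            last_id = doc["_id"]
        if not cursor.alive:
            # Dead cursor (cursor id 0), e.g. the initial query matched
            # nothing: re-issue the query starting after the last _id seen.
            cursor = open_cursor(last_id)
        # The cursor does not block waiting for new data, so poll.
        sleep(poll_interval)
        polls += 1
    return last_id
```

Passing `sleep` and `max_polls` in keeps the loop testable with a fake cursor instead of a live server.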

Comment by Aaron Staple [ 09/Mar/10 ]

I implemented the changes I described above and updated the docs. Mike, can you see if checking for a dead cursor is straightforward in Python? Thanks.

Comment by auto [ 09/Mar/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-645 tailable cursor on capped collections only, won't use indexes
http://github.com/mongodb/mongo/commit/94479c550fcbce7ae2d9b3275d6d368bad98fba8

Comment by Roger Binns [ 09/Mar/10 ]

"Make behaviour obvious" is exactly the right ting and well worded. Note that I use pymongo so I don't know exactly how that maps onto isDead() - it looks like an 'alive' property does.

Comment by Eliot Horowitz (Inactive) [ 08/Mar/10 ]

I think that's right for now.
Most important at this point is just to make behavior obvious, and not subtle.
Can make it better later as needed.

Comment by Aaron Staple [ 08/Mar/10 ]

So in my opinion we should:

  • not allow tailable cursors on non capped collections
  • disable use of indexes when a tailable cursor is requested

I'm not going to generate an error when the original query returns an empty result; you can use isDead() (it checks whether cursorid is zero) to tell whether the cursor is still alive.

Eliot does this sound right?

Comment by Roger Binns [ 08/Mar/10 ]

Perhaps you could make a tailable cursor with no results at least generate an error. That is probably better than documenting it, since it isn't an obvious issue.

My use case is doing the equivalent of a tail -f on a log collection. There are two issues. One is that you cannot make a tailable cursor block until there is data, so you have to poll. I recall someone on the mailing list saying this would be implemented at some point.

The second is that the tailable cursor does not return new entries, which is what I am reporting here. Note that I do have to give a query involving _id. For example, if the collection already has 1 million docs, I do not want to wade through them all first. What I do is find the 20 highest ids (they are MongoDB-generated) and run a tailable query for _ids greater than that. You get results once (maybe) but nothing after that. (I am using pymongo, if that matters.)
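Roger's starting-point trick can be sketched as a small helper that turns the newest `_id`s into a filter. This is a hedged sketch, assuming `_id`s grow with insertion order (MongoDB-generated ObjectIds begin with a timestamp, so they roughly do); the helper name is hypothetical, and it deliberately uses `$gte` on the smallest of the N ids rather than `$gt`, so the initial result is non-empty, which Aaron's comments say a tailable cursor needs as a starting point.

```python
from typing import Any, Dict, Sequence


def tail_start_spec(newest_ids: Sequence[Any]) -> Dict[str, Any]:
    """Filter for a tailable query that starts at the newest few documents.

    newest_ids: _id values of the N newest documents, e.g. obtained with
    coll.find({}, {"_id": 1}).sort("_id", -1).limit(20) in pymongo.
    $gte on the smallest of them keeps those N documents in the initial
    result, so the cursor is seeded instead of coming back empty.
    """
    if not newest_ids:
        # Empty collection: nothing to anchor on; the caller must retry later.
        return {}
    return {"_id": {"$gte": min(newest_ids)}}
```

On an empty collection the returned filter still matches nothing, so the resulting cursor would be dead and the caller has to retry.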

It is probably worth updating the doc page to mention that the cursor is non-blocking and to suggest how to start on a collection that already has lots of items. The C++ code example sort of does this already (it sleeps for 1 second and roughly records the highest id), but C++ code isn't exactly the best documentation.

Due to travel I probably won't be able to work on this further until next month.

Comment by Eliot Horowitz (Inactive) [ 06/Mar/10 ]

@roger, JIRA is getting "The email account that you tried to reach does not exist" when trying to email you.

Comment by Aaron Staple [ 06/Mar/10 ]

Hi Roger,

As I mentioned earlier, it is not enough for the collection to be nonempty: the query must return a nonempty initial result in order to seed a tailable cursor. This and the index problem are the only known issues with tailable cursors (and we will certainly add them to the docs, thanks). If you are experiencing another problem, perhaps you can confirm it with us and, if possible, send a test case.

Tailable cursors were originally implemented for mongo to use internally as part of its replication mechanism, and are now supported in user queries as a convenience. The current behavior is a bug if it's not what you want.

Comment by Roger Binns [ 06/Mar/10 ]

It would be nice if JIRA watches worked so I knew this had been updated.

@aaron: The collection I originally found this on was not capped, i.e. there was an index on _id. That is where I was seeing the problem. The initial comments said this only affected collections with indices, so I tested again with a capped collection, since it has no indices. In summary, the problem occurs whether or not an index is present.

My real-world data already had many entries when I found the bug. If at least one entry is required, that should at least be mentioned in the docs!

From the rest of your comments, I do not understand the point you are trying to make. Are you confirming that there are indeed bugs, or that Mongo will not support tailable cursors on _id by design?

Comment by auto [ 02/Mar/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-645 update test to use capped collection
http://github.com/mongodb/mongo/commit/b3f2b01d678b9c9819737877327ba42bdd0859b4

Comment by Aaron Staple [ 02/Mar/10 ]

Hi Roger,

A couple of things. In your Python script the collection isn't capped, so it will automatically be created with an _id index, and tailable cursors on _id won't work correctly. Secondly, your initial query is on an empty collection. We don't support creating a tailable cursor from no data: there has to be at least one match to the initial query in order to establish a starting point for the tailable cursor. I'm sure this is something we could change if there were enough interest; it just hasn't come up so far.

Comment by Aaron Staple [ 02/Mar/10 ]

So first off, it sounds like in an earlier version of this bug a tailable cursor was being used on a non-capped collection. This is not recommended, because insertion into a non-capped collection is not guaranteed to be in order (so you may or may not see new documents at the 'tail'). Eliot, should we add a guard and assert if someone tries to use a tailable cursor on a non-capped collection?
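The guard Aaron proposes (which the later commit adopts on the server) can be approximated on the client. A minimal sketch, assuming the caller already has the collection's options as a dict, such as pymongo's `Collection.options()` returns, or the `"capped" : 1` field in the stats output shown further down this page; `require_capped` is a hypothetical helper.

```python
from typing import Any, Mapping


def require_capped(options: Mapping[str, Any]) -> None:
    """Refuse to tail a collection that is not capped.

    options: collection options as a dict; "capped" may be True or 1.
    Capped collections keep insertion order, so only there does the
    'tail' reliably hold the newest documents.
    """
    if not options.get("capped"):
        raise ValueError("tailable cursors need a capped collection")
```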

Comment by Roger Binns [ 19/Feb/10 ]

My current workaround is to use a non-tailable query, remember the highest _id seen, and have the next query ask for anything higher. I also ask for the log level to be greater than or equal to a value (corresponding to debug, info, warn, etc.).

Doing a query to get the last n entries completes in about 3 seconds, maybe less. This is perfectly acceptable performance.

Doing the equivalent of the tailable query also gives timely results, but costs 20% CPU (I have 8 cores, so this isn't a big deal in the short term). This was measured when there were no additions, i.e. an empty result set. Once this query works correctly and tailable cursors can block until new entries arrive, it should drop to zero.
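Roger's workaround (a plain query, remembering the highest `_id`, filtered by log level) can be sketched as follows. This is a minimal sketch, not his code: the field name `level`, numeric levels (here 10/20/30 in the style of Python's logging module), and all helper names are assumptions, and `memory_find` is only an in-memory stand-in for the database; with pymongo one would pass something like `lambda spec: coll.find(spec).sort("_id", 1)` instead.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Doc = Dict[str, Any]


def poll_spec(last_id: Optional[Any], min_level: int) -> Doc:
    """Filter for one poll: entries at or above min_level, newer than last_id."""
    spec: Doc = {"level": {"$gte": min_level}}
    if last_id is not None:
        spec["_id"] = {"$gt": last_id}
    return spec


def poll_once(find: Callable[[Doc], Iterable[Doc]],
              last_id: Optional[Any],
              min_level: int) -> Tuple[List[Doc], Optional[Any]]:
    """Run one non-tailable query; return (new entries, updated last_id)."""
    docs = list(find(poll_spec(last_id, min_level)))
    if docs:
        last_id = max(d["_id"] for d in docs)
    return docs, last_id


def memory_find(store: List[Doc]) -> Callable[[Doc], List[Doc]]:
    """In-memory stand-in for a find(spec) sorted by ascending _id.

    Understands only the two operators poll_spec produces."""
    def find(spec: Doc) -> List[Doc]:
        gt = spec.get("_id", {}).get("$gt")
        gte = spec["level"]["$gte"]
        hits = [d for d in store
                if (gt is None or d["_id"] > gt) and d["level"] >= gte]
        return sorted(hits, key=lambda d: d["_id"])
    return find
```

The caller sleeps between calls to `poll_once`; that interval bounds how often the server is queried while no new entries arrive.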

Comment by Roger Binns [ 19/Feb/10 ]

> db.log.getIndexKeys()
[ ]
> db.log.stats()
{
  "ns" : "log.log",
  "count" : 341057,
  "size" : { "top" : 0, "bottom" : 2091694464 },
  "storageSize" : { "top" : 0, "bottom" : 2097152256 },
  "numExtents" : 1,
  "nindexes" : 0,
  "lastExtentSize" : 2097152256,
  "paddingFactor" : 1,
  "flags" : 0,
  "capped" : 1,
  "max" : 2147483647,
  "ok" : 1
}

This is all part of a larger system which wouldn't make any sense to you. However, I am more than happy to provide test code (using pymongo) if that would be useful. Note that there is already an attachment demonstrating the problem.

Comment by Eliot Horowitz (Inactive) [ 19/Feb/10 ]

In the shell:
db.foo.getIndexKeys()

Do you have a script you used to test this?

Comment by Roger Binns [ 19/Feb/10 ]

How? I am going by the docs, which say there is no _id index for capped collections. I also don't add indexes on any other fields.

Comment by Eliot Horowitz (Inactive) [ 18/Feb/10 ]

Can you verify there isn't an index on the capped collection?

Comment by Roger Binns [ 18/Feb/10 ]

I don't know exactly what you mean by "indexed fields". I just tried this using a capped collection; the wiki page goes to great lengths to point out that _id is not indexed in capped collections. The bug was still present.

Comment by Michael Dirolf [ 17/Feb/10 ]

The problem is that setTailable is only implemented for BasicCursor and not for the cursors used when a query goes through an index on the queried field. Not sure what the right fix is, though.

Comment by auto [ 17/Feb/10 ]

Author:

{'login': 'mdirolf', 'name': 'Mike Dirolf', 'email': 'mike@10gen.com'}

Message: failing test case for tailable cursor w/ _id query SERVER-645
http://github.com/mongodb/mongo/commit/60391897f34659143e20af6113901430d4d422c6

Comment by Michael Dirolf [ 17/Feb/10 ]

test case coming...

Generated at Thu Feb 08 02:54:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.