[CXX-1100] Having more than one iterator per cursor violates mongoc lifecycle constraints Created: 24/Oct/16  Updated: 13/Dec/16  Resolved: 15/Nov/16

Status: Closed
Project: C++ Driver
Component/s: Implementation
Affects Version/s: None
Fix Version/s: 3.1.0

Type: Bug Priority: Major - P3
Reporter: David Golden Assignee: David Golden
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to CXX-1001 mongocxx::cursor.begin() increments t... Closed
is related to CDRIVER-1886 Expose mongoc_cursor_t 'sent' field Closed
Backwards Compatibility: Major Change

 Description   

Given a single cursor, iterator construction and iteration result in the construction of bsoncxx::document::view objects wrapping the bson_t pointer returned by a call to mongoc_cursor_next. However, the validity of this bson_t pointer is only until the next call to mongoc_cursor_next (see docs).

Consider this example:

auto cursor = db["fo"].find({});
auto iter1 = cursor.begin();  // Calls mongoc_cursor_next().
auto iter2 = cursor.begin();  // Calls mongoc_cursor_next().
 
// At this point, iter1._doc violates the lifecycle constraints.
 
iter1++;
 
// At this point, iter2._doc violates the lifecycle constraints.

In short, only the most recently iterated iterator ever has a valid bsoncxx::document::view.



 Comments   
Comment by Githook User [ 29/Nov/16 ]

Author:

{u'username': u'xdg', u'name': u'David Golden', u'email': u'xdg@xdg.me'}

Message: CXX-1001 CXX-1100 Fix mongocxx cursor iteration

CXX-1001 reported the repeated calls to mongocxx::cursor::begin() would
increment the cursor and lose documents. This is fixed by only
incrementing the cursor during construction one time (required to send the
query/command to the server).

CXX-1100 reported that incrementing an iterator would leave all other
iterators on that cursor with dangling pointers. While this is
technically legal, if undefined behavior, for input iterators, the team
decided to have all iterators from the same cursor move in lockstep;
this is also legal and prevents undefined behaviors, which is more
user-friendly. This is achieved by moving the "current document" view
into the private cursor implementation rather than keeping copies in
individual iterators.

The changes above required some additional coordination to track cursors
sent and dead/finished. Strangely, we couldn't depend on libmongoc
functions for this due to what appear to be some inconsistencies in
libmongoc's handling of "traditional" cursors and "command cursors".
Branch: master
https://github.com/mongodb/mongo-cxx-driver/commit/e2c086fe158c9a21f1d168ec13800b1a04e2211f

Comment by Githook User [ 29/Nov/16 ]

Author:

{u'username': u'xdg', u'name': u'David Golden', u'email': u'xdg@xdg.me'}

Message: CXX-1001 CXX-1100 Fix mongocxx cursor iteration

CXX-1001 reported the repeated calls to mongocxx::cursor::begin() would
increment the cursor and lose documents. This is fixed by only
incrementing the cursor during construction one time (required to send the
query/command to the server).

CXX-1100 reported that incrementing an iterator would leave all other
iterators on that cursor with dangling pointers. While this is
technically legal, if undefined behavior, for input iterators, the team
decided to have all iterators from the same cursor move in lockstep;
this is also legal and prevents undefined behaviors, which is more
user-friendly. This is achieved by moving the "current document" view
into the private cursor implementation rather than keeping copies in
individual iterators.

The changes above required some additional coordination to track cursors
sent and dead/finished. Strangely, we couldn't depend on libmongoc
functions for this due to what appear to be some inconsistencies in
libmongoc's handling of "traditional" cursors and "command cursors".
Branch: master
https://github.com/mongodb/mongo-cxx-driver/commit/e2c086fe158c9a21f1d168ec13800b1a04e2211f

Comment by Githook User [ 15/Nov/16 ]

Author:

{u'username': u'xdg', u'name': u'David Golden', u'email': u'xdg@xdg.me'}

Message: CXX-1001 CXX-1100 Fix mongocxx cursor iteration

CXX-1001 reported the repeated calls to mongocxx::cursor::begin() would
increment the cursor and lose documents. This is fixed by only
incrementing the cursor during construction one time (required to send the
query/command to the server).

CXX-1100 reported that incrementing an iterator would leave all other
iterators on that cursor with dangling pointers. While this is
technically legal, if undefined behavior, for input iterators, the team
decided to have all iterators from the same cursor move in lockstep;
this is also legal and prevents undefined behaviors, which is more
user-friendly. This is achieved by moving the "current document" view
into the private cursor implementation rather than keeping copies in
individual iterators.

The changes above required some additional coordination to track cursors
sent and dead/finished. Strangely, we couldn't depend on libmongoc
functions for this due to what appear to be some inconsistencies in
libmongoc's handling of "traditional" cursors and "command cursors".
Branch: 3.1-dev
https://github.com/mongodb/mongo-cxx-driver/commit/e2c086fe158c9a21f1d168ec13800b1a04e2211f

Comment by Githook User [ 15/Nov/16 ]

Author:

{u'username': u'xdg', u'name': u'David Golden', u'email': u'xdg@xdg.me'}

Message: CXX-1001 CXX-1100 Fix mongocxx cursor iteration

CXX-1001 reported the repeated calls to mongocxx::cursor::begin() would
increment the cursor and lose documents. This is fixed by only
incrementing the cursor during construction one time (required to send the
query/command to the server).

CXX-1100 reported that incrementing an iterator would leave all other
iterators on that cursor with dangling pointers. While this is
technically legal, if undefined behavior, for input iterators, the team
decided to have all iterators from the same cursor move in lockstep;
this is also legal and prevents undefined behaviors, which is more
user-friendly. This is achieved by moving the "current document" view
into the private cursor implementation rather than keeping copies in
individual iterators.

The changes above required some additional coordination to track cursors
sent and dead/finished. Strangely, we couldn't depend on libmongoc
functions for this due to what appear to be some inconsistencies in
libmongoc's handling of "traditional" cursors and "command cursors".
Branch: 3.1-dev
https://github.com/mongodb/mongo-cxx-driver/commit/e2c086fe158c9a21f1d168ec13800b1a04e2211f

Comment by David Golden [ 09/Nov/16 ]

https://mongodbcr.appspot.com/107340001/

Comment by J Rassi [ 27/Oct/16 ]

Oh, yes, that's true! I like that. If we do end up removing begin()/end() and go with recommending that route, we should also provide some supporting non-lambda example code, for users who aren't comfortable with using lambdas.

Comment by David Golden [ 27/Oct/16 ]

Another option would be letting people use std::for_each and a lambda.

Comment by J Rassi [ 27/Oct/16 ]

Here's a proposal for syntactic sugar we could provide to possibly replace range-for loops for cursor iteration:

cursor::iterator it(some_cursor);
while (auto doc = our_sugar_function(&it)) {
    // "doc" is of type stdx::optional<document::view>.
}

Comment by David Golden [ 26/Oct/16 ]

I chatted with Dana yesterday about these issues and his thoughts were more or less as follows:

  • begin()/end() is an expectation users will have; they'll want it for STL algorithms/templates.
  • begin() shouldn't advance the cursor; it made sense to him to have begin() get the first document only if the query wasn't sent and otherwise just return the current document.
  • He liked the idea of moving _doc into the cursor object; since it's a view on the buffer of the unique implementation, keeping it with the implementation avoids lifetime issues.
  • He didn't like the idea of throwing exceptions at runtime (e.g. calling begin() twice).
  • He didn't think we should have begin() run the query again
  • He was open to the idea of a "clone" method on the cursor for users who really needed to iterate from the start again.

In response to other comments, my own thoughts:

  • We agreed in CXX-1001 to keep begin()/end(); I'd like to stick to that rather than repeat the arguments a second time because I don't think we'll arrive at a different conclusion.
  • Looking at InputIterator, nothing forbids us from having all iterators stay "in sync" on the current document in the implementation.

Therefore, I think building on idea #1 is simple to implement and explain and addresses all the needs we have:

  • Move _doc to mongocxx::cursor
  • mongocxx::cursor::iterator construction from mongocxx::cursor no longer advances the libmongoc cursor; it merely sets its _cursor pointer
  • begin() advances the cursor only if the query has not yet executed; this requires adding a started_ boolean or equivalent to mongocxx::cursor as there appears to be no way to get this from libmongoc's cursor API
  • dereferencing any iterator always returns the _doc via the iterators cursor pointer
  • incrementing any iterator always increments the libmongoc cursor and updates _doc in the cursor object
Comment by J Rassi [ 25/Oct/16 ]

Depending on the choice, we should probably also consider disallowing copy construction of cursor iterators.

I don't think we should. In order for cursor::iterator to satisfy InputIterator, it must satisfy Iterator. For a type to satisfy Iterator, it must be copy-constructible and copy-assignable (see http://en.cppreference.com/w/cpp/concept/Iterator).

Comment by J Rassi [ 25/Oct/16 ]

My vote would be for the following (somewhat equivalent to #3 above):
1) For the next 3.0.x release, make cursor::begin() throw an exception if it's called twice, or at least document it as invalid to call twice.
2) For either 3.1 or 4.0, remove cursor::begin() and cursor::end() entirely. In my opinion, they make it too easy for novice users to shoot themselves in the foot.

I don't think of the above as limiting at all. cursor::iterator is an InputIterator, and you can pass it to any algorithm that takes an InputIterator. I think that trying to fake ForwardIterator functionality for cursor::iterator is a bad idea.

To assist users who are less familiar with the details of the InputIterator concept, we could also put a note in the class comment for cursor::iterator which reminds users that incrementing the iterator invalidates all copies of that iterator.

Comment by David Golden [ 24/Oct/16 ]

Brainstorming possible solutions:

  1. Keep _doc in the cursor object and have iterators access it via the cursor object. This means all iterators are just API aliases of the cursor object. iter1 and iter2 always refer to the same document, regardless of which one was incremented last.
  2. Put a stdx::optional<cursor::iterator> in the cursor object. Construct it once and return references to it for begin(). Like #1, this means all iterators are just aliases of the "singleton" iterator for the cursor.
  3. Make getting a second iterator from a cursor fatal (this is probably quite limiting).
  4. Have begin() clone the underlying mongoc_cursor_t and execute a separate query. This makes begin() idempotent, but repeated use expensive.
  5. Copy documents into iterators so their lifecycles are independent. (Safe but slow.)

Depending on the choice, we should probably also consider disallowing copy construction of cursor iterators.

Generated at Wed Feb 07 22:01:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.