[CDRIVER-2736] Mongoc cursor error when running find_with_opts against a sharded cluster Created: 11/Jul/18  Updated: 28/Oct/23  Resolved: 17/Jul/18

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: 1.5.0
Fix Version/s: 1.12.0

Type: Bug Priority: Critical - P2
Reporter: Spencer Mckenney Assignee: A. Jesse Jiryu Davis
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File example-client.c    
Issue Links:
Related
is related to PHPC-1251 Upgrade libmongoc to 1.12.0 Closed
is related to SERVER-21086 mongos does not kill cursor with getM... Closed

 Description   

The cursor implementation in the mongo-c-driver makes this assumption:

If the server returns a non-zero cursor id, then the cursor isn't finished because there are more documents to iterate through.

Currently, sharded clusters (mongos) return a non-zero cursor id even when the document limit has been reached (see SERVER-21086). So our cursor receives conflicting information, namely that the user-defined limit has been reached but the cursor id isn't zero, and it aborts on a failed assertion.

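Here is the call that fails once its cursor is iterated (all the way back to mongo-c-driver version 1.5):

bson_t *filter = bson_new (); /* empty filter: match all documents */
bson_t *opts = BCON_NEW ("limit", BCON_INT64 (2), "batchSize", BCON_INT64 (2));
mongoc_cursor_t *cursor = mongoc_collection_find_with_opts (
   collection, filter, opts, NULL /* read_prefs */);
const bson_t *doc;
while (mongoc_cursor_next (cursor, &doc)) { /* aborts here against mongos */
}
mongoc_cursor_destroy (cursor);
bson_destroy (opts);
bson_destroy (filter);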

Here is the assertion in mongoc-cursor.c that fails when the cursor is iterated:

      int64_t remaining = limit - cursor->count;
      BSON_ASSERT (remaining > 0);

I attached an example-client.c file that can be run to reproduce the bug.



 Comments   
Comment by Githook User [ 17/Jul/18 ]

Author:

{'email': 'jesse@mongodb.com', 'name': 'A. Jesse Jiryu Davis', 'username': 'ajdavis'}

Message: CDRIVER-2736 fix crash in query with limit
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/69fd046f4adc804cee19d88407c5384504978dc9

Comment by Githook User [ 17/Jul/18 ]

Author:

{'email': 'jesse@mongodb.com', 'name': 'A. Jesse Jiryu Davis', 'username': 'ajdavis'}

Message: CDRIVER-2736 fix crash in query with limit
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/b7c893556c5ab2cddfe249678edbf84446b9e9e1

Comment by Derick Rethans [ 16/Jul/18 ]

I'd just like it in a release soon. The reason I suggested 1.11.x is that, for the PHP driver, going from libmongoc 1.x to 1.x+1 has often involved API changes that required extra work on our side. We prefer to only have to deal with API changes when going from 1.x to 1.x+1, because that keeps the risk lower. Let's chat about the policy topic in tomorrow's meeting.

Comment by A. Jesse Jiryu Davis [ 16/Jul/18 ]

Does it actually matter whether the release in which this is fixed is named 1.11.x? Or is the important thing that the bugfix is released soon? We can put this bugfix in 1.12.0 and release 1.12.0 right away. Is that acceptable?

(My policy is to fix old bugs in minor releases and new bugs in patch releases. So 1.11.x only fixes bugs introduced in 1.11.0 or a previous 1.11.x patch release; a bug we've had since 1.5.0 can't be fixed in 1.11.1. The purpose of this policy is to guarantee that each patch release is lower-risk than each minor release, by changing code only to fix bugs introduced since the last minor release.)

Comment by Derick Rethans [ 16/Jul/18 ]

FWIW, I also just ran into this while running the CRUD spec functional tests of the PHP library against a sharded cluster, now that we have one set up with MO (for Travis). Our tests abort with:

/home/derick/dev/php/derickr-mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cursor.c:140 _mongoc_n_return(): precondition failed: remaining > 0
Aborted

I would like to argue for this being included in a 1.11.x release.

Backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {0, 0, 0, 0, 0, 0, 1215475744768, 93825010060904, 48, 140737488325552, 140737488325296, 140737488325568, 140737488325312, 140737250413787, 
            140737488325359, 140737488325372}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007ffff37cb2f1 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x7ffff1dd3f30, sa_sigaction = 0x7ffff1dd3f30}, sa_mask = {__val = {140, 140737251198384, 140737251196704, 140737251198092, 
              93825017776464, 93825017776496, 17179869243, 21474836484, 15, 0, 23, 140737251198092, 93825017776464, 140737488325680, 140737250657645, 140737488325680}}, 
          sa_flags = -237698042, sa_restorer = 0x0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007ffff1d4cdc1 in _mongoc_n_return (cursor=0x555556db0950) at /home/derick/dev/php/derickr-mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cursor.c:140
        remaining = 0
        limit = 4
        batch_size = 2
        n_return = 2
        __func__ = "_mongoc_n_return"
#3  0x00007ffff1d5100d in _mongoc_cursor_prepare_getmore_command (cursor=0x555556db0950, command=0x7fffffff8cd0)
    at /home/derick/dev/php/derickr-mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cursor.c:1624
        collection = 0x555556db0a20 "CrudSpecFunctionalTest.eaddaabb"
        collection_len = 31
        batch_size = 2
        await_data = false
        max_await_time_ms = 1457198747
        __func__ = "_mongoc_cursor_prepare_getmore_command"
#4  0x00007ffff1d51d89 in _get_next_batch (cursor=0x555556db0950)
    at /home/derick/dev/php/derickr-mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cursor-find-cmd.c:65
        data = 0x555556da7500
        getmore_cmd = {flags = 3, len = 70, 
          padding = "F\000\000\000\022getMore\000Ly8\222M\223\270\061\002collection\000 \000\000\000CrudSpecFunctionalTest.eaddaabb\000\000\000\000\240\374\332V", '\000' <repeats 12 times>, "`\215\377\377\377\177\000\000\375\034\325\361\377\177\000\000P\212\243\346\377\177\000\000P\t\333VUU\000"}
#5  0x00007ffff1d4f9bc in _call_transition (cursor=0x555556db0950) at /home/derick/dev/php/derickr-mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cursor.c:1124
        state = END_OF_BATCH
        fn = 0x7ffff1d51d32 <_get_next_batch>
#6  0x00007ffff1d4fcdb in mongoc_cursor_next (cursor=0x555556db0950, bson=0x7fffffff8de8)
    at /home/derick/dev/php/derickr-mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cursor.c:1192
        ret = false
        attempted_refresh = true
        __func__ = "mongoc_cursor_next"
#7  0x00007ffff1db2f2e in php_phongo_cursor_iterator_move_forward (iter=0x7fffe682be00) at /home/derick/dev/php/derickr-mongo-php-driver/src/MongoDB/Cursor.c:125
        cursor_it = 0x7fffe682be00
        cursor = 0x7fffe68e5380
        doc = 0x0

Comment by A. Jesse Jiryu Davis [ 12/Jul/18 ]

Let's not assert on the relationship between limit, batchSize, and cursor id (don't crash no matter what the server does). Let's keep sending getMore until the cursor id is 0, even if the limit has been reached.

Comment by Kevin Albertson [ 11/Jul/18 ]

This showed up when implementing the CRUD spec tests in CDRIVER-2530, specifically the "Find with limit, sort, and batchsize" test in read/find.json. It's surprising that this has been around since 1.5 (with the server behavior reported in 3.2) and hasn't been caught before. In the C++ driver, collection::find uses mongoc_collection_find_with_opts, and we do run the CRUD spec tests, but not against a sharded cluster. I've confirmed that running test_crud_specs in the C++ driver against a sharded cluster fails with the same issue:

/Users/kevinalbertson/code/mongo-cxx-driver/cmake-build-debug/src/mongocxx/test/test_crud_specs
 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/kevinalbertson/code/mongo-c-driver/src/libmongoc/src/mongoc/mongoc-cursor.c:140 _mongoc_n_return(): precondition failed: remaining > 0
test_crud_specs is a Catch v2.2.1 host application.
Run with -? for options
 
-------------------------------------------------------------------------------
CRUD spec automated tests
-------------------------------------------------------------------------------
/Users/kevinalbertson/code/mongo-cxx-driver/src/mongocxx/test/spec/crud.cpp:890
...............................................................................
 
/Users/kevinalbertson/code/mongo-cxx-driver/src/mongocxx/test/spec/crud.cpp:890: FAILED:
  {Unknown expression after the reported line}
due to a fatal error condition:
  Test path: /Users/kevinalbertson/code/mongo-cxx-driver/data/crud/./read/find.json
  Test description: Find with limit, sort, and batchsize
  SIGABRT - Abort (abnormal termination) signal
 
===============================================================================
test cases:  1 |  0 passed | 1 failed
assertions: 26 | 25 passed | 1 failed
 
 
Process finished with exit code 6

Generated at Wed Feb 07 21:16:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.