  1. Core Server
  2. SERVER-62147

Exhaust query using the OP_QUERY protocol is broken when more than one getMore batch is required

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.0.6, 4.2.19, 4.4.13
    • Affects Version/s: 4.2.17, 4.4.10, 5.0.5
    • Component/s: Query Execution
    • Labels:
      None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.0, v4.4, v4.2
    • Steps To Reproduce:

      Run the following script against a 5.0 server using a 5.0 shell:

      (function() {
          const coll = db.c;
          coll.drop();

          // 2000 documents of roughly 16 KiB each, about 31 MiB in total.
          const strSize = 16 * 1024;
          const numDocs = 2000;
          const str = "A".repeat(strSize);

          let bulk = coll.initializeUnorderedBulkOp();
          for (let i = 0; i < numDocs; ++i) {
              bulk.insert({key: str});
          }
          assert.commandWorked(bulk.execute());

          // On an affected server this hangs instead of returning the count.
          print("attempting exhaust query...");
          assert.eq(numDocs, coll.find().addOption(DBQuery.Option.exhaust).itcount());
      }());
      
    • Sprint: QE 2021-12-27, QE 2022-01-10, QE 2022-01-24
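As a rough check on why the reproduction script needs more than one getMore batch, the following sketch estimates the batch count. The limits it uses are assumptions, not taken from this ticket: a first batch of at most 101 documents, later batches of at most 16 MiB, and about 40 bytes of BSON overhead per document. The function name `estimateBatches` is hypothetical. Smaller per-batch limits on the server would only raise the count.

```javascript
// Back-of-the-envelope batch estimate for the reproduction script.
// ASSUMPTIONS (not stated in the ticket): first batch <= 101 documents,
// each later batch <= 16 MiB, ~40 bytes of BSON overhead per document.
function estimateBatches(numDocs, strSize) {
    const firstBatchDocs = 101;
    const maxBatchBytes = 16 * 1024 * 1024;
    const docBytes = strSize + 40;

    // Documents left over after the initial reply to the find.
    const remainingDocs = Math.max(0, numDocs - firstBatchDocs);
    // How many whole documents fit in one getMore batch.
    const docsPerGetMore = Math.floor(maxBatchBytes / docBytes);
    const getMoreBatches = Math.ceil(remainingDocs / docsPerGetMore);
    return {getMoreBatches, totalBatches: 1 + getMoreBatches};
}

const estimate = estimateBatches(2000, 16 * 1024);
console.log(JSON.stringify(estimate));  // {"getMoreBatches":2,"totalBatches":3}
```

Under these assumptions the script produces three batches in total, which is enough to go past the first two batches.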

      Some clients – in particular, the legacy shell, the C driver, and the Python driver – allow the exhaust option to be set on a find operation. This causes the server to write the response to the initial find operation, as well as all subsequent getMore batches, to the socket without waiting for the client to send explicit getMore requests. Exhaust has two implementations in the server, one using the legacy OP_QUERY protocol and another using OP_MSG.

      The server implementation of exhaust cursors using OP_QUERY does not work correctly if the response to the query requires more than two batches. The logic is broken in such a way that after sending the first two batches, the server mistakenly categorizes the cursor as non-exhaust. As a consequence, the server waits for the client to send an explicit getMore request. The client, however, correctly believes that the cursor is an exhaust cursor, and waits for the server to write the next batch to the connection. Both sides are waiting for one another, resulting in a hang. It appears that the server tests did not catch this hang because all of our tests for exhaust queries can fit the response within one or two batches.
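The hang described above can be illustrated with a toy model. This is not the server's code: the class `ToyExhaustServer`, the `buggy` switch, and the batch bookkeeping are all hypothetical, and the model captures only the behavior described in this ticket (the cursor stops being treated as exhaust after the second batch).

```javascript
// Toy model of an OP_QUERY exhaust exchange. NOT the server's code: every
// name here is hypothetical and only mirrors the behavior described above.
class ToyExhaustServer {
    constructor(totalBatches, buggy) {
        this.totalBatches = totalBatches;
        this.buggy = buggy;
        this.sent = 0;
        this.exhaust = true;
    }

    // Returns true if the server writes another batch without being asked.
    pushNextBatch() {
        if (this.sent >= this.totalBatches || !this.exhaust) {
            return false;
        }
        this.sent++;
        // Modeled bug: after the second batch the cursor is treated as non-exhaust.
        if (this.buggy && this.sent === 2) {
            this.exhaust = false;
        }
        return true;
    }
}

// The client never sends getMore on an exhaust cursor; it only reads.
function runExhaustQuery(totalBatches, buggy) {
    const server = new ToyExhaustServer(totalBatches, buggy);
    let received = 0;
    while (server.pushNextBatch()) {
        received++;
    }
    // Batches remain but the server stopped pushing: both sides are waiting.
    return {received, hung: received < totalBatches};
}

console.log(JSON.stringify(runExhaustQuery(2, true)));   // {"received":2,"hung":false}
console.log(JSON.stringify(runExhaustQuery(3, true)));   // {"received":2,"hung":true}
console.log(JSON.stringify(runExhaustQuery(3, false)));  // {"received":3,"hung":false}
```

In this model a two-batch response completes even with the bug enabled, which matches the observation that tests fitting within one or two batches did not catch the hang.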

      The specific flaw in the code is as follows:

      Note that the OP_QUERY protocol was deprecated in 5.0 and removed entirely from the server in version 5.1. This includes removing support for the OP_QUERY exhaust path. Clients which support exhaust have changed their implementation to use OP_MSG exhaust when communicating with servers >=5.1; this was changed for the legacy mongo shell in SERVER-57462 and changed for the Python driver in PYTHON-1636. For this reason, this bug does not affect server version 5.1. Furthermore, it appears to be a regression introduced in version 4.2, so the only affected versions are 4.2, 4.4, and 5.0.

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            david.storch@mongodb.com David Storch
            Votes:
            0
            Watchers:
            10
