  1. Core Server
  2. SERVER-14766

Indexed queries should not miss documents where neither the queried nor indexed fields change during the life of the query.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Backlog
    • Component/s: Querying
    • Labels:
    • Operating System: ALL
    • Steps To Reproduce:

      Example/Repro

      function repro() {
       
          var k2="";
          for(i=0;i<2048;i++){
              k2 = k2+"_"
          }
       
          db.c.drop()
          // insert a "large" document
          db.c.insert({_id:-1, letter:'a',pad:k2})
          // insert 5 records
          for (var i=0; i<5; i++)
              db.c.insert({_id:i, letter:'a'})
          db.c.ensureIndex({letter:1})
       
          // remove the large document to free up space at the beginning
          db.c.remove({_id:-1})
       
          // start query, fetch first batch of 2
          var cursor = db.c.find().sort({letter:1}).batchSize(2)
          print('got', cursor.next()._id)
       
          // server cursor is now pointing to {_id:2} waiting for our getmore
          // Increase the size of {_id:3} so it moves back (to where {_id:-1} was)
          // any document with _id >2 will work in this repro
          db.c.update({_id:3}, {$set: {pad:k2}})
       
          // use our cursor to get the rest; note that {_id:3} is omitted
          while (cursor.hasNext())
              print('got', cursor.next()._id)
      }
      


      Description

      This behavior is only observable with the MMAPV1 storage engine.

      Desired Behavior

      If an indexed query runs while matching documents are updated in a way that moves them on disk, those documents can be missing from the results when using the MMAPV1 storage engine. We would like this behavior to change so that all matching documents that exist throughout the lifetime of the query are returned, even if they are updated. In particular, we only expect this guarantee when the updates do not change the values of the queried or indexed fields, so that the query matches the document in every updated state.
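
One way to check the desired behavior from a client is to compare the _id values a cursor returned with the _id values that were expected to match for the whole life of the query. The helper below is a minimal sketch in plain JavaScript; the name missingIds is hypothetical and is not part of the ticket or of any MongoDB API.

```javascript
// Hypothetical helper (not from the ticket): report which expected _id
// values never showed up in the results returned by a cursor.
function missingIds(expectedIds, returnedIds) {
  var seen = {};
  returnedIds.forEach(function (id) {
    seen[id] = true;
  });
  return expectedIds.filter(function (id) {
    return !seen[id];
  });
}

// Five small documents were inserted, but the cursor skipped {_id: 3}.
var missing = missingIds([0, 1, 2, 3, 4], [0, 1, 2, 4]);
console.log('missing ids:', missing.join(','));
```

An empty result means every expected document was returned at least once; it does not detect duplicates.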

      Example

      See the script under Steps To Reproduce.

      • Add a large document, followed by 5 small documents all with {letter: "a"}
      • Add an index on {letter: 1}
      • Remove the large document
      • Start a query, batch size 2, using index
      • Update the document with _id: 3 so that it grows and moves into the empty space left by the removed large document.

      Technical Details

      In MMAPV1, index entries store each document's location. While a query walks an index and its cursor moves forward, an update can move a document to a location behind the cursor's current position, so the document is "missed".

      This behavior cannot be reproduced in WiredTiger, inMemory or encrypted storage engines.
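
The effect can be illustrated with a toy model in plain JavaScript. This is only a sketch under stated assumptions, not MMAPV1 code: the index is modeled as an array ordered by (key, loc), where loc is a made-up stand-in for a record location, and the cursor resumes strictly after the last (key, loc) it returned. The names sortIndex, isAfter and scanWithMove are hypothetical.

```javascript
// Toy model only; none of these names exist in MongoDB.
// An index entry is {key, loc, id}; entries are ordered by (key, loc).
function sortIndex(index) {
  index.sort(function (a, b) {
    if (a.key !== b.key) return a.key < b.key ? -1 : 1;
    return a.loc - b.loc;
  });
}

// True if the entry sorts strictly after the cursor's saved position.
function isAfter(entry, pos) {
  if (pos === null) return true;
  if (entry.key !== pos.key) return entry.key > pos.key;
  return entry.loc > pos.loc;
}

// Read one batch, move one document to a new location, then drain the cursor.
function scanWithMove(batchSize, movedId, newLoc) {
  // loc 0 is the hole left by the removed large document.
  var index = [];
  for (var id = 0; id < 5; id++) {
    index.push({ key: 'a', loc: id + 1, id: id });
  }
  sortIndex(index);

  var seen = [];
  var pos = null; // last (key, loc) returned by the cursor

  function nextBatch() {
    var batch = index
      .filter(function (e) { return isAfter(e, pos); })
      .slice(0, batchSize);
    batch.forEach(function (e) {
      seen.push(e.id);
      pos = { key: e.key, loc: e.loc };
    });
    return batch.length;
  }

  nextBatch(); // first batch, like batchSize(2) in the repro

  // A growing update relocates the document; its index entry follows it.
  var moved = index.filter(function (e) { return e.id === movedId; })[0];
  moved.loc = newLoc;
  sortIndex(index);

  while (nextBatch() > 0) {} // the "getmore" calls
  return seen;
}

console.log('moved into the hole:', scanWithMove(2, 3, 0).join(','));
console.log('not moved:', scanWithMove(2, 3, 4).join(','));
```

In this model the indexed key never changes, yet the document with id 3 is skipped in the first call because its entry now sorts before the cursor's saved position.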

          Activity

          danx0r dan miller added a comment - - edited

          At a minimum the docs should be more clear about stale cursors - that documents may be skipped as well as duped. There is a cryptic reference at https://docs.mongodb.com/manual/reference/method/cursor.snapshot/#cursor.snapshot "The snapshot() does not guarantee isolation from insertion or deletions." but does that cut it? Can't a document be moved to behind the cursor due to minor updates, as this ticket suggests?

          Without proper documentation of expected behavior, we will keep getting threads like this: https://news.ycombinator.com/item?id=11857674

          jblackburn James Blackburn added a comment -

          Does this affect WiredTiger too?

          pasette Dan Pasette added a comment -

          James Blackburn, the script in the description does not impact WiredTiger. This is because WiredTiger is not impacted by document moves caused by changing the size of a document. However, if an indexed field is updated during the course of the operation, it can indeed change the results of the query or update.

          In fact, the original "repro" in this ticket did indeed remove the letter field which the query depends on. I updated the repro in the description to use $set rather than a save:

          db.c.update({_id:3}, {$set: {pad:k2}})
          

          jblackburn James Blackburn added a comment -

          Great, thanks Dan!

          asya Asya Kamsky added a comment -

          James Blackburn I updated the description of the ticket to try to make it more clear when this scenario can happen.


            People

            • Votes: 7
            • Watchers: 30