[SERVER-21057] Collection scan during concurrent move-update can return invalid results, trip fatal assertion (mmapv1 only)
Created: 21/Oct/15  Updated: 01/Jul/19  Resolved: 23/Oct/15

Status: Closed
Project: Core Server
Component/s: Querying, Storage, Write Ops
Affects Version/s: None
Fix Version/s: 3.2.0-rc1

Type: Bug
Priority: Critical - P2
Reporter: J Rassi
Assignee: Geert Bosch
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-21037 Initial sync can miss documents if co... Closed
related to SERVER-42022 Attempt to remove initial sync missin... Closed
related to SERVER-21058 need fail point to stress yielding be... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: QuInt B (11/02/15)

 Description   

If a move-update (an update that grows a document beyond its allocated record, forcing MMAPv1 to relocate it) fails and is rolled back, for example because it hits a duplicate key error, any open cursors that were positioned on the affected document may already have been advanced during invalidation to a record that is no longer valid after the rollback. The query operations that own these cursors can subsequently return invalid results or trip a fatal assertion in the storage layer.
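The failure mode can be pictured with a small, hypothetical JavaScript model (not MongoDB server code; it runs in Node.js). Record ids stand in for MMAPv1 record locations, a cursor is just a stored position, and `invalidateForDelete` and `failedMoveUpdate` are illustrative names. The sketch assumes the grown copy lands at a location the scan reaches right after the old one and that the cursor is invalidated once that copy exists; the real ordering inside the storage layer may differ.

```javascript
// Hypothetical toy model: record ids stand in for MMAPv1 record locations,
// and scan order is ascending record id.

// Before a record is deleted, a cursor positioned on it is moved to the next
// record in scan order (or null at the end).
function invalidateForDelete(store, cursor, id) {
  if (cursor.pos !== id) return;
  const later = [...store.keys()].filter((k) => k > id).sort((a, b) => a - b);
  cursor.pos = later.length > 0 ? later[0] : null;
}

// A move-update that fails after the move and is rolled back. The undo log
// restores the record store, but nothing restores the cursor position.
function failedMoveUpdate(store, cursor, oldId, newId, newDoc) {
  const undo = [];
  store.set(newId, newDoc);                  // write the grown copy elsewhere
  undo.push(() => store.delete(newId));
  invalidateForDelete(store, cursor, oldId); // assumed: cursor advances to newId
  const oldDoc = store.get(oldId);
  store.delete(oldId);
  undo.push(() => store.set(oldId, oldDoc));
  // Duplicate key on the unique index: run the undo steps, newest first.
  for (const fn of undo.reverse()) fn();
}

const store = new Map([[10, {_id: 0, a: 0}], [20, {_id: 1, a: 1}]]);
const cursor = {pos: 10}; // a yielded collection scan parked on _id 0
failedMoveUpdate(store, cursor, 10, 15, {_id: 0, a: 1, x: "xxx"});
const dangling = !store.has(cursor.pos);
```

After the call, the store again holds only records 10 and 20, while `cursor.pos` is 15 and `dangling` is `true`: the scan would resume from a record that the rollback removed.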

This is a regression introduced in the 3.0.x series (reproduced with 3.0.7 and master) that affects only mmapv1 deployments. See the discussion in the related ticket SERVER-21037 for an explanation of why 2.6.x and earlier versions are unaffected.

Reproduce with the following script:

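// Make the query yield every 3 work iterations so the scan below yields frequently.
assert.commandWorked(db.adminCommand({setParameter: 1, internalQueryExecYieldIterations: 3}));
assert.commandWorked(db.dropDatabase());
// Unique index on 'a': the concurrent updates below violate it.
assert.commandWorked(db.foo.ensureIndex({a: 1}, {unique: true}));
assert.writeOK(db.foo.insert({_id: 0, a: 0}));
assert.writeOK(db.foo.insert({_id: 1, a: 1}));
assert.writeOK(db.foo.insert({_id: 2, a: 2}));
// Insert and then remove two large documents, leaving freed space behind.
assert.writeOK(db.foo.insert({_id: 3, a: 3, x: new Array(1024).join("x")}));
assert.writeOK(db.foo.insert({_id: 4, a: 4, x: new Array(1024).join("x")}));
assert.writeOK(db.foo.remove({_id: 3}));
assert.writeOK(db.foo.remove({_id: 4}));
// Each update grows document i so that it must move, and sets 'a' to a value another
// document already holds, so it fails with a duplicate key error and is rolled back.
startParallelShell(
    'while (true) { \
        for (var i=0; i<3; i++) { \
            db.foo.update({_id: i}, {$set: {x: new Array(1024).join("x"), a: (i + 1) % 3}}); \
        } \
        sleep(1000); \
    }');
// Collection scan that yields while the failing updates run.
db.foo.find().itcount();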

When run against master (07168e08) with the patch below applied, the server trips fatal assertion 17441 on the last line of the script. When run against v3.0 with the same patch applied, the last line of the script returns the following error to the user: "BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: 4.0"

Patch to greatly increase reproducibility (it makes every query yield on test.foo sleep for 2 seconds):

diff --git a/src/mongo/db/query/query_yield.cpp b/src/mongo/db/query/query_yield.cpp
index 4e0d463..7edde6e 100644
--- a/src/mongo/db/query/query_yield.cpp
+++ b/src/mongo/db/query/query_yield.cpp
@@ -62,6 +62,10 @@ void QueryYield::yieldAllLocks(OperationContext* txn, RecordFetcher* fetcher) {
     // locks). If we are yielding, we are at a safe place to do so.
     txn->recoveryUnit()->abandonSnapshot();
 
+    if (txn->getNS() == "test.foo") {
+        sleepmillis(2000);
+    }
+
     // Track the number of yields in CurOp.
     CurOp::get(txn)->yielded();



 Comments   
Comment by Githook User [ 23/Oct/15 ]

Author: Geert Bosch (GeertBosch) <geert@mongodb.com>

Message: SERVER-21057: Undo MMAPv1 invalidations on rollback
Branch: master
https://github.com/mongodb/mongo/commit/8f062d2799eb310bb062675bbcd8e82da1b691a4
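The commit message names the fix as undoing MMAPv1 invalidations on rollback. One way to picture that idea is a hypothetical JavaScript toy model (not the code from the commit; it runs in Node.js) in which the invalidation pushes its own undo step, so rolling back the failed move-update also puts the cursor back where it was. Record ids stand in for record locations, and `invalidateForDelete` and `failedMoveUpdate` are illustrative names.

```javascript
// Hypothetical toy model: record ids stand in for MMAPv1 record locations,
// and scan order is ascending record id.

// Advance a cursor off a record that is about to be deleted, and register an
// undo step that restores the cursor's previous position on rollback.
function invalidateForDelete(store, cursor, id, undo) {
  if (cursor.pos !== id) return;
  const before = cursor.pos;
  const later = [...store.keys()].filter((k) => k > id).sort((a, b) => a - b);
  cursor.pos = later.length > 0 ? later[0] : null;
  undo.push(() => { cursor.pos = before; }); // undo the invalidation
}

// A move-update that fails after the move and is rolled back.
function failedMoveUpdate(store, cursor, oldId, newId, newDoc) {
  const undo = [];
  store.set(newId, newDoc);                        // write the grown copy
  undo.push(() => store.delete(newId));
  invalidateForDelete(store, cursor, oldId, undo); // cursor moves off oldId
  const oldDoc = store.get(oldId);
  store.delete(oldId);
  undo.push(() => store.set(oldId, oldDoc));
  // Duplicate key on the unique index: run every undo step, newest first.
  for (const fn of undo.reverse()) fn();
}

const store = new Map([[10, {_id: 0, a: 0}], [20, {_id: 1, a: 1}]]);
const cursor = {pos: 10}; // a yielded collection scan parked on _id 0
failedMoveUpdate(store, cursor, 10, 15, {_id: 0, a: 1, x: "xxx"});
```

After the call, `cursor.pos` is 10 again and `store.has(cursor.pos)` is `true`, so the scan resumes from a record that survived the rollback.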

Generated at Thu Feb 08 03:56:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.