Type: Bug
Resolution: Done
Priority: Critical - P2
Fix Version/s: 3.2.0-rc1
Affects Version/s: None
Component/s: Querying, Storage, Write Ops
Labels: None

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: QuInt B (11/02/15)
Confidence Status: None
Work Order: 0
CAR Domain/s: None

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

If a move-update results in an error and is rolled back (for example, if a duplicate key exception is encountered), any open cursors that were pointing to the affected document could be advanced during invalidation to a record that is no longer valid. The query operations that own these cursors can subsequently return invalid results, or trip a fatal assertion in the storage layer.

This is a regression introduced in the 3.0.x series (reproduced with 3.0.7 and master) that affects mmapv1 deployments only. See the discussion in related ticket SERVER-21037 for an explanation of why 2.6.x and earlier versions are unaffected.
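The failure mode described in this ticket can be illustrated with a toy in-memory model. This is only a sketch and not mmapv1 code: `ToyRecordStore`, `moveUpdateThenRollback`, and the exact ordering of the steps are assumptions made for illustration. It shows how a cursor that is advanced during invalidation can end up on a record that the rollback later removes.

```javascript
// Toy model only; every name here is hypothetical and none of it is server code.
class ToyRecordStore {
  constructor() {
    this.slots = []; // a slot index stands in for a record location
  }
  insert(doc) {
    this.slots.push({ doc, live: true });
    return this.slots.length - 1;
  }
  remove(loc) {
    this.slots[loc].live = false;
  }
  restore(loc) {
    this.slots[loc].live = true;
  }
  isLive(loc) {
    return loc !== null && loc < this.slots.length && this.slots[loc].live;
  }
  nextLive(loc) {
    for (let i = loc + 1; i < this.slots.length; i++) {
      if (this.slots[i].live) return i;
    }
    return null;
  }
}

// Assumed sequence: copy to a new location, invalidate cursors on the old
// location, delete the old copy, then fail and roll everything back.
function moveUpdateThenRollback(store, cursor, loc, newDoc) {
  const newLoc = store.insert(newDoc);          // 1. grown document is written elsewhere
  if (cursor.position === loc) {
    cursor.position = store.nextLive(loc);      // 2. cursor is advanced off the old record
  }
  store.remove(loc);                            // 3. old copy is deleted
  // 4. A duplicate key error rolls the write back: the new copy disappears
  //    and the old copy returns, but the cursor is not moved back.
  store.remove(newLoc);
  store.restore(loc);
}

const store = new ToyRecordStore();
const loc = store.insert({ _id: 0, a: 0 });
const cursor = { position: loc };
moveUpdateThenRollback(store, cursor, loc, { _id: 0, a: 1, x: "padding" });

const cursorIsValid = store.isLive(cursor.position);
console.log(cursor.position, cursorIsValid);
```

In this model the cursor ends on slot 1, which held the rolled-back copy, while the original document in slot 0 is live again. A scan that resumes from the cursor would read a record that is no longer valid.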

Reproduce with the following script:

// Make query execution yield often, so the scan below yields while updates run.
assert.commandWorked(db.adminCommand({setParameter: 1, internalQueryExecYieldIterations: 3}));
assert.commandWorked(db.dropDatabase());
// Unique index on 'a', so that an update can fail with a duplicate key error.
assert.commandWorked(db.foo.ensureIndex({a: 1}, {unique: true}));
assert.writeOK(db.foo.insert({_id: 0, a: 0}));
assert.writeOK(db.foo.insert({_id: 1, a: 1}));
assert.writeOK(db.foo.insert({_id: 2, a: 2}));
// Insert and then remove two large documents.
assert.writeOK(db.foo.insert({_id: 3, a: 3, x: new Array(1024).join("x")}));
assert.writeOK(db.foo.insert({_id: 4, a: 4, x: new Array(1024).join("x")}));
assert.writeOK(db.foo.remove({_id: 3}));
assert.writeOK(db.foo.remove({_id: 4}));
// In the background, repeatedly grow each document and set 'a' to a value
// already held by another document.
startParallelShell(
    'while (true) { \
        for (var i=0; i<3; i++) { \
            db.foo.update({_id: i}, {$set: {x: new Array(1024).join("x"), a: (i + 1) % 3}}); \
        } \
        sleep(1000); \
    }');
// Scan the collection while the background updates are running.
db.foo.find().itcount();

When run against master (07168e08) with the patch below applied, the server trips fatal assertion 17441 on the last line of the script. When run against v3.0 with the same patch applied, the last line of the script returns the following error to the user: "BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: 4.0".
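The numbers in the v3.0 error message can be checked with a little arithmetic. The sketch below reinterprets the bit pattern 0xEEEEEEEE as a signed 32-bit integer, which is how a BSON document length prefix is read, and compares it with the upper bound quoted in the message. The helper `isPlausibleBsonSize` is a hypothetical name used only for this illustration.

```javascript
// 0xEEEEEEEE as an unsigned 32-bit value.
const raw = 0xEEEEEEEE;

// Bitwise OR with 0 coerces the value to a signed 32-bit integer.
const asInt32 = raw | 0;

// Upper bound quoted in the error message: 16 MiB plus 16 KiB.
const maxSize = 16 * 1024 * 1024 + 16 * 1024;

// Hypothetical helper mirroring the range stated in the error message.
function isPlausibleBsonSize(size) {
  return size >= 0 && size <= maxSize;
}

console.log(asInt32, maxSize, isPlausibleBsonSize(asInt32));
```

This prints `-286331154 16793600 false`, matching the size and limit in the message. A length made of one repeated byte is consistent with the query having read a record that no longer holds a valid document.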

Patch to greatly increase reproducibility:

diff --git a/src/mongo/db/query/query_yield.cpp b/src/mongo/db/query/query_yield.cpp
index 4e0d463..7edde6e 100644
--- a/src/mongo/db/query/query_yield.cpp
+++ b/src/mongo/db/query/query_yield.cpp
@@ -62,6 +62,10 @@ void QueryYield::yieldAllLocks(OperationContext* txn, RecordFetcher* fetcher) {
     // locks). If we are yielding, we are at a safe place to do so.
     txn->recoveryUnit()->abandonSnapshot();

+    if (txn->getNS() == "test.foo") {
+        sleepmillis(2000);
+    }
+
     // Track the number of yields in CurOp.
     CurOp::get(txn)->yielded();

related to:

SERVER-21037 Initial sync can miss documents if concurrent update results in error (mmapv1 only) [Closed]
SERVER-42022 Attempt to remove initial sync missing document fetching [Closed]
SERVER-21058 need fail point to stress yielding behavior [Closed]

Assignee: Geert Bosch
Reporter: J Rassi (Inactive)
Participants: Geert Bosch, Githook User, J Rassi
Votes: 0
Watchers: 6

Created: Oct 21 2015 08:04:01 PM UTC
Updated: Jul 01 2019 01:41:53 PM UTC
Resolved: Oct 23 2015 10:26:32 PM UTC
