[SERVER-34774] Converting an index to multikey is not visible within the same txn, causing incorrect data to be returned Created: 01/May/18  Updated: 29/Oct/23  Resolved: 16/May/18

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 3.7.9
Fix Version/s: 4.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Robert Guo (Inactive) Assignee: Daniel Gottlieb (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-41766 Secondary may encounter prepare confl... Closed
is related to SERVER-33738 Create a runCommand() override method... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

/*
 * Set up cluster.
 */
var r = new ReplSetTest({nodes:2});
r.startSet();r.initiate();
 
var dbName = 'test';
var collName = 'test';
 
const sessionOptions = {causalConsistency: false};
const session = r.getPrimary().startSession(sessionOptions);
const sessionDb = session.getDatabase(dbName);
const sessionColl = sessionDb.getCollection(collName);
 
 
/*
 * Repro
 */
assert.commandWorked(sessionDb.runCommand({create: collName, writeConcern: {w: "majority"}}));
 
assert.writeOK(sessionColl.insert({a: 2}));
assert.commandWorked(sessionColl.createIndex({a: 1}));
r.awaitLastOpCommitted();
 
session.startTransaction();
assert.writeOK(sessionColl.update({}, {$set: {a: [2]}}));
assert.eq({a: [2]}, sessionColl.findOne({a: 2}, {_id: 0, a: 1}));
// session.commitTransaction();

Sprint: Storage NYC 2018-05-21
Participants:
Linked BF Score: 36

 Description   

The update() call in the repro script is not visible, causing the assert.eq on the next line to fail:

assert: [{ "a" : [ 2 ] }] != [{ "a" : 2 }] are not equal
doassert@src/mongo/shell/assert.js:18:14
assert.eq@src/mongo/shell/assert.js:174:1
@test.js:21:1



 Comments   
Comment by Githook User [ 16/May/18 ]

Author:

{'email': 'daniel.gottlieb@mongodb.com', 'username': 'dgottlieb', 'name': 'Daniel Gottlieb'}

Message: SERVER-34774: Maintain multikey state during multi-statement transaction.
Branch: master
https://github.com/mongodb/mongo/commit/def040cc178a7726559e7796557c98384119b16e

Comment by Robert Guo (Inactive) [ 02/May/18 ]

There's a related symptom for geo indexes that Max pointed out could have the same underlying cause. Here is the repro for future reference.

/*
 * Repro
 */
var lineWithDupes = {
    _id: "line",
    geo: {type: "LineString", coordinates: [[40, 5], [41, 6], [42, 7]]}
};
 
assert.commandWorked(sessionDb.runCommand({create: collName, writeConcern: {w: "majority"}}));
 
sessionColl.ensureIndex({geo: "2dsphere"});
r.awaitLastOpCommitted();
 
session.startTransaction();
assert.writeOK(sessionColl.insert(lineWithDupes));
 
// "number of documents: 1"
print('Number of Documents: ' + sessionColl.find({}).hint({_id:1}).itcount());
var res = sessionColl.find({geo: {$geoIntersects: {$geometry: lineWithDUpes.geo}}}).toArray();
 
// this neq assertion fails
assert.neq(res.length, 20);
 
session.commitTransaction();

Comment by Daniel Gottlieb (Inactive) [ 01/May/18 ]

What is the proposed solution? It seems like modifying the index itself would conflict with concurrent reads outside of the transaction. Should we stash multikey index changes as part of the transaction metadata?

 
I think that's a straightforward, targeted solution. I would say this problem falls under the "versioning the catalog" umbrella. We're effectively making a metadata change as part of a transaction (which for the most part we've simply disallowed for 4.0).

To be clear, we haven't modified the on-disk index until the transaction commits. Outside readers are fine, they cannot view any multikey data.

The problem with prematurely setting the in-memory multikey variable outside of the onCommit is that this prevents a future writer from trying to set the index as multikey, ultimately resulting in a subtle race that can result in data corruption (the multikey write must occur at a time <= the first write that makes an index multikey). It's possible for the future writer to actually get its write into the oplog earlier than the write that contains the multikey change.

Comment by Daniel Gottlieb (Inactive) [ 01/May/18 ]

One clarification, the query is not returning stale data, but rather, when an index is multikey, the covered index optimization cannot be used. A better demonstration of what's happening is to instead perform an update that sets the value to of a to [1,2,3]. That gives us the following output when performing an index scan:

    let cursor = sessionColl.find({}, {_id: 0, a: 1}).sort({a: 1});
    while (cursor.hasNext()) {
        printjson(cursor.next());
    }

Result:

{ "a" : 1 }
{ "a" : 2 }
{ "a" : 3 }

Another representation is to also include the record id's in the result (which I think removes the covered index optimization):

    let cursor = sessionColl.find({}, {_id: 0, a: 1}).sort({a: 1}).showRecordId();
    while (cursor.hasNext()) {
        printjson(cursor.next());
    }

Result:

{ "a" : [ 1, 2, 3 ], "$recordId" : NumberLong(1) }
{ "a" : [ 1, 2, 3 ], "$recordId" : NumberLong(1) }
{ "a" : [ 1, 2, 3 ], "$recordId" : NumberLong(1) }

Comment by Kyle Suarez [ 01/May/18 ]

What is the proposed solution? It seems like modifying the index itself would conflict with concurrent reads outside of the transaction. Should we stash multikey index changes as part of the transaction metadata?

Comment by Eric Milkie [ 01/May/18 ]

The multikey flag is set in a commit handler; this is conflicting with the new 4.0 ability to do a write and then read your write from the same transaction. The query system needs to know when reading from an index whether the keys are possibly multikey or not, in order to correctly return covered data from such an index.

Comment by Eric Milkie [ 01/May/18 ]

Dan and I think we know what's going on here.

Comment by Kyle Suarez [ 01/May/18 ]

Let's see what the Query Team has to say at the Needs Triage meeting. I feel like the Repl Team has a lot on its plate right now; if the Query Team has resources to look into the failure, then let's go for it.

Comment by Daniel Gottlieb (Inactive) [ 01/May/18 ]

milkie and I just took a look at this. When thinking about transactions and what how RTT does catalog writes for multikey, I was expecting a bug, but a different manifestation than what this repro is showing.

I do think this is, in essence, a transactions problem. If replication is better equipped to diagnose that, perhaps your suggestion is better. FWIW, I'm running a modified script to see if my hypothetical bug manifests as expected.

Comment by Kyle Suarez [ 01/May/18 ]

daniel.gottlieb, judah.schvimer: should we triage this as the Repl Team first before handing it off to Query?

Generated at Thu Feb 08 04:37:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.