[SERVER-38860] Positional array update behavior of applyOps on invalid field varies on different versions Created: 05/Jan/19  Updated: 27/Oct/23  Resolved: 18/Mar/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Siyuan Zhou
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-38747 Continually changing behavior of appl... Closed
Related
related to SERVER-38747 Continually changing behavior of appl... Closed
related to SERVER-43043 Add idempotency test for two fields u... Closed
Sprint: Repl 2019-03-25

 Description   

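The behavior of applyOps when a positional array update (such as a $set on "a.0") targets a field whose current value is not an array (here, null) is inconsistent across versions 3.2 through 4.0.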

Here are the test script and results from shane.harvey and john.morales.

$ cat applyOps.js
// Run applyOps as mongomirror would.
// Use like: applyOps([{op: "i", ns:"test.test", o:{_id:1}}, {op: "i", ns:"test.test", o:{_id:2}}])
function applyOps(ops, extra) {
    // Append a nested applyOps command containing only a no-op; a command entry forces non-atomic applyOps mode for this batch.
    ops.push({op: "c", ns: "admin.$cmd", o: {applyOps: [{op: "n", ns: "", o: {"msg": "noop"}}]}});
    var command = {
        applyOps: ops,
        writeConcern: {w: "majority"}
    };
    var res = db.adminCommand(command);
    // Avoid noise in output.
    delete res["operationTime"];
    delete res["$clusterTime"];
    return res;
}
 
db.test.drop();
db.test.insert({_id:1, a: null});
 
// Test behavior in HELP-8472.
print('doc before: ' + tojson(db.test.findOne({_id: 1})));
var res = applyOps([{ns:"test.test", op:"u", o2:{_id:1}, o:{$set:{'a.0':2}}}]);
print('applyOps result: ' + tojson(res));
 
print('doc after : ' + tojson(db.test.findOne({_id: 1})));

The behavior is different across releases.

On 3.2 the server reports success and applies the update:
$ mongo applyOps.js
MongoDB shell version v4.0.1
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.2.18
WARNING: shell and server versions do not match
doc before: { "_id" : 1, "a" : null }
applyOps result: { "applied" : 2, "results" : [ true, true ], "ok" : 1 }
doc after : { "_id" : 1, "a" : { "0" : 2 } }
 
On 3.4 the server reports an error and does not apply the update:
$ mongo applyOps.js
MongoDB shell version v4.0.1
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.18
WARNING: shell and server versions do not match
doc before: { "_id" : 1, "a" : null }
applyOps result: {
    "applied" : 1,
    "code" : 16837,
    "codeName" : "Location16837",
    "errmsg" : "cannot use the part (a of a.0) to traverse the element ({a: null})",
    "results" : [
        false
    ],
    "ok" : 0
}
doc after : { "_id" : 1, "a" : null }
 
On 3.6, the server reverts to the 3.2 behavior, reporting success and applying the update:
$ mongo applyOps.js
MongoDB shell version v4.0.1
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.9
WARNING: shell and server versions do not match
doc before: { "_id" : 1, "a" : null }
applyOps result: { "applied" : 2, "results" : [ true, true ], "ok" : 1 }
doc after : { "_id" : 1, "a" : { "0" : 2 } }
 
Finally on 4.0, the server reports success and *does not apply the update*:
$ mongo applyOps.js
MongoDB shell version v4.0.1
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.4
doc before: { "_id" : 1, "a" : null }
applyOps result: { "applied" : 2, "results" : [ true, true ], "ok" : 1 }
doc after : { "_id" : 1, "a" : null }

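Because "a" is null rather than an array, the path component "0" is ambiguous. In the releases where the update is applied (3.2 and 3.6), "0" is interpreted as a field name, producing the embedded document { "0" : 2 }.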

However, there is no idempotency concern. If an error is returned, initial sync refetches the document, just as it does when the document is missing; if the update succeeds, the other fields are still updated correctly. Whatever intermediate state the field is left in will eventually be overwritten, because a later version of the document does not contain that state. For example, if the field eventually becomes an array, there must be a $set that sets the whole field to a valid array after the problematic update oplog entry.

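The following run on 3.6 uses a modified applyOps.js whose update, judging from the output, sets both "a.5" and a second field "b". It shows that the other fields of a successful update are still applied:
$ mongo --port 27018 applyOps.js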
MongoDB shell version v4.0.3
connecting to: mongodb://127.0.0.1:27018/
Implicit session: session { "id" : UUID("1f2b00ba-092d-4a1e-a633-8da75b61e436") }
MongoDB server version: 3.6.6
WARNING: shell and server versions do not match
doc before: { "_id" : 1, "a" : null }
applyOps result: { "applied" : 2, "results" : [ true, true ], "ok" : 1 }
doc after : { "_id" : 1, "a" : { "5" : 2 }, "b" : "b" }
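The convergence argument above can be illustrated with a small JavaScript model. This is a sketch under stated assumptions, not server code: replay, problematicEntry and laterWholeFieldSet are hypothetical names, each version's handling of { $set: { "a.0": 2 } } is hard-coded from the outputs above, and the initial-sync refetch after a failure is not modeled (a failed entry simply leaves the document unchanged).

```javascript
// Hypothetical sketch (not MongoDB source): replay oplog-style updates on a
// document. Each update runs on a scratch copy and is committed only if it
// does not throw, so a failed update leaves the document unchanged.
// The initial-sync refetch that follows a real failure is NOT modeled.
function replay(doc, updates) {
  let current = JSON.parse(JSON.stringify(doc));
  for (const update of updates) {
    const scratch = JSON.parse(JSON.stringify(current));
    try {
      update(scratch);
      current = scratch; // commit the successful update
    } catch (e) {
      // Failed update: discard the scratch copy.
    }
  }
  return current;
}

// How { $set: { "a.0": 2 } } on { a: null } was handled, per the outputs above.
const problematicEntry = {
  "3.2/3.6": (d) => { d.a = { "0": 2 }; },            // success, "a" becomes an object
  "3.4": () => { throw new Error("Location16837"); }, // error, nothing applied
  "4.0": () => {},                                    // success, nothing applied
};

// Hypothetical later entry that sets the whole field, as the argument requires
// whenever "a" eventually becomes an array.
const laterWholeFieldSet = (d) => { d.a = [2]; };

const start = { _id: 1, a: null };
const finalStates = Object.entries(problematicEntry).map(([version, update]) => [
  version,
  JSON.stringify(replay(start, [update, laterWholeFieldSet])),
]);
for (const [version, state] of finalStates) {
  console.log(`${version}: ${state}`); // each line ends with {"_id":1,"a":[2]}
}
```

Whichever way the problematic entry is handled, the later whole-field $set determines the final document, which is why the version differences do not break idempotency.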

Idempotency remains correct because of the following invariants:

  • A failed update reverts all of its changes to the document.
  • A successful update still applies its changes to the other fields, even if some fields end up in an unexpected state.
  • Any change to a field's structure or data type starts with an explicit $set of the whole field, except a change to an object. For example, setting a non-existent field "a" to the array ["hello"] cannot be logged as { $set: { "a.0": "hello" } }, because that update would create the embedded document { "0": "hello" } instead.
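The three invariants can be checked against a toy model of a multi-field $set. This is a hypothetical sketch, not the server's update code: setPath and applySet are illustrative names, overwriteNonViable = true approximates the 3.6 oplog-application outcome shown in the run above, overwriteNonViable = false approximates a regular update that rejects the path, and the error message only loosely imitates the 3.4 text.

```javascript
// Hypothetical toy model of a multi-field $set (not MongoDB source code).
// overwriteNonViable = true approximates the 3.6 oplog-application outcome:
// a null or scalar on the path is replaced by an embedded document.
// overwriteNonViable = false approximates a regular update, which rejects it.
function setPath(doc, path, value, overwriteNonViable) {
  const parts = path.split(".");
  let cur = doc;
  for (const part of parts.slice(0, -1)) {
    if (!(part in cur)) {
      // Missing components are created as embedded documents, so a numeric
      // component such as "0" becomes a field name, not an array index.
      cur[part] = {};
    } else if (cur[part] === null || typeof cur[part] !== "object") {
      if (!overwriteNonViable) {
        // Message only loosely imitates the 3.4 error text.
        throw new Error(`cannot use the part (${part} of ${path}) to traverse the element`);
      }
      cur[part] = {};
    }
    cur = cur[part];
  }
  cur[parts[parts.length - 1]] = value;
}

// All-or-nothing: every field is set on a copy, which is returned only if no
// field fails; otherwise the original document is returned untouched.
function applySet(doc, fields, overwriteNonViable) {
  const copy = JSON.parse(JSON.stringify(doc));
  try {
    for (const [path, value] of Object.entries(fields)) {
      setPath(copy, path, value, overwriteNonViable);
    }
    return copy;
  } catch (e) {
    return doc;
  }
}

const before = { _id: 1, a: null };
const update = { "a.5": 2, b: "b" };

// Invariant 1: the failed update changes nothing, not even "b".
const strict = applySet(before, update, false);
console.log(JSON.stringify(strict)); // {"_id":1,"a":null}

// Invariant 2: the successful update still sets "b".
const lenient = applySet(before, update, true);
console.log(JSON.stringify(lenient)); // {"_id":1,"a":{"5":2},"b":"b"}

// Invariant 3: { $set: { "a.0": "hello" } } on a missing "a" creates an object.
const fresh = applySet({ _id: 1 }, { "a.0": "hello" }, false);
console.log(JSON.stringify(fresh)); // {"_id":1,"a":{"0":"hello"}}
```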


 Comments   
Comment by Siyuan Zhou [ 18/Mar/19 ]

Given that there's no idempotency issue on the replication side, I'm closing this as Works as Designed.

Comment by David Storch [ 13/Mar/19 ]

siyuan.zhou tess.avitabile, I confirmed that in master the applyOps succeeds because the update system is configured with UpdateDriver::setFromOplogApplication():

https://github.com/mongodb/mongo/blob/84916e817418b3b5627e80730effcd422c15696e/src/mongo/db/ops/parsed_update.cpp#L146

This causes the $set implementation to suppress any error if the path to create would traverse through an existing scalar:

https://github.com/mongodb/mongo/blob/84916e817418b3b5627e80730effcd422c15696e/src/mongo/db/update/modifier_node.cpp#L224-L238

This seems like the correct behavior for oplog application idempotency. That is, it is correct for $set to fail with ErrorCodes::PathNotViable in this case, unless the $set is being done during oplog application, in which case the error should be suppressed to ensure idempotency.
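The behavior described in this comment can be sketched as follows. This is a hypothetical JavaScript model, not the code in modifier_node.cpp: applySetModifier and its fromOplogApplication option are illustrative names, PathNotViable is a stand-in class for the server's ErrorCodes::PathNotViable, and skipping the $set as a no-op is an assumption consistent with the 4.0 output above (success, document unchanged).

```javascript
// Hypothetical model of the described $set behavior (not modifier_node.cpp).
class PathNotViable extends Error {}

// Applies a single $set. Returns true if the document was modified, false if
// the $set was skipped as a no-op during oplog application.
function applySetModifier(doc, path, value, { fromOplogApplication = false } = {}) {
  const parts = path.split(".");
  let cur = doc;
  for (const part of parts.slice(0, -1)) {
    if (cur[part] === undefined) {
      cur[part] = {}; // create missing components as embedded documents
    } else if (cur[part] === null || typeof cur[part] !== "object") {
      // The path would have to traverse an existing scalar (here: null).
      if (fromOplogApplication) {
        return false; // error suppressed: nothing is applied
      }
      throw new PathNotViable(`cannot traverse non-object field '${part}' in path '${path}'`);
    }
    cur = cur[part];
  }
  cur[parts[parts.length - 1]] = value;
  return true;
}

// Regular update path: the $set fails and the document is untouched.
const regular = { _id: 1, a: null };
let regularError = null;
try {
  applySetModifier(regular, "a.0", 2);
} catch (e) {
  regularError = e;
}
console.log(regularError instanceof PathNotViable); // true

// Oplog application: the error is suppressed and "a" stays null.
const fromOplog = { _id: 1, a: null };
const applied = applySetModifier(fromOplog, "a.0", 2, { fromOplogApplication: true });
console.log(applied, JSON.stringify(fromOplog)); // false {"_id":1,"a":null}
```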

Hopefully this answers any questions that the repl team had for the query team! I did not investigate why exactly the behavior has changed between 3.2, 3.4, 3.6, and 4.0. However, it's clear that the behavior changes were due to how applyOps uses the update subsystem in oplog application, as opposed to behavior changes in the regular update implementation. I'm moving this into the repl team's triage queue. Don't hesitate to let me know if there's anything else I can help with!

Comment by Craig Homa [ 06/Feb/19 ]

Hey david.storch, does Siyuan's comment answer your questions?

Comment by Siyuan Zhou [ 09/Jan/19 ]

david.storch, right, this is a dup of SERVER-38747. I think the question is more about which behavior makes the most sense to the query team than about where the behavioral change was introduced, unless the latter helps us understand the former. Since there's no idempotency issue, feel free to close this ticket if the query team thinks they are all valid behaviors.

My understanding is that the behavioral inconsistency won't lead to data inconsistency in a mixed-version replica set either.

Comment by David Storch [ 09/Jan/19 ]

The following update results in an error on 3.4, 3.6, 4.0, and master:

db.c.drop()
db.c.insert({a: null})
db.c.update({}, {$set: {"a.0": 2}})

Therefore, it appears that the changing behavior of applyOps is not due to the update subsystem, but rather is specific to the applyOps code path.

Comment by David Storch [ 09/Jan/19 ]

tess.avitabile siyuan.zhou shane.harvey, how does this ticket differ from SERVER-38747?

Comment by Siyuan Zhou [ 05/Jan/19 ]

CC tess.avitabile, as the behavior might be caused by the array update project and it's relevant to replication.

Generated at Thu Feb 08 04:50:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.