[SERVER-30470] Update subsystem doesn't preserve field-ordering on re-application of oplog entries Created: 01/Aug/17  Updated: 27/Oct/23  Resolved: 21/Sep/20

Status: Closed
Project: Core Server
Component/s: Replication, Write Ops
Affects Version/s: 3.0.15, 3.2.17, 3.4.9, 3.5.13
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: May Hoque Assignee: Ian Boros
Resolution: Gone away Votes: 0
Labels: query-44-grooming, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-50300 Log $v:2 update oplog entries from th... Closed
Operating System: ALL
Steps To Reproduce:

Apply the following patch and run the sync_tail_test C++ unit test.

diff --git a/src/mongo/db/repl/sync_tail_test.cpp b/src/mongo/db/repl/sync_tail_test.cpp
index 2498a0cd9b..809314fec1 100644
--- a/src/mongo/db/repl/sync_tail_test.cpp
+++ b/src/mongo/db/repl/sync_tail_test.cpp
@@ -1425,6 +1425,22 @@ TEST_F(IdempotencyTest, ResyncOnRenameCollection) {
     ASSERT_EQUALS(runOp(op), ErrorCodes::OplogOperationUnsupported);
 }
 
+TEST_F(IdempotencyTest, FieldOrderingOnUpdateIsPreserved) {
+    ASSERT_OK(
+        ReplicationCoordinator::get(_opCtx.get())->setFollowerMode(MemberState::RS_RECOVERING));
+    ASSERT_OK(runOp(createCollection()));
+
+    auto insertOp = insert(fromjson("{ a: 0, _id: 1 }"));
+    ASSERT_OK(runOp(insertOp));
+
+    auto updateOp1 = update(1, fromjson("{ $unset: { a: null } }”));
+    auto updateOp2 = update(1, fromjson("{ $set: { 'a.c': 1, b: null } }"));
+    
+    auto ops = {updateOp1, updateOp2};
+    testOpsAreIdempotent(ops);
+}
+
 }  // namespace
 }  // namespace repl
 }  // namespace mongo

Sprint: Query 2020-10-05
Participants:

 Description   

The Oplog Application Idempotency project found the following bug:

Starting with a document { _id: 1, a: 0 };
 
Apply: o: { $unset: { a: null } }
==> { _id: 1 };
Apply: o: { $set: { 'a.c': 1, b: null } }
==> { _id: 1, a: { c: 1 }, b: null };
 
// Applying the updates again:
Apply: o: { $unset: { a: null } }
==> { _id: 1, b: null };
Apply: o: { $set: { 'a.c': 1, b: null } }
==> { _id: 1, b: null, a: { c: 1 } };

Impact

Initial sync, replication rollback, and replication recovery following an unclean shutdown may apply a sequence of operations twice and therefore depend on oplog application being idempotent. It is possible that after applying a sequence of operations twice for a document to contain the same field-value pairs, but for the fields to be order differently within the document. While MongoDB replication subsystem has never promised to preserve the order of fields across documents on the primary and secondary, not doing so is problematic for two reasons:

  1. The "dbhash" command takes the md5sum of the contents of each collection. Having a different field ordering between the primary and secondary will lead to a dbhash mismatch. There was a similar issue with the $rename update modifier that was addressed in SERVER-21647.
  2. Queries of the form {a: {x: 1, y: 2}} will not match the document {_id: 0, a: {y: 2, x: 1}} (see SERVER-5030), so it is possible for the query when run a primary to return the document and for the query when run on a secondary to not return the document.


 Comments   
Comment by Ian Boros [ 21/Sep/20 ]

Starting in version 4.8, modifier-style updates will be logged in the $v:2 "delta" format (SERVER-50300). The $v:2 format is not affected by the issue described here, so I am closing this ticket as "Gone Away."

Note that $v:2 entries may not be used when running with an older FCV.

Comment by Tess Avitabile (Inactive) [ 08/Dec/17 ]

Taking out of the sprint. We should reschedule this when featureCompatibilityVersion 3.8 is available.

Comment by Spencer Brody (Inactive) [ 05/Dec/17 ]

I don't feel super strongly on the two proposed oplog schema schema changes, but like David I slightly prefer the $create approach because of its greater similarity to the existing format. If we took the array-based approach, we'd have to find every place that passes an 'o' update oplog entry into the update system and change it to unpack the array and perform a series of updates instead.

That said there probably aren't many places that do that, so I could be swayed if we think the array-based is sufficiently more future-proof.

Comment by David Storch [ 04/Dec/17 ]

tess.avitabile, this design doesn't seem materially different to me, other than that the syntax written down in the oplog is different. If I understand correctly, the work that the update has to do on the secondary is essentially the same. I slightly prefer $create only in that it seems a bit more consistent with today's update language, and therefore implementing it might actually be a bit easier. However, we should probably have an offline discussion (maybe even with the replication team?) about which design we prefer.

Comment by Tess Avitabile (Inactive) [ 30/Nov/17 ]

I have a new design proposal for this issue:

  • Allow the o field in an oplog entry to be an array of updates.
  • Let the normal oplog entry for an update be onormal. Define a document ounset ={$unset: {p: true, ...}}, containing every newly created path p in the updated document that is not prefixed by another newly created path. When the FCV is 3.8, use o=[ounset,onormal]. When the FCV is 3.6, use o=onormal.

The behavior of oplog application with this design is identical to using $create. I prefer this solution because it will lay the foundation if we ever want to allow an array of updates in the update command, it avoids adding a new modifier, and I think it will be easier to build in our current system.

Either design depends on bumping the FCV to 3.8.

Comment by David Storch [ 29/Sep/17 ]

To clarify, we believe that this is a "beginning of time" issue that affects all stable versions of MongoDB. The substantial changes to the update subsystem done as part of SERVER-27089 for version 3.6 do not have any impact on this problem. This is because, as Tess mentioned above, we would require an addition to the update language such as $create in order to describe $set operations in the oplog in a fully idempotent fashion. Therefore, this problem is in a sense just a consequence of a lack of sufficient expressivity in the update language.

Comment by Tess Avitabile (Inactive) [ 14/Aug/17 ]

Great find, may.hoque. Just to make it apparent that this does not depend on nested documents, the following set of operations also demonstrates the problem:

Starting with a document { _id: 1, a: 0 };
 
Apply: o: { $unset: { a: true } }
==> { _id: 1 };
Apply: o: { $set: { a: 1, b: 1 } }
==> { _id: 1, a: 1, b: 1 };
 
// Applying the updates again:
Apply: o: { $unset: { a: 1 } }
==> { _id: 1, b: 1 };
Apply: o: { $set: { a: 1, b: 1 } }
==> { _id: 1, b: 1, a: 1 };

I cannot think of a way to fix this issue with the current update language, since $set is used both to modify existing fields and to create new fields. We could implement an update modifier $create (and even only allow it in updates from replication) that either creates a field or unsets and creates it if it already exists. Then every time a new field is created by $set, we use $create in the oplog instead of $set. It would take a little more thinking, but I believe this would fix the issue.

However, SERVER-5030 may be a better solution. It is worth noting that in 3.6, a user can no longer determine the order in which $set adds new fields--they are always added in lexicographic order. That seems like a move in the direction that the fields in documents are unordered.

Comment by Max Hirschhorn [ 01/Aug/17 ]

david.storch, spencer, I thought it made sense for the Replication team to triage this issue given how it may impact what we want to promise about preserving field-ordering across the primary and secondary. I suspect the fix itself is something that would be implemented by the Query team though.

Generated at Thu Feb 08 04:23:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.