[SERVER-30705] Concurrent updates and FCV change can cause dbhash mismatch between primary and secondary Created: 16/Aug/17 Updated: 30/Oct/23 Resolved: 14/Sep/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Write Ops |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.0-rc0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | David Storch | Assignee: | Justin Seyster |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Query 2017-08-21, Query 2017-09-11, Query 2017-10-02 | ||||||||
| Participants: | |||||||||
| Description |
|
3.5.x versions of the server have two implementations of the update subsystem: the "old" (3.4 and earlier) system, and the new system in src/mongo/db/update, which is both more performant and supports more expressive array updates. The old and new systems behave differently with respect to field ordering. To ensure that field ordering is consistent across all nodes in the replica set, the primary and secondaries must use the same version of the update subsystem. This is achieved via the feature compatibility version mechanism. Users must set the feature compatibility version (FCV) to "3.6" in order to enable the new update system. The FCV check, however, does not guarantee that a given update uses the same version of the update code on every node. Consider the following sequence of events:
I was able to reproduce a dbhash mismatch against a two-node 3.5.x replica set by running two scripts concurrently from two shells connected to the primary node. The first script repeatedly issues an update with two $set's that results in a different field ordering depending on which version of the update implementation is used:
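The script itself is not included in this export. The following is a minimal sketch of what such a loop might look like, assuming the legacy mongo shell collection helpers `insert()` and `update()`. The function name `runUpdates`, the iteration count, and the field names `b` and `a` are illustrative assumptions, not taken from the original report.

```javascript
// Hypothetical reconstruction; not the reporter's original script.
// `coll` is a mongo shell collection handle, e.g. db.c on the primary.
function runUpdates(coll, iterations) {
    for (var i = 0; i < iterations; i++) {
        // Start each round from a document that has neither field.
        coll.insert({_id: i});
        // The new fields are listed in non-alphabetical order. The assumption
        // is that one update implementation appends them as written (b, a)
        // while the other appends them sorted by name (a, b).
        coll.update({_id: i}, {$set: {b: 1, a: 1}});
    }
}
```

In a shell connected to the primary this could be invoked as `runUpdates(db.getSiblingDB("test").c, 10000)`.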
The second script repeatedly sets the FCV from "3.4" to "3.6" and back again:
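The script itself is not included in this export. A minimal sketch, assuming the `setFeatureCompatibilityVersion` command is issued against the admin database through the shell's `runCommand()` helper. The function name `toggleFcv` and the iteration count are illustrative assumptions.

```javascript
// Hypothetical reconstruction; not the reporter's original script.
// `adminDb` is a handle to the admin database on the primary,
// e.g. db.getSiblingDB("admin") in the mongo shell.
function toggleFcv(adminDb, iterations) {
    var versions = ["3.6", "3.4"];
    for (var i = 0; i < iterations; i++) {
        // Alternate 3.6, 3.4, 3.6, ... so that concurrent updates can
        // straddle an FCV transition.
        var res = adminDb.runCommand({setFeatureCompatibilityVersion: versions[i % 2]});
        if (!res.ok) {
            throw new Error("setFeatureCompatibilityVersion failed: " + JSON.stringify(res));
        }
    }
}
```

This would run in the second shell, for example as `toggleFcv(db.getSiblingDB("admin"), 1000)`, while the update loop runs in the first.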
After the first script completes, running the dbHash command against the test database on each node should show different hashes for test.c. |
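The comparison can be sketched as follows, assuming each node's `dbHash` reply carries a `collections` sub-document that maps collection names to hash strings. The function name `compareCollectionHash` and the two database handles are illustrative assumptions; each handle would come from a separate connection to the node in question.

```javascript
// Hypothetical helper; not part of the original report.
// `primaryTestDb` and `secondaryTestDb` are handles to the "test" database
// obtained from separate connections to the primary and the secondary.
function compareCollectionHash(primaryTestDb, secondaryTestDb, collName) {
    var p = primaryTestDb.runCommand({dbHash: 1});
    var s = secondaryTestDb.runCommand({dbHash: 1});
    var primaryHash = p.collections[collName];
    var secondaryHash = s.collections[collName];
    return {
        primary: primaryHash,
        secondary: secondaryHash,
        match: primaryHash === secondaryHash
    };
}
```

For the scenario in this ticket, `compareCollectionHash(primaryTestDb, secondaryTestDb, "c").match` would be expected to be `false` once the race has been hit.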
| Comments |
| Comment by Ramon Fernandez Marina [ 14/Sep/17 ] |
|
Author: Justin Seyster (jseyster) <justin.seyster@mongodb.com> Message: With the new UpdateNodes class hierarchy, there are two code paths for When an update executes as part of the application of an oplog entry, There are two other places where we need this behavior: Both these code paths set the fromOplogApplication flag, which |
| Comment by Tess Avitabile (Inactive) [ 18/Aug/17 ] |
|
I think it is unlikely we will do SERVER-5030 and make the query language order-independent for 3.6 (though it is something to consider for future work on query language semantics), so I would be in favor of fixing this and |
| Comment by Spencer Brody (Inactive) [ 17/Aug/17 ] |
|
This brings up the bigger question about whether we consider field ordering a meaningful property of a document that we want to ensure stays consistent across replica set members. This came up recently in |
| Comment by David Storch [ 17/Aug/17 ] |
|
Per our in-person discussion today, we plan to pursue tess.avitabile's idea for how to fix this, since it is much simpler to implement. schwerin, we can definitely just throw the old version out once we branch for 3.8. |
| Comment by Andy Schwerin [ 16/Aug/17 ] |
|
If we do as tess.avitabile proposes first, and we have a performance problem, we can address it in a point release. If there is no problem, we can decide what to do in 3.8 separately. Perhaps the new update system will need an order-preserving mode, or perhaps we can just throw the old version out in 3.8. |
| Comment by Tess Avitabile (Inactive) [ 16/Aug/17 ] |
|
An alternative is that secondaries always use the old system, which creates new fields in the order specified by the primary. The advantage is that this requires no changes to the oplog format, and the disadvantage is that secondaries do not get any perf improvement for updates. I'm not sure which solution is better. I don't have much work scheduled for next sprint--I think I'll be working on expressive lookup. |
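The idea in the comment above can be illustrated with a small plain-JavaScript model. This is a simplified sketch, not server code. It assumes that one update implementation appends new fields in the order the update lists them, that the other appends them in sorted order, and that the oplog entry lists the changed fields in the order the primary wrote them. All function names are hypothetical.

```javascript
// Simplified model of the two field-ordering behaviors; not server code.

// Order-preserving application: new fields are appended in the order listed.
function applySetInGivenOrder(doc, fields) {
    var out = Object.assign({}, doc);
    Object.keys(fields).forEach(function (k) { out[k] = fields[k]; });
    return out;
}

// Sorting application: new fields are appended in lexicographic order.
function applySetSorted(doc, fields) {
    var out = Object.assign({}, doc);
    Object.keys(fields).sort().forEach(function (k) { out[k] = fields[k]; });
    return out;
}

// Assumed oplog model: the changed fields, in the order they appear in the
// primary's resulting document.
function toOplogSet(before, after) {
    var set = {};
    Object.keys(after).forEach(function (k) {
        if (!(k in before) || before[k] !== after[k]) {
            set[k] = after[k];
        }
    });
    return set;
}

// A secondary that always replays the oplog entry in the order given.
function replayOnSecondary(before, primaryDoc) {
    return applySetInGivenOrder(before, toOplogSet(before, primaryDoc));
}
```

Under these assumptions, a secondary that always applies fields in the logged order ends up with the same field order as the primary, whichever implementation the primary used, which is why no oplog format change would be needed.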
| Comment by David Storch [ 16/Aug/17 ] |
|
tess.avitabile, after discussing with Andy, I think the problem here is that secondaries should never rely on FCV checks. Instead, the primary should explicitly log which update system it used in the oplog. The secondary should interpret this information and select the appropriate code path. This is akin to how we require primaries to explicitly include the index version in createIndex oplog entries. I think this needs to be addressed in 3.5. Do you or justin.seyster have time to take it on this sprint or next?