[SERVER-32903] Ambiguous field name error should be ignored during initial sync
Created: 25/Jan/18 | Updated: 30/Oct/23 | Resolved: 29/Jan/20
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.2.4, 3.6.18, 4.3.4, 4.0.18 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Charlie Swanson | Assignee: | Ryan Timmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | former-quick-wins, former-robust-initial-sync, initialSync |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.2, v4.0, v3.6 |
| Sprint: | Repl 2020-01-27, Repl 2020-02-10 |
| Participants: | |
| Linked BF Score: | 6 |
| Description |
|
The following error can happen when inserting a document like
into a collection with an index on {"a.0": 1}:
When finishing initial sync and applying oplog entries that happened during the sync, this error should be ignored. Like a unique index constraint violation, it should eventually be resolved during oplog application, either by another operation deleting the document or by another oplog entry dropping the index. |
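To illustrate why a path such as "a.0" can be ambiguous, here is a minimal sketch in plain JavaScript. It is a simplified model, not the server's key-generation code: the function `isAmbiguousPath` and the sample documents are hypothetical, and the rule it encodes (a numeric path component that matches both an array position and a field name inside one of that array's embedded documents) is an assumption about when the error fires.

```javascript
// Simplified model only. NOT the server's index key generation logic.
// Returns true when a numeric component of `path` could be read both as an
// array position and as a field name inside one of the array's subdocuments.
function isAmbiguousPath(doc, path) {
  const parts = path.split(".");
  let current = doc;
  for (const part of parts) {
    if (Array.isArray(current)) {
      // This model only follows numeric components through arrays.
      if (!/^\d+$/.test(part)) return false;
      const index = Number(part);
      const positionExists = index < current.length;
      const childHasField = current.some(
        (el) =>
          el !== null &&
          typeof el === "object" &&
          !Array.isArray(el) &&
          Object.prototype.hasOwnProperty.call(el, part)
      );
      if (positionExists && childHasField) return true;
      if (!positionExists) return false;
      current = current[index];
    } else if (current !== null && typeof current === "object") {
      current = current[part];
    } else {
      return false;
    }
    if (current === undefined) return false;
  }
  return false;
}

// Hypothetical sample documents checked against the path "a.0".
const ambiguousDoc = { _id: 1, a: [{ "0": 1 }] }; // position 0 AND field "0"
const plainArrayDoc = { _id: 2, a: [5, 6] };      // position 0 only
const subdocumentDoc = { _id: 3, a: { "0": 1 } }; // field "0" only, no array
```

In this model only the first document is flagged, because "0" names both the first array element and a field of that element.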
| Comments |
| Comment by Githook User [ 02/Apr/20 ] |
|
Author: {'name': 'Ryan Timmons', 'email': 'ryan.timmons@10gen.com', 'username': 'rtimmons'} Message: create mode 100644 jstests/replsets/initial_sync_ambiguous_index.js (cherry picked from commit 3423ca586b88566857f3fcdfeca1c6fdee7a0911) |
| Comment by Githook User [ 02/Apr/20 ] |
|
Author: {'name': 'Ryan Timmons', 'email': 'ryan.timmons@10gen.com', 'username': 'rtimmons'} Message: create mode 100644 jstests/replsets/initial_sync_ambiguous_index.js (cherry picked from commit 3423ca586b88566857f3fcdfeca1c6fdee7a0911) |
| Comment by Githook User [ 21/Feb/20 ] |
|
Author: {'name': 'Ryan Timmons', 'username': 'rtimmons', 'email': 'ryan.timmons@10gen.com'} Message: |
| Comment by Githook User [ 06/Feb/20 ] |
|
Author: {'username': 'rtimmons', 'name': 'Ryan Timmons', 'email': 'ryan.timmons@10gen.com'} Message: create mode 100644 jstests/replsets/initial_sync_ambiguous_index.js |
| Comment by William Schultz (Inactive) [ 06/Feb/20 ] |
|
If they apply cleanly they shouldn't require CRs. If they become more involved and require some manual intervention then a CR might be necessary. I expect these backports will be clean. |
| Comment by William Schultz (Inactive) [ 31/Jan/20 ] |
|
ryan.timmons I think we should backport this fix to older versions. Also, since the original bug is not yet fixed on 4.2, the new test is failing in the multiversion suites, i.e. BFG-432604. We can disable the test on master until it is backported by adding an exclusion here. Feel free to post that change as a follow-up commit on this ticket. |
| Comment by William Schultz (Inactive) [ 31/Jan/20 ] |
|
With some manual knob tuning, I was eventually able to get the initial sync fuzzer to reproduce bug (1) described in my 23/Jan/20 comment, i.e. an error during oplog application. This is the error message:
Here is the repro (initsync_fuzzer-31f9-1580446991037-9.js)
and the createIndex command:
and I updated the following knob constants:
I also manually disabled generation of transaction commands, since this was not a transaction-related bug. The failure reproduced on the third execution of a patch build that generated 5 tasks with 10 fuzzer tests each, so it took around 150 generated fuzzer tests (3 executions x 5 tasks x 10 tests) before we hit the bug. I think this demonstrates, to a certain extent, that the fuzzer can help search for deterministic repros of a hypothesized bug with less manual effort. We can augment the grammar with some of the interesting operations that we think are needed to find the bug and let the fuzzer run many generated test cases. |
| Comment by Githook User [ 29/Jan/20 ] |
|
Author: {'username': 'rtimmons', 'name': 'Ryan Timmons', 'email': 'ryan.timmons@10gen.com'} Message: create mode 100644 jstests/replsets/initial_sync_ambiguous_index.js |
| Comment by William Schultz (Inactive) [ 23/Jan/20 ] |
|
Discussed two hypothetical scenarios where this bug could manifest. It's difficult to verify them against existing failures since the log files have been reaped. It does appear that in certain BFGs the error is in oplog application and in others it is in index building.
(1) We fail insertion during initial sync oplog application. If the sync source does an insert, a delete, and then a createIndex, and we then start the collection clone, building the problematic index, then during oplog application we would re-apply the insert, which would throw the "ambiguous field name" error.
(2) We fail index building during initial sync collection cloning. If we clone a collection with the problematic index on it, and then the index is dropped and an insert occurs on the sync source during collection cloning, we might try to clone the problematic document and insert it while we are building the problematic index.
If this is a real bug, I am also curious whether the initial sync fuzzer would be able to catch it and, if so, why it is not finding it. |
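Scenario (1) can be sketched as a toy oplog replay in plain JavaScript. This is an illustration under stated assumptions, not the server's initial sync code: `generateIndexKey`, `replayOplog`, the oplog entry shape, and the sample document are all hypothetical, and only the error code 16746 comes from this ticket.

```javascript
// Error code named in this ticket for the "ambiguous field name" failure.
const AMBIGUOUS_FIELD_NAME = 16746;

// Hypothetical stand-in for key generation on an index {"a.0": 1}.
// It throws when `a` is an array holding a subdocument with a field "0".
function generateIndexKey(doc) {
  const a = doc.a;
  const ambiguous =
    Array.isArray(a) &&
    a.some(
      (el) =>
        el !== null &&
        typeof el === "object" &&
        !Array.isArray(el) &&
        Object.prototype.hasOwnProperty.call(el, "0")
    );
  if (ambiguous) {
    const err = new Error("ambiguous field name");
    err.code = AMBIGUOUS_FIELD_NAME;
    throw err;
  }
}

// Toy replay of oplog entries onto a freshly cloned, empty collection.
// `hasIndex` models the problematic index already existing on the syncing
// node; `ignoreAmbiguous` models treating the error as ignorable.
function replayOplog(entries, { hasIndex, ignoreAmbiguous }) {
  const docs = new Map();
  const ignored = [];
  for (const entry of entries) {
    if (entry.op === "i") {
      try {
        if (hasIndex) generateIndexKey(entry.o);
        docs.set(entry.o._id, entry.o);
      } catch (err) {
        if (!(ignoreAmbiguous && err.code === AMBIGUOUS_FIELD_NAME)) throw err;
        ignored.push(entry); // a later entry is expected to resolve this
      }
    } else if (entry.op === "d") {
      docs.delete(entry.o._id); // deleting a missing document is a no-op
    }
  }
  return { docs, ignored };
}

// The sync source inserted and then deleted the document BEFORE creating the
// index, so both entries are replayed against a node that already has it.
const oplog = [
  { op: "i", o: { _id: 1, a: [{ "0": 1 }] } },
  { op: "d", o: { _id: 1 } },
];
```

With `ignoreAmbiguous` set, the replay skips the insert and the later delete leaves the collection empty, which matches the sync source. Without it, the replay aborts on the first entry.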
| Comment by Ryan Timmons [ 22/Jan/20 ] |
|
Summary:
I'm going to reach out to other members on repl to figure out whether simply adding this error status code to the whitelist, per the above, is really the best solution and, if so, how we avoid failing index validations. |
| Comment by Charlie Swanson [ 25/Jan/18 ] |
|
Instead of adding 16746 to the whitelist, we could also change that uassert to use ErrorCodes::CannotBuildIndexKeys, which is already ignored. |
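The two options discussed in this thread can be contrasted with a small toy model in plain JavaScript. Everything here is hypothetical scaffolding (`baseIgnoreList`, `makeAmbiguousFieldNameError`, `isIgnored`); the only facts taken from the ticket are the code 16746 and the statement that CannotBuildIndexKeys is already ignored.

```javascript
// Toy ignore list. The ticket states only that CannotBuildIndexKeys is
// already ignored; the real list may hold other codes as well.
const baseIgnoreList = new Set(["CannotBuildIndexKeys"]);

// Option A: extend the ignore list with the raw code 16746.
const optionAIgnoreList = new Set([...baseIgnoreList, 16746]);

// Option B: leave the list alone and report the failure under the code
// that is already ignored.
function makeAmbiguousFieldNameError(useExistingCode) {
  const err = new Error("ambiguous field name");
  err.code = useExistingCode ? "CannotBuildIndexKeys" : 16746;
  return err;
}

// An error is skipped when its code appears in the given ignore list.
function isIgnored(err, ignoreList) {
  return ignoreList.has(err.code);
}
```

In this model, option B needs no change to the ignore list at all, which is the appeal of reusing the existing code.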