[SERVER-27122] Restart initial sync for known index idempotency errors; fail for unknown ones. Created: 18/Nov/16  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Index Maintenance, Replication
Affects Version/s: 3.4.0-rc4
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Robert Guo (Inactive)
Assignee: Backlog - Replication Team
Resolution: Unresolved
Votes: 0
Labels: former-quick-wins, former-robust-initial-sync, idempotency, initialSync
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-33946 Decrease number of initial sync attem... Blocked
Duplicate
is duplicated by SERVER-33057 Indexing fatal assertion on secondary... Closed
Related
related to SERVER-27357 Recreating collection with different ... Backlog
related to SERVER-32225 Initial sync should ignore multiple t... Closed
related to SERVER-44462 Run listIndexes with a single snapsho... Closed
Assigned Teams:
Replication
Participants:
Case:
Linked BF Score: 0

 Description   

For each known index idempotency problem, return a unique error to initial sync so that it can recognize the known problems and restart without failing tests. Initial sync can then treat any other error as an unknown idempotency problem and fail the test, rather than silently hiding it.
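
The triage described above might be sketched as follows. This is a minimal illustration in Python, not server code: the error names, `KNOWN_IDEMPOTENCY_ERRORS`, `SyncAction`, and `triage` are all hypothetical and only stand in for whatever unique errors the server would actually return.

```python
from enum import Enum, auto


class SyncAction(Enum):
    RESTART = auto()  # known idempotency problem: restart initial sync quietly
    FAIL = auto()     # anything else: surface the error so the test fails


# Hypothetical error names, one per known index idempotency problem.
KNOWN_IDEMPOTENCY_ERRORS = frozenset({
    "IndexNameConflictDuringInitialSync",
    "TextIndexConflictDuringInitialSync",
    "TooManyIndexesDuringInitialSync",
})


def triage(error_name: str) -> SyncAction:
    """Decide how initial sync reacts to an error raised by an index operation."""
    if error_name in KNOWN_IDEMPOTENCY_ERRORS:
        return SyncAction.RESTART
    return SyncAction.FAIL
```

The point of the allow-list is that a newly introduced idempotency bug produces an error that is not in the set, so it fails the test instead of being absorbed by another restart.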

ORIGINAL DESCRIPTION:
There are a number of additional index operations that should be idempotent but are not currently handled by the improvements in SERVER-26202.

Applying the following operations on a collection before and after dropping it will cause initial sync to restart. They were found by going through the various checks in IndexCatalog::_doesSpecConflictWithExisting.

1. Creating indexes with the same name but different specs.
2. Creating text indexes with different specs.
3. Having more than 64 indexes in total when the indexes created before and after dropping the collection are counted together.
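
The three cases above can be reproduced with a toy model. The sketch below is written in Python under stated assumptions: `ToyIndexCatalog`, `IndexConflict`, and the `kind` strings are hypothetical, and the checks only loosely imitate the idea of IndexCatalog::_doesSpecConflictWithExisting rather than reproducing the server's logic.

```python
MAX_INDEXES = 64  # the limit referred to in case 3


class IndexConflict(Exception):
    """Carries a distinct `kind` so a caller can tell the known cases apart."""

    def __init__(self, kind, message):
        super().__init__(message)
        self.kind = kind


def _is_text(spec):
    # A spec such as {"body": "text"} describes a text index.
    return "text" in spec.values()


class ToyIndexCatalog:
    """A toy stand-in for an index catalog; not the server's IndexCatalog."""

    def __init__(self):
        self.indexes = {}  # index name -> key spec, e.g. {"a": 1}

    def create_index(self, name, spec):
        existing = self.indexes.get(name)
        if existing is not None:
            if existing == spec:
                return  # identical re-creation is a harmless no-op
            raise IndexConflict("same_name_different_spec",
                                f"index {name!r} already exists with {existing}")
        if _is_text(spec) and any(_is_text(s) for s in self.indexes.values()):
            raise IndexConflict("text_index_conflict",
                                "a text index with another spec already exists")
        if len(self.indexes) >= MAX_INDEXES:
            raise IndexConflict("too_many_indexes",
                                f"collection already has {MAX_INDEXES} indexes")
        self.indexes[name] = spec


def replay_stale_create():
    """Replay a createIndex from before a drop onto state cloned after it."""
    catalog = ToyIndexCatalog()
    catalog.create_index("idx", {"b": 1})      # cloned, post-drop definition
    try:
        catalog.create_index("idx", {"a": 1})  # stale op from before the drop
    except IndexConflict as exc:
        return exc.kind
    return None
```

Calling `replay_stale_create()` returns `"same_name_different_spec"`, which models case 1: the cloned collection already holds the newer definition of `idx` when the older createIndex is replayed.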



 Comments   
Comment by Judah Schvimer [ 06/Nov/19 ]

This should unblock SERVER-33946, and then the Replication Team can retriage that ticket to catch unknown idempotency problems.

Comment by Eric Milkie [ 06/Nov/19 ]

After discussion, we decided to change the work for this ticket: rather than fixing the idempotency problems so that they work seamlessly, we can make each idempotency error uniquely identifiable. That way, initial sync can filter out the known problems and restart without failing the test, and any unrecognized error can fail the test.

Comment by Judah Schvimer [ 06/Nov/19 ]

Without SERVER-33946 it is difficult to ensure we've handled all idempotency problems in initial sync. Can you please clarify which ones would be too difficult to fix? Maybe we can ensure the other ones don't happen in our tests instead of fixing them to still enable SERVER-33946.

Comment by Eric Milkie [ 06/Nov/19 ]

The amount of work required to do this doesn't seem like a good tradeoff for the perceived benefit, so I am recommending we close this as Won't Fix.

Comment by Siyuan Zhou [ 10/Oct/19 ]

I feel this needs some semantic changes to index builds to address those problematic idempotency cases. Assigning to Execution team. Happy to discuss proposals though.

Comment by Judah Schvimer [ 05/Nov/18 ]

Yes, this is a result of applying operations and index creations/deletions out of order during initial sync due to the clone and then oplog-replay process. It's possible, even with UUIDs, for documents and their indexes to temporarily be incompatible, or for multiple indexes to be incompatible temporarily, such that the incompatibilities would be resolved by the end of initial sync if initial sync were allowed to finish.

Comment by Asya Kamsky [ 03/Nov/18 ]

Is this still an issue after the UUID work for collections?
