[SERVER-30276] Secondary crashes after querying a unique index containing duplicates Created: 24/Jul/17 Updated: 27/Oct/23 Resolved: 16/Sep/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dmitry Ryabtsev | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Works as Designed | Votes: | 3 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Storage Execution
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Case: | (copied to CRM) | ||||||||
| Description |
|
Our general recommendation for replica sets is to build indexes in a rolling fashion. To the best of my knowledge, Automation uses the same approach when it creates indexes in a replica set. This is entirely appropriate for regular indexes, but there is a problem when the index to be created is unique. I tested several versions of MongoDB and found that the behaviour has changed over the years. The behaviour of the most recent release (3.4) still seems problematic to me, as I explain below. This is my test:
2.6.12:
3.0.14 (MMAPv1, WT):
3.2.14 (MMAPv1):
3.2.14 (WT):
3.4.4 (MMAPv1):
3.4.4 (WT):
I can understand why the behaviour of v2.6 was unwanted: from the point of view of the primary and the client, the secondaries should not crash, since the client is performing legitimate actions. Having said that, the behaviour of v3.0+ MMAPv1 is more troublesome, because the unique constraint is not in effect and duplicate entries get added to the unique index. Needless to say, it must be a defect that WT and MMAPv1 do not behave in the same way (WT aborts, but MMAPv1 does not).
Versions 3.2 and 3.4 are stricter, but during a rolling index build (note that this is the procedure we officially recommend in our docs!) there is still a window in which duplicates can be written on the primary, which does not have the unique index yet, and then replicated to the secondaries that already have it. Those secondaries are now time bombs: they will crash as soon as the duplicate documents are queried via the unique index. That can happen 2 minutes after the unique index is created, or 5 years down the road.
Of course, the user will be able to figure out that there is a problem when they try to build the unique index on the last replica set member (the index build should fail due to the presence of the duplicates). But it is also possible that this member gets decommissioned without ever having the index built, while the remaining (rigged) replica set members keep running.
I'm not sure what the best solution is here. Ideally the secondary should not allow duplicate entries to be added to a unique index, but having it crash (as with 2.6) is not good either. Regardless of the solution, we need to ensure that the outcome is the same no matter which storage engine is used (MMAPv1 or WT). |
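One way to check whether the window described above has already let duplicates in is to group documents by the would-be unique key and look for groups with more than one member. The following is only a minimal sketch of that idea, not a procedure taken from this ticket: the field name `email`, the sample documents, and the helper names `duplicate_key_pipeline` and `find_duplicates` are all hypothetical. The pipeline uses the standard `$group` and `$match` aggregation stages and could be passed to a driver's aggregate call; the in-memory helper mirrors the same logic so the example runs without a server. It only handles a single top-level field and does not model array (multikey) values.

```python
from collections import defaultdict


def duplicate_key_pipeline(field):
    """Build an aggregation pipeline that lists values of `field`
    shared by two or more documents (hypothetical helper)."""
    return [
        # Collect the _id of every document per distinct key value.
        {"$group": {"_id": "$" + field,
                    "ids": {"$push": "$_id"},
                    "count": {"$sum": 1}}},
        # Keep only key values that occur more than once.
        {"$match": {"count": {"$gt": 1}}},
    ]


def find_duplicates(docs, field):
    """In-memory equivalent of the pipeline above, for illustration.
    Returns a dict mapping each duplicated value to the list of _ids."""
    seen = defaultdict(list)
    for doc in docs:
        # A missing field is treated as None, so several documents
        # without the field count as duplicates of each other.
        seen[doc.get(field)].append(doc["_id"])
    return {value: ids for value, ids in seen.items() if len(ids) > 1}


# Hypothetical sample data: two documents share the same email.
docs = [
    {"_id": 1, "email": "a@example.com"},
    {"_id": 2, "email": "b@example.com"},
    {"_id": 3, "email": "a@example.com"},
]
duplicates = find_duplicates(docs, "email")
print(duplicates)  # {'a@example.com': [1, 3]}
```

Running such a check on the primary before the rolling build starts, and again before the index is built on the last member, would surface duplicates written during the window, assuming no further conflicting writes arrive in between.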
| Comments |
| Comment by Brian Lane [ 16/Sep/19 ] |
|
We have improved repair, which will help resolve this if it does appear in production. Closing this issue as Works as Designed. |
| Comment by Eric Milkie [ 27/Jul/17 ] |
|
This behavior should be documented; I'll start the process for that. |