[SERVER-48428] createIndexes should wait for user provided write concern even when returning failure Created: 27/May/20 Updated: 29/Oct/23 Resolved: 23/Jul/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Justin Seyster | Assignee: | Eric Milkie |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4 |
| Sprint: | Execution Team 2020-06-15, Execution Team 2020-06-29, Execution Team 2020-07-13, Execution Team 2020-07-27 |
| Participants: | |
| Linked BF Score: | 10 |
| Description |
|
When an index build fails (for example, because the index is a "2dsphere" index and the collection contains invalid geometry), createIndexes can return the failure before the startIndexBuild oplog entry is majority committed. A stepdown at just the right moment can then make it appear, according to the listIndexes command, that the index is still being built, even though the client has already observed the failure code. Waiting for majority commit in the failure case should prevent this. |
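The race described above can be illustrated with a toy model. This is a minimal sketch, not server code: the function names, the index name `loc_2dsphere`, and the specific interleaving (the other node has applied the start entry but not the abort entry when the client sees the error) are all assumptions made for illustration.

```python
def in_progress_builds(oplog):
    """Return index names that have a start entry but no abort/commit entry."""
    building = set()
    for op, name in oplog:
        if op == "startIndexBuild":
            building.add(name)
        elif op in ("abortIndexBuild", "commitIndexBuild"):
            building.discard(name)
    return building


def failed_create_index(wait_for_majority):
    """Return what the other node reports once the client has seen the error.

    Hypothetical model: the primary logs a start entry and then an abort
    entry for an index named "loc_2dsphere" (an illustrative name).
    """
    primary = [
        ("startIndexBuild", "loc_2dsphere"),
        ("abortIndexBuild", "loc_2dsphere"),
    ]
    # Assumption: the secondary has applied only the first entry so far.
    secondary = primary[:1]
    if wait_for_majority:
        # Waiting for write concern before replying lets the secondary
        # catch up on every entry written by the failed command.
        secondary = list(primary)
    # The client observes the failure here; a stepdown then promotes the
    # secondary, and a listIndexes-style query runs against its state.
    return in_progress_builds(secondary)
```

In this model, replying without the wait leaves the promoted node reporting the build as in progress, while waiting first leaves nothing in progress. The real fix also has to cope with the wait itself being interrupted by the stepdown, which the sketch does not model.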
| Comments |
| Comment by Benety Goh [ 16/Sep/20 ] | ||
|
This change is not safe to back port to 4.4 without further modification because it relies on | ||
| Comment by Githook User [ 16/Sep/20 ] | ||
|
Author: {'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}Message: Revert " This reverts commit 81d5259fb84897191e01114f623050c29b760523. | ||
| Comment by Githook User [ 09/Sep/20 ] | ||
|
Author: {'name': 'Eric Milkie', 'email': 'milkie@10gen.com', 'username': 'milkie'}Message: (cherry picked from commit 475fcc6ebd04767fbbbe3231c516e70b9be7d90e) | ||
| Comment by Githook User [ 23/Jul/20 ] | ||
|
Author: {'name': 'Eric Milkie', 'email': 'milkie@10gen.com', 'username': 'milkie'}Message: | ||
| Comment by Eric Milkie [ 16/Jul/20 ] | ||
|
In the BF, it appears that waiting for write concern got interrupted by the stepdown itself and thus didn't wait long enough:
I'm not sure if there will be a way to avoid having the write concern wait be interrupted by stepdown. | ||
| Comment by Eric Milkie [ 28/May/20 ] | ||
|
After further investigation, we do need to do this work to solve the build failure. | ||
| Comment by Justin Seyster [ 27/May/20 ] | ||
|
daniel.gottlieb Good question! I checked the logs, but I can't find any entries that include the operationTime for the startIndexBuild oplog entry. I also can't reproduce this failure scenario, so I don't have any other way to get that data. I think the next step is to build a test case that uses fail points to force the interleaving I described, so we can verify it's actually possible and have a way to test any potential fixes. | ||
| Comment by Daniel Gottlieb (Inactive) [ 27/May/20 ] | ||
|
Curious: in this failure case, what's the relationship between the index build failure response's operationTime and the startIndexBuild oplog entry? Do they match? Does the failure response have an earlier value? |