[SERVER-48974] Index build can crash with CappedPositionLost error
Created: 18/Jun/20 Updated: 06/Dec/22 Resolved: 03/May/21
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | 4.5.1, 4.4.0-rc10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Storage Execution |
| Operating System: | ALL |
| Sprint: | Execution Team 2021-07-26 |
| Participants: | |
| Linked BF Score: | 18 |
| Description |
Update: This was also observed in
If an index build's collection scan resumes after yielding and cannot restore its cursor because the record at its saved position was deleted, the index build crashes at this invariant with a CappedPositionLost error. Example:
Simply aborting the index build would not be a complete solution, because a secondary could hit this error independently of the primary and still crash. I think we can safely restart the collection scan if we hit a CappedPositionLost error. This poses a liveness issue, but I think the circumstances required to hit this bug are extreme enough to warrant this solution.
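To make the failure mode concrete, here is a toy C++ model of the situation described above, under simplifying assumptions: a capped collection that deletes its oldest record once it exceeds a size limit, and a forward cursor whose saved position is the id of the last record it returned. All names here (ToyCappedCollection, ToyCursor, CappedPositionLostError, scanLosesPosition) are hypothetical illustrations, not the server's storage-engine classes or its real error type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the server's CappedPositionLost error.
struct CappedPositionLostError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Toy capped collection: records are kept in insertion (id) order and the
// oldest record is deleted once the collection holds more than maxDocs.
class ToyCappedCollection {
public:
    struct Record {
        std::int64_t id;
        std::string doc;
    };

    explicit ToyCappedCollection(std::size_t maxDocs) : maxDocs_(maxDocs) {
        assert(maxDocs > 0);
    }

    void insert(std::string doc) {
        records_.push_back({nextId_++, std::move(doc)});
        if (records_.size() > maxDocs_) records_.pop_front();  // capped deletion
    }

    bool contains(std::int64_t id) const {
        for (const auto& r : records_)
            if (r.id == id) return true;
        return false;
    }

    const std::deque<Record>& records() const { return records_; }

private:
    std::size_t maxDocs_;
    std::int64_t nextId_ = 1;
    std::deque<Record> records_;
};

// Toy forward cursor; its saved position is the id of the last record it
// returned (0 means "before the first record").
class ToyCursor {
public:
    explicit ToyCursor(const ToyCappedCollection& coll) : coll_(coll) {}

    // Returns the id of the next record, or nullopt at the end of the collection.
    std::optional<std::int64_t> next() {
        for (const auto& r : coll_.records()) {
            if (r.id > lastId_) {
                lastId_ = r.id;
                return r.id;
            }
        }
        return std::nullopt;
    }

    // Called when the scan resumes after a yield. If the saved record was
    // deleted, records after it may have been deleted too, so continuing could
    // silently skip documents; the cursor reports the lost position instead.
    void restore() const {
        if (lastId_ != 0 && !coll_.contains(lastId_))
            throw CappedPositionLostError("saved position was deleted");
    }

private:
    const ToyCappedCollection& coll_;
    std::int64_t lastId_ = 0;
};

// Read one record, yield while concurrent inserts run, then try to restore.
// Returns true if the restore lost its position.
bool scanLosesPosition(std::size_t cap, int insertsDuringYield) {
    ToyCappedCollection coll(cap);
    coll.insert("a");
    coll.insert("b");
    coll.insert("c");

    ToyCursor cursor(coll);
    cursor.next();  // positioned on the first record; the scan now yields

    for (int i = 0; i < insertsDuringYield; ++i) coll.insert("new");

    try {
        cursor.restore();
    } catch (const CappedPositionLostError&) {
        return true;
    }
    return false;
}
```

For example, scanLosesPosition(3, 2) returns true because the inserts that run during the yield evict the record the cursor was positioned on, while scanLosesPosition(3, 0) returns false.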
| Comments |
| Comment by Gregory Wlodarek [ 03/May/21 ] |
Marking this as a duplicate of |
| Comment by Louis Williams [ 18/Jun/20 ] |
Assuming we agree on the solution, I think this would involve moving this call to initiateBulk inside insertAllDocumentsInCollection and then wrapping that in a retry when we hit a CappedPositionLost exception.
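A minimal sketch of the restructuring proposed above, assuming the reason for moving initiateBulk is that each restarted scan must begin with an empty bulk builder; otherwise keys collected by the aborted pass would be inserted a second time. ToyBulkBuilder, ScanFn, BuildResult, insertAllDocumentsWithRestart and CappedPositionLostError are hypothetical stand-ins, not the server's actual index build API.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the server's CappedPositionLost error.
struct CappedPositionLostError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical bulk builder that accumulates index keys during the scan.
struct ToyBulkBuilder {
    std::vector<int> keys;
    void add(int key) { keys.push_back(key); }
};

// A collection scan that feeds keys into the builder. It may throw
// CappedPositionLostError if it cannot restore its cursor after a yield.
using ScanFn = std::function<void(ToyBulkBuilder&)>;

struct BuildResult {
    ToyBulkBuilder bulk;
    int attempts = 0;
};

// The builder is created inside the retried section, mirroring the idea of
// moving the initiateBulk call into insertAllDocumentsInCollection: every
// restart begins with an empty builder, so keys from an aborted pass are dropped.
BuildResult insertAllDocumentsWithRestart(const ScanFn& scan) {
    BuildResult result;
    while (true) {
        ++result.attempts;
        ToyBulkBuilder bulk;  // fresh state per attempt (the "initiateBulk" step)
        try {
            scan(bulk);
            result.bulk = std::move(bulk);
            return result;
        } catch (const CappedPositionLostError&) {
            // Discard the partial builder and rescan from the beginning.
            // The loop is unbounded, as proposed; this is the liveness cost
            // the ticket accepts.
        }
    }
}
```

With a scan that loses its position twice and then succeeds, the function reports three attempts and the builder holds each key exactly once. Had the builder been created once outside the loop, it would also have kept the keys from the two aborted passes.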