[SERVER-54150] Recovery from a stable checkpoint should fassert on oplog application failures Created: 29/Jan/21 Updated: 29/Oct/23 Resolved: 26/Apr/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0, 6.0.6, 5.0.18, 7.0.0-rc1 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Lingzhi Deng | Assignee: | Moustafa Maher |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | repl-shortlist |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Replication |
| Backwards Compatibility: | Minor Change |
| Backport Requested: | v7.0, v6.3, v6.2, v6.0, v5.0 |
| Sprint: | Repl 2023-03-06, Repl 2023-03-20, Repl 2023-04-03, Repl 2023-05-01 |
| Participants: |
| Linked BF Score: | 135 |
| Description |
|
Currently, we ignore certain errors when applying oplog entries in Mode::kRecovering for idempotency (e.g. this). This makes sense for initial sync, eMRC=false, and rollback via refetch. But if we are recovering from a stable checkpoint, oplog application should be able to finish without any errors, and we should fassert on oplog application errors as we do in secondary oplog application. Skip fassert on oplog application failures in selective restore process:
|
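The policy in the description can be sketched roughly as follows. This is an illustration only: ApplySource, ErrorAction, and actionForIdempotencyError are hypothetical names, not the server's actual types or functions. It covers only the class of errors that recovery currently ignores for idempotency.

```cpp
// Hypothetical sketch of the proposed policy; all names are assumptions.

// Where the oplog entries are being applied from.
enum class ApplySource {
    kSecondarySteadyState,                // normal secondary oplog application
    kInitialSync,                         // initial sync
    kRecoveryWithoutMajorityReadConcern,  // recovery with eMRC=false
    kRollbackViaRefetch,                  // rollback via refetch
    kRecoveryFromStableCheckpoint,        // startup recovery from a stable checkpoint
};

// What to do with an error that is ignored today for idempotency.
enum class ErrorAction { kTolerate, kFassert };

// Decide how to treat such an error for a given source.
// `selectiveRestore` is an assumed flag meaning a selective restore is in progress.
inline ErrorAction actionForIdempotencyError(ApplySource source, bool selectiveRestore) {
    switch (source) {
        case ApplySource::kInitialSync:
        case ApplySource::kRecoveryWithoutMajorityReadConcern:
        case ApplySource::kRollbackViaRefetch:
            // Entries may be re-applied on top of newer data, so the error is expected.
            return ErrorAction::kTolerate;
        case ApplySource::kRecoveryFromStableCheckpoint:
            // Data starts from a consistent point, so an error is unexpected,
            // except during selective restore, where the fassert is skipped.
            return selectiveRestore ? ErrorAction::kTolerate : ErrorAction::kFassert;
        case ApplySource::kSecondarySteadyState:
            return ErrorAction::kFassert;
    }
    return ErrorAction::kFassert;  // conservative default for unknown values
}
```

For example, actionForIdempotencyError(ApplySource::kRecoveryFromStableCheckpoint, false) yields ErrorAction::kFassert, while the same call with selectiveRestore set to true yields ErrorAction::kTolerate.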
| Comments |
| Comment by Githook User [ 27/Apr/23 ] |
|
Author: {'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}Message: |
| Comment by Githook User [ 26/Apr/23 ] |
|
Author: {'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}Message: |
| Comment by Githook User [ 26/Apr/23 ] |
|
Author: {'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}Message: |
| Comment by Githook User [ 26/Apr/23 ] |
|
Author: {'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}Message: |
| Comment by Githook User [ 12/Apr/23 ] |
|
Author: {'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}Message: Revert " This reverts commit d8d5582fd381ed87b8463782747399a6c1965892. |
| Comment by Moustafa Maher [ 12/Apr/23 ] |
|
We need to investigate this error before committing this change. |
| Comment by Githook User [ 12/Apr/23 ] |
|
Author: {'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}Message: Revert " This reverts commit d8d5582fd381ed87b8463782747399a6c1965892. |
| Comment by Githook User [ 12/Apr/23 ] |
|
Author: {'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}Message: Revert " This reverts commit 7822344e72464810f6614d3491b86c7d0971b1bd. |
| Comment by Githook User [ 12/Apr/23 ] |
|
Author: {'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}Message: Revert " This reverts commit 347f059c439dfafe9e8a34365b4c5e7a17c22acf. |
| Comment by Githook User [ 10/Apr/23 ] |
|
Author: {'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}Message: (cherry picked from commit d8d5582fd381ed87b8463782747399a6c1965892) |
| Comment by Githook User [ 10/Apr/23 ] |
|
Author: {'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}Message: (cherry picked from commit d8d5582fd381ed87b8463782747399a6c1965892) |
| Comment by Githook User [ 28/Mar/23 ] |
|
Author: {'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}Message: |
| Comment by Opal Hoyt [ 06/Feb/23 ] |
|
Consider how far this can be backported |
| Comment by Judah Schvimer [ 02/Feb/23 ] |
|
While we should fassert in testing, we might want to be careful and first introduce this as a log message in production, then change it to an fassert in production once we have some confidence that we truly do not need to ignore these errors in any case. See |
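The staged approach suggested in this comment could look roughly like the sketch below. The names Reaction, reactionFor, and handleRecoveryApplyError, as well as the testingEnabled flag, are assumptions made for illustration and do not correspond to existing server code.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical reaction to an unexpected oplog application error during recovery.
enum class Reaction { kLogOnly, kFassert };

// Testing environments fail fast; production only logs until there is enough
// confidence to make the failure fatal everywhere.
inline Reaction reactionFor(bool testingEnabled) {
    return testingEnabled ? Reaction::kFassert : Reaction::kLogOnly;
}

// Apply the chosen reaction. `reason` is a free-form description of the error.
inline void handleRecoveryApplyError(const std::string& reason, bool testingEnabled) {
    if (reactionFor(testingEnabled) == Reaction::kFassert) {
        std::cerr << "fatal: oplog application failed during recovery: " << reason << '\n';
        std::abort();  // stand-in for an fassert
    }
    std::cerr << "warning: ignoring oplog application error during recovery: " << reason
              << '\n';
}
```

Keeping the decision in one small function means the later switch to an unconditional fassert in production would be a one-line change.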
| Comment by Lingzhi Deng [ 02/Feb/23 ] |
|
Another example is that we ignore NamespaceNotFound errors for CRUD operations during startup recovery. |
| Comment by Judah Schvimer [ 02/Feb/23 ] |
|
We should reconsider this as an extra safeguard against data corruption. |