[SERVER-78813] Commit point propagation fails indefinitely with exhaust cursors with null lastCommitted optime Created: 10/Jul/23 Updated: 19/Jan/24 Resolved: 07/Aug/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0, 7.0.1, 5.0.20, 6.0.9, 4.4.25 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Lingzhi Deng | Assignee: | Lingzhi Deng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | repl-shortlist |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v7.0, v6.0, v5.0, v4.4 |
| Sprint: | Repl 2023-07-24 |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 70 |
| Description |
|
If a syncing node has a null lastCommitted optime when it issues its exhaust getMore request to its sync source, it will never receive empty oplog batches for commit point propagation. This is because we don't take into account or update the last known lastCommitted optime of an exhaust cursor if it is originally null (see here and here). Normally, a syncing node gets its sync source's lastCommitted optime from the first oplog find request. But if the sync source also had a null lastCommitted optime when the syncing node started to sync from it, then I think we could end up in the situation described above. The elected primary normally relies on the JournalFlusher to trigger the first lastCommitted calculation/update, and this can be delayed because the JournalFlusher runs asynchronously. This could happen after replSetInitiate on 4.4 (see |
| Comments |
| Comment by Githook User [ 22/Aug/23 ] |
|
Author: {'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}Message: (cherry picked from commit 12f70e5fdff32cd733d1fde7a651bdfbae389e8b) |
| Comment by Githook User [ 22/Aug/23 ] |
|
Author: {'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}Message: (cherry picked from commit 12f70e5fdff32cd733d1fde7a651bdfbae389e8b) |
| Comment by Githook User [ 15/Aug/23 ] |
|
Author: {'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}Message: (cherry picked from commit 12f70e5fdff32cd733d1fde7a651bdfbae389e8b) |
| Comment by Githook User [ 07/Aug/23 ] |
|
Author: {'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}Message: Revert " This reverts commit 2f36b8afb61df115001b5c7b201d98a4a227fca4. |
| Comment by Githook User [ 31/Jul/23 ] |
|
Author: {'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}Message: (cherry picked from commit 12f70e5fdff32cd733d1fde7a651bdfbae389e8b) |
| Comment by Githook User [ 31/Jul/23 ] |
|
Author: {'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}Message: (cherry picked from commit 12f70e5fdff32cd733d1fde7a651bdfbae389e8b) |
| Comment by Githook User [ 19/Jul/23 ] |
|
Author: {'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}Message: |
| Comment by Lingzhi Deng [ 13/Jul/23 ] |
|
After a second look at this code, I found that it is actually possible for the sync source node to reset the exhaust cursor's lastKnownCommittedOpTime back to null if the sync source's lastCommittedOpTime is null. This means that even if the syncing node sends a non-null lastKnownCommittedOpTime to begin with, there is still a chance that the sync source node mistakenly sets it to null, after which commit point propagation (via empty batches) between the two nodes stops. |
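The reset path described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's code: optimes are stood in for by plain integers, `None` plays the role of a null optime, and `ExhaustCursor`, `wants_empty_batch`, `record_batch`, and `demo_reset_to_null` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExhaustCursor:
    # Hypothetical stand-in for the cursor state kept on the sync source.
    # None models a null lastKnownCommittedOpTime.
    last_known_committed: Optional[int]


def wants_empty_batch(cursor: ExhaustCursor, source_committed: Optional[int]) -> bool:
    """Pre-fix behaviour as described in the ticket (assumed shape):
    a null last-known value opts the cursor out of propagation entirely."""
    if cursor.last_known_committed is None:
        return False
    return source_committed is not None and source_committed > cursor.last_known_committed


def record_batch(cursor: ExhaustCursor, source_committed: Optional[int]) -> None:
    """Pre-fix behaviour (assumed shape): the stored value is overwritten
    with whatever the sync source currently has, even if that is null."""
    cursor.last_known_committed = source_committed


def demo_reset_to_null() -> bool:
    # The syncing node starts with a non-null last known commit point.
    cursor = ExhaustCursor(last_known_committed=5)
    # A batch is returned while the sync source's own commit point is null,
    # which resets the cursor's value to null.
    record_batch(cursor, None)
    # The sync source later advances its commit point, but the cursor
    # no longer qualifies for an empty batch.
    return not wants_empty_batch(cursor, 10)
```

In this model `demo_reset_to_null()` reports that propagation has stalled even though the syncing node began with a non-null value, which is the scenario the comment describes.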
| Comment by Lingzhi Deng [ 10/Jul/23 ] |
|
One easy solution is to consider a null optime as the smallest lastCommitted optime and always update the last known committed optime for oplog exhaust cursors to be the commit point returned in the last batch. I think we checked for null initially only to differentiate external oplog queries from internal oplog fetching queries, so that we don't opt into commit point propagation unnecessarily for external oplog queries in the absence of the last known committed optime. But I think we can make that differentiation based on the presence of the metadata "$replData" / "$oplogQueryData". |
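The proposal above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the committed fix: optimes are modelled as integers with `None` for null, and `commit_point_advanced`, `should_send_empty_batch`, and the `has_repl_metadata` flag (standing in for the presence of "$replData" / "$oplogQueryData" on the request) are hypothetical names.

```python
from typing import Optional


def commit_point_advanced(last_known: Optional[int], current: Optional[int]) -> bool:
    """Compare commit points, treating null (None) as the smallest optime."""
    if current is None:
        # The sync source has nothing newer to report yet.
        return False
    if last_known is None:
        # Any non-null commit point is ahead of a null one.
        return True
    return current > last_known


def should_send_empty_batch(
    has_repl_metadata: bool,
    last_known: Optional[int],
    current: Optional[int],
) -> bool:
    """Opt into propagation based on request metadata rather than on
    whether the last known commit point happens to be null."""
    return has_repl_metadata and commit_point_advanced(last_known, current)


def next_last_known(returned_commit_point: Optional[int]) -> Optional[int]:
    """Always track the commit point returned in the last batch, null or not."""
    return returned_commit_point
```

With this ordering, a cursor whose stored value is null still receives an empty batch as soon as the sync source has a non-null commit point, while an external oplog query without the replication metadata never opts in.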