[SERVER-39310] Check canServeReadsFor in getMores Created: 31/Jan/19 Updated: 29/Oct/23 Resolved: 30/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.15, 4.1.10, 4.0.13 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Lingzhi Deng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: |
v4.0, v3.6, v3.4
|
||||||||
| Sprint: | Repl 2019-03-25, Repl 2019-04-08 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 6 | ||||||||
| Description |
|
The "getMore" command path appears to never call canServeReadsFor, meaning that we could end up serving reads during a rollback. Data in this time could be inconsistent. For rollback via refetch, we refetch documents from the future and then play updates on them from the past which can lead to documents that never existed being returned. Rollback to a Timestamp is likely safe since it never is in an inconsistent state, but it's worthwhile to be consistent with the "find" command here. Discovered by david.storch. |
| Comments |
| Comment by Githook User [ 04/Sep/19 ] |
|
Author: {'username': 'ldennis', 'email': 'lingzhi.deng@mongodb.com', 'name': 'Lingzhi Deng'}Message:
(cherry picked from commit b885fa6feb7da00dc367e917c53ba16a41b75af4) |
| Comment by Githook User [ 03/Sep/19 ] |
|
Author: {'name': 'Lingzhi Deng', 'username': 'ldennis', 'email': 'lingzhi.deng@mongodb.com'}Message:
(cherry picked from commit b885fa6feb7da00dc367e917c53ba16a41b75af4) |
| Comment by Suganthi Mani [ 21/Aug/19 ] |
|
judah.schvimer, your understanding is correct. I shouldn't have used the word safe in my above comment. Backports would be for optimization purpose. |
| Comment by Shane Harvey [ 29/Mar/19 ] |
|
Perfect, thanks Judah! |
| Comment by Judah Schvimer [ 29/Mar/19 ] |
|
shane.harvey, I wouldn't expect this to have any impact on stepdown since canServeReadsFor is still true as a secondary. This should only impact nodes going into rollback, recovering, or removed states. |
| Comment by Shane Harvey [ 28/Mar/19 ] |
|
Does this have any impact on the "cursors survive primary stepdown" goal in I'm wondering if this change could break drivers that send the cursor's original read preference with getMores. CC: oleg.pudeyev. |
| Comment by Githook User [ 28/Mar/19 ] |
|
Author: {'name': 'Lingzhi Deng', 'username': 'ldennis', 'email': 'lingzhi.deng@mongodb.com'}Message:
|
| Comment by Lingzhi Deng [ 22/Mar/19 ] |
|
Posting my findings here: I agree that we should be consistent with "find" and I have made that change. I also added a test case to make sure we do kill all ongoing connections. The test also exercises checkCanServeReadsFor() in both "find" and "getMore" codepath with and without skipCheckingForNotMasterInCommandDispatch fail point. Previously without the check in "getMore", the test would fail if we enable skipCheckingForNotMasterInCommandDispatch. I am about to open a cr for that once it passes evergreen. |
| Comment by David Storch [ 04/Feb/19 ] |
|
Adding some details per the request of judah.schvimer. The getMore command is completely missing a call to canServeReadsFor(), which is the bug tracked by this ticket. The OP_GET_MORE path, in contrast, appears to be implemented correctly. Note that in this block of code, we only call canServeReadsFor() if the LockPolicy associated with the cursor is not 'kLocksInternally'. For aggregate cursors, getMore/OP_GET_MORE do not take locks themselves, so checkCanServeReadsFor() cannot be called here. However, it looks like the place where we do eventually acquire a lock during agg execution remembers to call checkCanServeReadsFor(): see https://github.com/mongodb/mongo/blob/9606de0f0f3166b9c8fcff033f2476af2937f685/src/mongo/db/pipeline/document_source_cursor.cpp#L93-L94. Therefore, both 'kLocksInternally' and 'kLockExternally' cursors have the necessary canServeReadsFor() check for OP_GET_MORE. 'kLockExternally' cursors with getMore command is flawed. |
| Comment by Judah Schvimer [ 31/Jan/19 ] |
|
This would only lead to returning inconsistent data in a getMore, though a getMore on the oplog should not hit this. Thus a secondary would not have duplicate _id's due to this bug. I think this is a beginning of time bug, though we'd have to create a repro and test it to confirm. |
| Comment by Bruce Lucas (Inactive) [ 31/Jan/19 ] |
|
What versions does this affect? What are the possible user-visible consequences? For example, could this lead to duplicate _id's on the secondary? |