[SERVER-39310] Check canServeReadsFor in getMores Created: 31/Jan/19  Updated: 29/Oct/23  Resolved: 30/Mar/19

Status: Closed
Project: Core Server
Component/s: Querying, Replication
Affects Version/s: None
Fix Version/s: 3.6.15, 4.1.10, 4.0.13

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Lingzhi Deng
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0, v3.6, v3.4
Sprint: Repl 2019-03-25, Repl 2019-04-08
Participants:
Linked BF Score: 6

 Description   

The "getMore" command path appears to never call canServeReadsFor, meaning that we could end up serving reads during a rollback. Data in this time could be inconsistent. For rollback via refetch, we refetch documents from the future and then play updates on them from the past which can lead to documents that never existed being returned. Rollback to a Timestamp is likely safe since it never is in an inconsistent state, but it's worthwhile to be consistent with the "find" command here.

Discovered by david.storch.



 Comments   
Comment by Githook User [ 04/Sep/19 ]

Author:

{'username': 'ldennis', 'email': 'lingzhi.deng@mongodb.com', 'name': 'Lingzhi Deng'}

Message: SERVER-39310: Call checkCanServeReadsFor() in 'getMore'

  • added call to checkCanServeReadsFor() in getmore_cmd.cpp after getting readlocks
  • introduced two fail points: 1. pause 'getMore' before readlocks 2. pause rollback after state transition
  • added testcase read_operations_during_rollback.js
  • added call to checkCanServeReadsFor() in find_cmd.cpp after getting readlocks (cherry picked from commit e8fe32029aded4d0e909f531196edff43c96cfff)
  • added assert.includes (cherry picked from commit dd9be1adf2425c7ddd746ff6da75d564474ebed3)

(cherry picked from commit b885fa6feb7da00dc367e917c53ba16a41b75af4)
Branch: v3.6
https://github.com/mongodb/mongo/commit/3eb6bc9a15f95473bf666a58be81da1e89cc0837

Comment by Githook User [ 03/Sep/19 ]

Author:

{'name': 'Lingzhi Deng', 'username': 'ldennis', 'email': 'lingzhi.deng@mongodb.com'}

Message: SERVER-39310: Call checkCanServeReadsFor() in 'getMore'

  • added call to checkCanServeReadsFor() in getmore_cmd.cpp after getting readlocks
  • introduced two fail points: 1. pause 'getMore' before readlocks 2. pause rollback after state transition
  • added testcase read_operations_during_rollback.js
  • added assert.includes (cherry picked from commit dd9be1adf2425c7ddd746ff6da75d564474ebed3)

(cherry picked from commit b885fa6feb7da00dc367e917c53ba16a41b75af4)
Branch: v4.0
https://github.com/mongodb/mongo/commit/88df2558b0c3b39b86df2ab97814d701d701d704

Comment by Suganthi Mani [ 21/Aug/19 ]

judah.schvimer, your understanding is correct. I shouldn't have used the word safe in my above comment. Backports would be for optimization purpose.

Comment by Shane Harvey [ 29/Mar/19 ]

Perfect, thanks Judah!

Comment by Judah Schvimer [ 29/Mar/19 ]

shane.harvey, I wouldn't expect this to have any impact on stepdown since canServeReadsFor is still true as a secondary. This should only impact nodes going into rollback, recovering, or removed states.

Comment by Shane Harvey [ 28/Mar/19 ]

Does this have any impact on the "cursors survive primary stepdown" goal in DRIVERS-552?

I'm wondering if this change could break drivers that send the cursor's original read preference with getMores. CC: oleg.pudeyev.

Comment by Githook User [ 28/Mar/19 ]

Author:

{'name': 'Lingzhi Deng', 'username': 'ldennis', 'email': 'lingzhi.deng@mongodb.com'}

Message: SERVER-39310: Call checkCanServeReadsFor() in 'getMore'

Comment by Lingzhi Deng [ 22/Mar/19 ]

Posting my findings here:
When a node transitions to ROLLBACK, it also kills all ongoing connections under the Transition Lock Guard. And if a client reconnects and issues a "getMore" command, execCommandDatabase() will catch it at here. So I think we are safe.

I agree that we should be consistent with "find" and I have made that change. I also added a test case to make sure we do kill all ongoing connections. The test also exercises checkCanServeReadsFor() in both "find" and "getMore" codepath with and without skipCheckingForNotMasterInCommandDispatch fail point. Previously without the check in "getMore", the test would fail if we enable skipCheckingForNotMasterInCommandDispatch.

I am about to open a cr for that once it passes evergreen.

Comment by David Storch [ 04/Feb/19 ]

Adding some details per the request of judah.schvimer.

The getMore command is completely missing a call to canServeReadsFor(), which is the bug tracked by this ticket. The OP_GET_MORE path, in contrast, appears to be implemented correctly. Note that in this block of code, we only call canServeReadsFor() if the LockPolicy associated with the cursor is not 'kLocksInternally'. For aggregate cursors, getMore/OP_GET_MORE do not take locks themselves, so checkCanServeReadsFor() cannot be called here. However, it looks like the place where we do eventually acquire a lock during agg execution remembers to call checkCanServeReadsFor(): see https://github.com/mongodb/mongo/blob/9606de0f0f3166b9c8fcff033f2476af2937f685/src/mongo/db/pipeline/document_source_cursor.cpp#L93-L94. Therefore, both 'kLocksInternally' and 'kLockExternally' cursors have the necessary canServeReadsFor() check for OP_GET_MORE. 'kLockExternally' cursors with getMore command is flawed.

Comment by Judah Schvimer [ 31/Jan/19 ]

This would only lead to returning inconsistent data in a getMore, though a getMore on the oplog should not hit this. Thus a secondary would not have duplicate _id's due to this bug. I think this is a beginning of time bug, though we'd have to create a repro and test it to confirm.

Comment by Bruce Lucas (Inactive) [ 31/Jan/19 ]

What versions does this affect? What are the possible user-visible consequences? For example, could this lead to duplicate _id's on the secondary?

Generated at Thu Feb 08 04:51:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.