[SERVER-47086] Kill cursors when transitioning to rollback Created: 24/Mar/20 Updated: 06/Dec/22 Resolved: 30/Mar/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Samyukta Lanka | Assignee: | Backlog - Replication Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Participants: |
| Description |
|
Exhaust cursors have a short window between 'find' and the 'getMore's during which the cursor is not actively running an operation. Rollback currently doesn't kill cursors that have no actively running operation. This means that a cursor used for oplog fetching might not be killed when a node's sync source transitions to rollback. The sync source could roll back the syncing node's minValid, in which case the sync source is no longer valid for the syncing node. Since the syncing node only checks the sync source's rbid for rollback after receiving the first batch, the syncing node would keep fetching from its potentially invalid sync source. We should kill cursors when entering rollback so that the syncing node needs to find a new sync source. |
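The scenario in the description can be sketched as a toy model. This is only an illustration, not MongoDB's implementation: `SyncSource`, `OplogFetcher`, `CursorKilled`, and the `kill_idle_cursors` flag are hypothetical names invented for this sketch. It assumes the fetcher compares the rbid only once, after the first batch, and that rollback removes idle cursors only when asked to.

```python
class CursorKilled(Exception):
    """Raised when a getMore targets a cursor that no longer exists."""


class SyncSource:
    """Toy stand-in for a sync source (hypothetical, for illustration only)."""

    def __init__(self):
        self.rbid = 1        # rollback id, incremented on every rollback
        self.cursors = {}    # cursor id -> True while an operation is running on it
        self._next_id = 0

    def find(self):
        # Open a cursor; it is idle once the first batch has been returned.
        self._next_id += 1
        self.cursors[self._next_id] = False
        return self._next_id

    def get_more(self, cursor_id):
        if cursor_id not in self.cursors:
            raise CursorKilled(cursor_id)
        return "batch"

    def enter_rollback(self, kill_idle_cursors):
        self.rbid += 1
        for cursor_id, active in list(self.cursors.items()):
            # Current behavior (as described): only active cursors are killed.
            if active or kill_idle_cursors:
                del self.cursors[cursor_id]


class OplogFetcher:
    """Toy fetcher that checks the rbid only after the first batch."""

    def __init__(self, source):
        self.source = source
        rbid_at_selection = source.rbid
        self.cursor_id = source.find()
        # The only rbid comparison in this model happens here.
        if source.rbid != rbid_at_selection:
            raise RuntimeError("sync source rolled back")

    def fetch_next_batch(self):
        # No rbid check on later batches, so a rollback goes unnoticed
        # unless the cursor itself has been killed.
        return self.source.get_more(self.cursor_id)
```

In this model, a rollback with `kill_idle_cursors=False` leaves the fetcher reading from the rolled-back source, while `kill_idle_cursors=True` makes the next `fetch_next_batch()` raise `CursorKilled`, which forces the fetcher to react.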
| Comments |
| Comment by Judah Schvimer [ 25/Mar/20 ] |
|
That is my understanding of the impact. This will also only happen for rollback via refetch. |
| Comment by Tess Avitabile (Inactive) [ 25/Mar/20 ] |
|
Is the impact that the syncing node may stay in RECOVERING forever? Or is there something worse that could happen? |
| Comment by Samyukta Lanka [ 24/Mar/20 ] |
|
If its cursor is killed, the oplog fetcher will follow its retry policy and create a new cursor with the same sync source. This could cause a delay between the sync source rolling back and the syncing node seeking a new sync source. |
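
The delay mentioned in this comment can be illustrated with a minimal retry loop. This is a sketch under stated assumptions, not the oplog fetcher's actual retry policy: `fetch_with_retry`, `open_cursor`, and `max_restarts` are hypothetical names, and the sketch assumes that reopening a cursor on the same sync source fails with an error while that source is unavailable.

```python
def fetch_with_retry(open_cursor, max_restarts):
    """Reopen a cursor on the same sync source, up to max_restarts extra times.

    open_cursor is a hypothetical callable that returns a cursor or raises
    ConnectionError. Returns (cursor, attempts); cursor is None when every
    attempt failed and a new sync source has to be chosen.
    """
    attempts = 0
    while attempts <= max_restarts:
        attempts += 1
        try:
            return open_cursor(), attempts
        except ConnectionError:
            # Same sync source is tried again on the next iteration.
            pass
    return None, attempts
```

Every failed attempt in this loop is time spent on the old sync source before a new one is sought, which is the delay the comment describes.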