[SERVER-35616] Oplog query on initial syncing node can cause segmentation fault Created: 15/Jun/18 Updated: 29/Oct/23 Resolved: 24/Aug/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.3, 4.1.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Tess Avitabile (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.0, v3.6 |
| Sprint: | Repl 2018-07-16, Repl 2018-08-27 |
| Participants: | |
| Linked BF Score: | 60 |
| Description |
|
If an oplog query with read concern afterClusterTime runs on a node that is in initial sync and does not yet have an oplog, the node will seg-fault. This is similar to |
| Comments |
| Comment by Githook User [ 28/Aug/18 ] |
|
Author: Tess Avitabile (tessavitabile) <tess.avitabile@mongodb.com>
Message: (cherry picked from commit f7c2600036ce876bb389f3eb3adc8eada6932d8b) |
| Comment by Githook User [ 24/Aug/18 ] |
|
Author: Tess Avitabile (tessavitabile) <tess.avitabile@mongodb.com>
Message: |
| Comment by Tess Avitabile (Inactive) [ 22/Aug/18 ] |
|
The script does not cause the seg-fault on 3.6. On 3.6, we do not start advancing the cluster time until the FCV is 3.6. Until the cluster time becomes non-null, every non-null afterClusterTime query will uassert (and null afterClusterTime queries always uassert). During initial sync, the FCV is not set until after the oplog is created, so by the time an afterClusterTime query can succeed, the oplog already exists and accessing it cannot seg-fault. I tried to make the cluster time non-null during initial sync by using the resync command. However, when a node is in initial sync because of a resync command, the oplog is not dropped, so we do not hit the seg-fault. Since I have been unable to reproduce this on 3.6, I am going to decline to backport the fix to 3.6. |
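The 3.6 ordering argument in this comment can be checked with a small toy model. This is only a sketch in Python, not server code: the class `Node36`, its methods, and `UassertError` are hypothetical names made up for illustration, and integers stand in for cluster timestamps. It encodes the three rules stated above (cluster time advances only once the FCV is 3.6, the FCV is set only after the oplog is created, and queries uassert while the cluster time is null) and shows that a query can only succeed once the oplog exists.

```python
from dataclasses import dataclass
from typing import Optional


class UassertError(Exception):
    """Stands in for a uassert failure returned to the client (hypothetical)."""


@dataclass
class Node36:
    """Toy model of a 3.6 node in initial sync; all names are hypothetical."""

    oplog_exists: bool = False
    fcv_is_36: bool = False
    cluster_time: Optional[int] = None  # None models a null cluster time

    def create_oplog(self) -> None:
        self.oplog_exists = True

    def set_fcv_36(self) -> None:
        # During initial sync the FCV is not set until after the oplog is created.
        assert self.oplog_exists, "FCV is set only after oplog creation"
        self.fcv_is_36 = True

    def advance_cluster_time(self, t: int) -> None:
        # The cluster time does not start advancing until the FCV is 3.6.
        if self.fcv_is_36:
            self.cluster_time = t

    def oplog_query(self, after_cluster_time: Optional[int]) -> str:
        # Null afterClusterTime queries always uassert.
        if after_cluster_time is None:
            raise UassertError("afterClusterTime must not be null")
        # Non-null afterClusterTime queries uassert while the cluster time is null.
        if self.cluster_time is None:
            raise UassertError("cluster time is still null")
        # Reaching this point implies the FCV was set, hence the oplog exists.
        assert self.oplog_exists
        return "ok"
```

In this model there is no sequence of calls that reaches the oplog access with `oplog_exists` still false, which mirrors why the script could not trigger the seg-fault on 3.6.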
| Comment by Tess Avitabile (Inactive) [ 06/Jul/18 ] |
|
No, I don't think it makes sense to put this in the Initial Sync Semantics epic. That epic is to ensure that the presence of an initial syncing node cannot cause majority acknowledged writes to roll back and to ensure that once a node exits initial sync, it can restart or roll back without requiring a full resync. |
| Comment by Judah Schvimer [ 15/Jun/18 ] |
|
Reproduced here on top of 3bec524a983f2a21bfd40b1d39c937189e12db07. bf-9529.diff |