[SERVER-30940] rollback can truncate the oplog behind the commit point
Created: 04/Sep/17 Updated: 30/Oct/23 Resolved: 10/Oct/17

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.5.12 |
| Fix Version/s: | 3.6.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | James O'Leary | Assignee: | Judah Schvimer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sysperf-36 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: |
| Description |
|
During one of the sys perf aggregation tests, the primary of the first shard failed with the following backtrace:
|
| Comments |
| Comment by Judah Schvimer [ 10/Oct/17 ] |

The commit noted in the Githook User comment ensures that replication triggers an invariant failure before rolling back behind the commit point. I'm closing this as fixed and will reopen it if it reproduces.
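
The guard described in this comment can be pictured roughly as follows. This is a minimal Python sketch, not the server's C++ code: `OpTime`, its (term, timestamp) ordering, and both function names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class OpTime:
    # Assumption for this sketch: optimes order by term first, then timestamp.
    term: int
    timestamp: int


def rollback_crosses_commit_point(common_point: OpTime, commit_point: OpTime) -> bool:
    """True when rolling back to common_point would discard committed entries."""
    return common_point < commit_point


def check_rollback_allowed(common_point: OpTime, commit_point: OpTime) -> None:
    """Fail loudly instead of truncating the oplog behind the commit point."""
    if rollback_crosses_commit_point(common_point, commit_point):
        raise AssertionError(
            f"rollback common point {common_point} is behind commit point {commit_point}"
        )
```

Failing fast here trades availability of the node for safety: the process stops rather than silently removing entries that were already majority committed.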
| Comment by Kevin Duong [ 06/Oct/17 ] |

Changing this from debugging with submitter to 3.6 required.
| Comment by Githook User [ 19/Sep/17 ] |

Author: Judah Schvimer (username: judahschvimer, email: judah@mongodb.com)
Message:
| Comment by Judah Schvimer [ 13/Sep/17 ] |

After talking to milkie, this appears to imply that we are truncating behind the stable timestamp, which is itself behind the replication committed optime. We pass false to cappedTruncateAfter for the "inclusive" flag, so we are not removing the common point itself. This first line is strange. The two oplog entries we compare when deciding to go into rollback have the same timestamp and term but different hashes, which should be impossible.
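
To make the two observations in this comment concrete, here is a small Python sketch under stated assumptions. `OplogEntry`, `truncate_after`, and `same_optime_different_hash` are hypothetical names; the list-based oplog only models the effect of an "inclusive" flag and is not the storage engine's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OplogEntry:
    ts: int    # timestamp, simplified to an integer for this sketch
    term: int  # election term
    h: int     # per-entry hash


def truncate_after(oplog, common_ts, inclusive):
    """Drop entries newer than common_ts; also drop the entry at common_ts if inclusive."""
    if inclusive:
        return [e for e in oplog if e.ts < common_ts]
    return [e for e in oplog if e.ts <= common_ts]


def same_optime_different_hash(a, b):
    """Detect the 'should be impossible' case: equal (ts, term) but unequal hash."""
    return (a.ts, a.term) == (b.ts, b.term) and a.h != b.h


oplog = [OplogEntry(ts, 1, ts * 7) for ts in (10, 20, 30, 40)]
kept = truncate_after(oplog, 20, inclusive=False)
```

With `inclusive=False` the entry at the common point (`ts == 20` in this toy data) survives the truncation, which matches the behavior the comment describes.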
| Comment by Eric Milkie [ 12/Sep/17 ] |

Agreed; I removed the link to
| Comment by Judah Schvimer [ 12/Sep/17 ] |

This doesn't look related to
| Comment by Eric Milkie [ 04/Sep/17 ] |

This isn't a segmentation fault (none appears in the log), so I removed the links to other tickets and changed the title to reflect this.
| Comment by Eric Milkie [ 04/Sep/17 ] |

The error that triggered the crash:
I think this is a manifestation of a problem that
| Comment by Eric Milkie [ 04/Sep/17 ] |

jim.oleary, do you have a link to the test? I'd like to see the full log.