[SERVER-16994] Handle WriteConflictException when writing oplog on secondaries Created: 22/Jan/15 Updated: 22/Jul/15 Resolved: 22/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.0.0-rc6 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Eric Milkie |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: | |
| Comments |
| Comment by William Richards [ 22/Jul/15 ] | ||
|
Thanks, I just opened it as | ||
| Comment by Scott Hernandez (Inactive) [ 22/Jul/15 ] | ||
|
Will, please open a new issue and specify the server version and other important details about your system/deployment, including logs. Also, that commit is fairly old and what you describe is no longer the flow (see https://github.com/mongodb/mongo/blob/v3.0/src/mongo/db/repl/oplog.cpp#L387). I believe what you described is no longer possible since we backported changes into the 3.0 branch – see | ||
| Comment by William Richards [ 21/Jul/15 ] | ||
|
There didn't seem to be any other relevant info in the server logs. I'll try to get more verbose logs and open a new ticket, but the error is quite difficult to reproduce. In the meantime, could someone take a quick look at this commit? It seems to me that if ops has more than one item, lastOptime can be updated before the WriteConflictException is thrown. The second time through the while loop, lastOptime will have a value higher than the first element of ops, triggering the fassertFailed. Thanks, | ||
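To make the suspected sequence concrete, here is a minimal, self-contained sketch of it. It is not the server code: `WriteConflict`, `OutOfOrder`, `Op`, `writeOne`, and both `writeBatch...` functions are hypothetical names invented for illustration, and `OutOfOrder` is thrown where the real server would call fassertFailed. The first function retries the whole batch without undoing the lastOptime update; the second models the suggestion raised elsewhere in this thread of moving the try/catch inside the loop over ops.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the server's WriteConflictException.
struct WriteConflict : std::runtime_error {
    WriteConflict() : std::runtime_error("write conflict") {}
};

// Hypothetical stand-in for fassertFailed; thrown here instead of aborting.
struct OutOfOrder : std::logic_error {
    OutOfOrder() : std::logic_error("op timestamp is not newer than lastOptime") {}
};

struct Op {
    std::uint64_t ts;  // simplified optime
};

// Simulated write of one op: checks ordering, throws a single conflict on the
// designated op, and otherwise advances lastOptime.
void writeOne(const Op& op, std::uint64_t& lastOptime, bool& conflictPending, bool isFailingOp) {
    if (op.ts <= lastOptime) throw OutOfOrder();
    if (conflictPending && isFailingOp) {
        conflictPending = false;
        throw WriteConflict();
    }
    lastOptime = op.ts;
}

// Suspected flow: the retry wraps the whole batch, so ops written before the
// conflict have already advanced lastOptime when the batch restarts.
std::uint64_t writeBatchRetryWhole(const std::vector<Op>& ops, std::uint64_t lastOptime,
                                   std::size_t failAt) {
    bool conflictPending = true;
    while (true) {
        try {
            for (std::size_t i = 0; i < ops.size(); ++i) {
                writeOne(ops[i], lastOptime, conflictPending, i == failAt);
            }
            return lastOptime;
        } catch (const WriteConflict&) {
            // Retry from the first op; lastOptime is NOT rolled back.
        }
    }
}

// Suggested alternative: retry only the op that hit the conflict.
std::uint64_t writeBatchRetryPerOp(const std::vector<Op>& ops, std::uint64_t lastOptime,
                                   std::size_t failAt) {
    bool conflictPending = true;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        while (true) {
            try {
                writeOne(ops[i], lastOptime, conflictPending, i == failAt);
                break;
            } catch (const WriteConflict&) {
                // Retry this op only; earlier ops are not replayed.
            }
        }
    }
    return lastOptime;
}
```

With ops at timestamps 1, 2, 3, a starting lastOptime of 0, and the conflict on the second op, the whole-batch version throws `OutOfOrder` on the retry because the first op is no longer newer than lastOptime, while the per-op version finishes with lastOptime at 3. Whether the real code path matches this model depends on the server version, as noted in the replies.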
| Comment by Ramon Fernandez Marina [ 21/Jul/15 ] | ||
|
wrichard, can you please open a new ticket and provide full server logs as well as activity details when you had the crash? Thanks, | ||
| Comment by William Richards [ 21/Jul/15 ] | ||
|
I think this may not be working. I just had a secondary crash with
Perhaps the try/catch should be inside the for loop over ops. | ||
| Comment by Githook User [ 22/Jan/15 ] | ||
|
Author: Eric Milkie (milkie, milkie@10gen.com) Message: |