[SERVER-16994] Handle WriteConflictException when writing oplog on secondaries Created: 22/Jan/15  Updated: 22/Jul/15  Resolved: 22/Jan/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.0.0-rc6

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-17689 Fatal assertion during replication, a... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Comments   
Comment by William Richards [ 22/Jul/15 ]

Thanks, I just opened it as SERVER-19518. I hadn't noticed the backport for 3.0.5, which probably does address the problem.

Comment by Scott Hernandez (Inactive) [ 22/Jul/15 ]

Will, please open a new issue and specify the server version and other important details about your system/deployment, including logs.

Also, that commit is fairly old, and the code no longer follows the flow you describe: https://github.com/mongodb/mongo/blob/v3.0/src/mongo/db/repl/oplog.cpp#L387

I believe what you described is no longer possible, since we back-ported changes into the 3.0 branch – see SERVER-17689, which will soon have a slightly better title/description and is in the upcoming 3.0.5 minor patch release.

Comment by William Richards [ 21/Jul/15 ]

There didn't seem to be any other relevant info in the server logs. I'll try to get more verbose logs and open a new ticket, but the error is quite difficult to reproduce.

In the meantime, could someone take a quick look at this commit? It seems to me that if ops has more than one item, lastOptime can be updated before the WriteConflictException is thrown. On the second pass through the while loop, lastOptime will then already be higher than the timestamp of the first element of ops, which triggers the fassertFailed.
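To make the failure mode described above concrete, here is a minimal, self-contained C++ simulation. It is not MongoDB's code: FakeOplog, WriteConflict, WentBackInTime, writeBatchBuggy and writeBatchFixed are hypothetical stand-ins, timestamps are modeled as plain integers, and a successful "commit" is just a return. The buggy variant advances lastOptime per op inside the retry loop, as described in the comment. The fixed variant shows one possible remedy (track progress in a local and publish it only after the whole batch succeeds); it is not necessarily the change that shipped with SERVER-17689.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins; these are not MongoDB's real types.
struct WriteConflict : std::runtime_error {
    WriteConflict() : std::runtime_error("WriteConflictException") {}
};
struct WentBackInTime : std::runtime_error {
    WentBackInTime() : std::runtime_error("oplog stream went back in time") {}
};

// Simulated oplog: the insert at index `conflictAt` throws exactly once.
struct FakeOplog {
    std::size_t conflictAt;
    bool fired = false;
    void insert(std::size_t i) {
        if (!fired && i == conflictAt) {
            fired = true;
            throw WriteConflict();
        }
    }
};

// Pattern from the comment: lastOptime is advanced per op inside the retry
// loop, so a conflict partway through the batch leaves it past ops[0].
void writeBatchBuggy(FakeOplog& oplog, const std::vector<std::uint64_t>& ops,
                     std::uint64_t& lastOptime) {
    while (true) {
        try {
            for (std::size_t i = 0; i < ops.size(); ++i) {
                oplog.insert(i);  // may throw WriteConflict
                if (!(lastOptime < ops[i])) throw WentBackInTime();
                lastOptime = ops[i];  // not rolled back if a later insert throws
            }
            return;  // stands in for committing the batch
        } catch (const WriteConflict&) {
            std::cout << "WriteConflict while writing oplog, retrying\n";
        }
    }
}

// One possible fix: track progress in a local that is reset on every
// attempt, and publish it only after the whole batch has been written.
void writeBatchFixed(FakeOplog& oplog, const std::vector<std::uint64_t>& ops,
                     std::uint64_t& lastOptime) {
    while (true) {
        std::uint64_t pending = lastOptime;  // discarded if this attempt fails
        try {
            for (std::size_t i = 0; i < ops.size(); ++i) {
                oplog.insert(i);  // may throw WriteConflict
                if (!(pending < ops[i])) throw WentBackInTime();
                pending = ops[i];
            }
            lastOptime = pending;  // publish only on success
            assert(ops.empty() || lastOptime == ops.back());
            return;
        } catch (const WriteConflict&) {
            std::cout << "WriteConflict while writing oplog, retrying\n";
        }
    }
}
```

With ops {10, 11, 12}, a starting lastOptime of 5, and a conflict injected on the second insert (FakeOplog{1}), the buggy variant retries and then throws WentBackInTime with lastOptime left at 10. The fixed variant retries and finishes with lastOptime at 12.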

Thanks,
Will

Comment by Ramon Fernandez Marina [ 21/Jul/15 ]

wrichard, can you please open a new ticket and provide full server logs as well as activity details when you had the crash?

Thanks,
Ramón.

Comment by William Richards [ 21/Jul/15 ]

I think this fix may not be working. One of my secondaries just crashed with:

2015-07-21T08:40:46.434-0400 I REPL     [rsSync] WriteConflictException while writing oplog, retrying
2015-07-21T08:40:46.475-0400 F REPL     [rsSync] replication oplog stream went back in time. previous timestamp: 55ae3dd4:28 newest timestamp: 55ae3d60:c. 

Perhaps the try/catch should be inside the for loop over ops.

Comment by Githook User [ 22/Jan/15 ]

Author:

Eric Milkie (username: milkie, email: milkie@10gen.com)

Message: SERVER-16994 handle WriteConflictException when writing oplog on secondaries
Branch: master
https://github.com/mongodb/mongo/commit/816defcefba70ffe815639fa6bb157b69ef034ad

Generated at Thu Feb 08 03:42:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.