[SERVER-7551] _id Unique key violation during initial sync Created: 05/Nov/12 Updated: 11/Jul/16 Resolved: 11/Nov/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.2.1 |
| Fix Version/s: | 2.2.2, 2.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Eric Milkie |
| Resolution: | Done | Votes: | 3 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||||||||||||||||||
| Comments |
| Comment by Christian Ribe [ 14/Nov/12 ] |
|
Thanks for your help Eric. Chris |
| Comment by auto [ 09/Nov/12 ] |
|
Author: Eric Milkie <milkie@10gen.com> (2012-11-09T15:47:00Z) Message: |
| Comment by Eric Milkie [ 09/Nov/12 ] |
|
Hi Chris. I've committed a change that I believe will solve your issue. The amount of uncommitted bytes was growing too high during initial sync, due to a variety of factors. I've put in some adjustments that more closely match what we do during normal syncing, to ensure the journal doesn't overflow. This fix will be released in version 2.2.2. |
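The idea of keeping uncommitted bytes bounded can be pictured with a toy model. This is only a sketch under stated assumptions: the `Journal` class, its fields, and the 1024-byte threshold are hypothetical and are not MongoDB internals or the actual fix. It shows a writer that commits whenever the pending byte count reaches a threshold, so the backlog never exceeds the threshold plus one write.

```python
# Toy model only: Journal, its fields, and the threshold are hypothetical
# illustrations, not MongoDB internals.

class Journal:
    def __init__(self, commit_threshold_bytes):
        self.commit_threshold_bytes = commit_threshold_bytes
        self.uncommitted_bytes = 0
        self.peak_uncommitted_bytes = 0
        self.commits = 0

    def write(self, payload):
        """Record a write and commit once the backlog reaches the threshold."""
        self.uncommitted_bytes += len(payload)
        self.peak_uncommitted_bytes = max(
            self.peak_uncommitted_bytes, self.uncommitted_bytes
        )
        if self.uncommitted_bytes >= self.commit_threshold_bytes:
            self.commit()

    def commit(self):
        """Flush the backlog; a no-op when nothing is pending."""
        if self.uncommitted_bytes:
            self.commits += 1
            self.uncommitted_bytes = 0


journal = Journal(commit_threshold_bytes=1024)
for _ in range(10):
    journal.write(b"x" * 300)  # ten 300-byte writes
journal.commit()  # flush whatever is left at the end
```

In this sketch the backlog is checked on every write rather than once per batch, which is what keeps the peak close to the threshold.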
| Comment by Eric Milkie [ 08/Nov/12 ] |
|
Hi Chris. |
| Comment by Christian Ribe [ 07/Nov/12 ] |
|
Hey Eric, I seem to have solved the issue by deleting the offending index. The secondary was able to resync correctly after that. The index was not needed anymore, so I did not bother to rebuild it. I will not run a repair database, since the MongoDB documentation seems to say NOT to do that. http://www.mongodb.org/display/DOCS/Durability+and+Repair I'm still interested in your analysis of my logs. |
| Comment by Christian Ribe [ 06/Nov/12 ] |
|
Hi Eric, Here are the log files:
Hope it helps you figure out why my setup is crashing. I'm planning on dropping some old data and less critical indexes so the primary is smaller. Thank you for your help. |
| Comment by Christian Ribe [ 06/Nov/12 ] |
|
MongoDB logs of primary and crashing secondary on sync. |
| Comment by auto [ 06/Nov/12 ] |
|
Author: Eric Milkie <milkie@10gen.com> (2012-11-06T15:54:08Z) Message: Converting updates to upserts during replication was added for version 2.2.0. |
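One way to picture the difference between replaying an update strictly and replaying it as an upsert is the toy model below. It is a sketch under stated assumptions, not MongoDB's replication code: a plain dict stands in for a collection keyed by `_id`, and `apply_update`, `apply_upsert`, and `replay` are hypothetical helpers. The scenario assumed here is that the oplog contains an update for a document the secondary does not have yet.

```python
# Toy model only: a dict stands in for a collection keyed by _id.
# apply_update, apply_upsert, and replay are hypothetical helpers,
# not MongoDB APIs.

def apply_update(collection, op):
    """Apply an update strictly: fail if the target document is missing."""
    if op["_id"] not in collection:
        raise KeyError(f"no document with _id={op['_id']!r}")
    collection[op["_id"]].update(op["set"])


def apply_upsert(collection, op):
    """Apply an update as an upsert: create the document if it is missing."""
    collection.setdefault(op["_id"], {"_id": op["_id"]}).update(op["set"])


def replay(collection, oplog, apply):
    """Apply every oplog entry in order using the given strategy."""
    for op in oplog:
        apply(collection, op)
    return collection


oplog = [
    {"_id": "a", "set": {"n": 1}},
    {"_id": "b", "set": {"n": 2}},  # "b" is missing on the secondary in this scenario
]

secondary = {"a": {"_id": "a", "n": 0}}
replay(secondary, oplog, apply_upsert)
replay(secondary, oplog, apply_upsert)  # replaying again leaves the state unchanged
```

With `apply_update` the same replay would stop at the entry for `"b"`, while the upsert variant can be replayed repeatedly and still ends in the same state.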
| Comment by Eric Milkie [ 06/Nov/12 ] |
|
Hi Chris, |
| Comment by Christian Ribe [ 06/Nov/12 ] |
|
Here is an update. Here are my logs just before it crashes... |
| Comment by Christian Ribe [ 06/Nov/12 ] |
|
MongoDB.log file of : |
| Comment by Christian Ribe [ 05/Nov/12 ] |
|
My secondary did not come back up for me in 2.0.7 |
| Comment by Rotem Hermon [ 05/Nov/12 ] |
|
Thanks, the secondary came up ok in 2.0.7. |
| Comment by Eric Milkie [ 05/Nov/12 ] |
|
Hi Rotem, |
| Comment by Christian Ribe [ 05/Nov/12 ] |
|
Hi, I am in the same situation: I am stuck with no working secondaries, since they cannot sync with the primary. I can understand the need to raise an exception on duplicates, but I don't see why the whole server should crash. Stop the replication maybe, but don't stop the daemon, and let it vote no? I will upgrade my primary as suggested and try to limit my downtime... |
| Comment by Rotem Hermon [ 05/Nov/12 ] |
|
1. This log is from a second restart after an earlier crash (same crash, 2. Yes, we're generating our own _ids. |
| Comment by Eric Milkie [ 05/Nov/12 ] |
|
There may be other ways to upgrade without downtime; my first idea was merely a suggestion. I have some more questions: judging by the value of the _id that conflicted, it appears that you are creating your own _id values; is that correct? |
| Comment by Rotem Hermon [ 05/Nov/12 ] |
|
I have many writes happening all the time, and the whole point is not taking down the entire replica set as this will cause downtime to my site. So you're saying there's no way of upgrading without downtime? That's pretty bad! |
| Comment by Eric Milkie [ 05/Nov/12 ] |
|
Unfortunately, you won't get the fix for unique index key violations until you upgrade both the primary (which generates the oplog) and the secondary (which consumes it). |
| Comment by Rotem Hermon [ 05/Nov/12 ] |
|
Running version 2.0.7. It's a replica set with 2 members (and an arbiter), all running 2.0.7. The crash happened when trying to upgrade the secondary to 2.2.1. |
| Comment by Eric Milkie [ 05/Nov/12 ] |
|
Hi Rotem. |