[SERVER-3316] Syncing a new replica in a replica set crashes the primary and leaves secondary in strange state Created: 23/Jun/11 Updated: 12/Jul/16 Resolved: 24/Jun/11 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Mike K | Assignee: | Scott Hernandez (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Ubuntu Natty on EC2 |
| Attachments: |
| Operating System: | ALL |
| Participants: |
| Description |
Our setup is as follows: 2 shards consisting of 3 machines each (1 primary, 1 secondary, 1 arbiter). Each shard has about 35GB of data, running on 1.8.1. We lost a secondary today, so we are trying to resync a new secondary from scratch. Two things have happened at least twice during this process: 1. The primary segfaults; this happened while the secondary was mid-sync, but also while the secondary was shut down and not communicating with the primary at all (three times in total). 2. The new secondary logs DR102 errors and is left in a strange state. We've tried stopping all the mongods, removing the local files on the primary, starting it up and re-initializing its replica set, and then syncing again (after clearing all data off the secondary first), but this led to the same results. I've attached the logs for both segfaults (one run with verbose=false, the other with verbose=true) and a sample of the DR102 errors on the secondary. |
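For context, a minimal sketch of the full-resync procedure described above (wipe the replaced secondary's data files and let it perform an initial sync from the primary), assuming a hypothetical data path of /data/db, replica set name rs0, log path, and host names; adjust all of these to the actual deployment:

    # On the replaced secondary: stop mongod and wipe its data directory so it
    # performs a full initial sync from the primary when it rejoins the set.
    sudo killall mongod                 # or stop via the distribution's init/upstart script
    rm -rf /data/db/*
    mongod --replSet rs0 --dbpath /data/db --fork --logpath /var/log/mongod.log

    # From a mongo shell connected to the primary, check that the new member is
    # in the replica set configuration and watch the initial sync progress:
    mongo primary.example.com
    > rs.conf()        # member list should include the new secondary's host:port
    > rs.add("secondary.example.com:27017")   # only if it is not already listed
    > rs.status()      # new member should go RECOVERING, then SECONDARY

During the initial sync the new member stays in RECOVERING and is not readable; it becomes SECONDARY once the sync completes.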
| Comments |
| Comment by Mike K [ 24/Jun/11 ] |
Can confirm that 1.8.2 fixed the DR102 issue. The segfaults we saw on the primary may be related to some EC2 issues; the issue can probably be closed for now and we'll keep an eye out. |
| Comment by Scott Hernandez (Inactive) [ 23/Jun/11 ] |
Please upgrade to 1.8.2; it has fixes for many causes of the DR102 error, as well as other fixes. |