[SERVER-19322] Segmentation fault during replication in a sharded replicated mongodb environment Created: 07/Jul/15 Updated: 13/Jul/15 Resolved: 13/Jul/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Praveen Akinapally | Assignee: | Sam Kleinman (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Operating System: | ALL | ||||||||
| Steps To Reproduce: | Deployment environment: start with a primary sharded MongoDB server, mongodb1 (priority set to 2), which already holds the sharded data. Add replica set members mongodb0 (priority set to 3) and mongodb2 (priority set to 1). Data starts syncing to the replica set members mongodb0 and mongodb2. As soon as mongodb0 transitions from STARTUP2 to PRIMARY, the mongod process on mongodb2 fails with the above-mentioned stack trace. The mongod instance running on mongodb0 transitions to primary: The mongod instance running on mongodb2 fails with a segmentation fault: This process used to run fine until the WiredTiger upgrade. |
| Participants: | |||||||||
| Description |
|
We have 3 MongoDB servers: mongodb0 (priority=3), mongodb1 (priority=2), and mongodb2 (priority=1). mongodb1 has the full data set, and we start a data sync to the empty mongodb0 and mongodb2 machines by adding them as replica set members. As soon as mongodb0 completes its sync successfully and transitions to primary, the mongodb2 sync stops and fails with a segmentation fault error. This process used to run fine until the WiredTiger upgrade. Error Stack Trace:
|
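The topology described in the Steps To Reproduce and Description can be set up by reconfiguring the replica set so that mongodb0 outranks mongodb1 once its initial sync finishes. The following is a minimal pymongo sketch, not taken from the ticket: the hostnames, ports, and member `_id` values are assumptions, and it assumes mongodb1 is already running as a single-member replica set with `_id: 0`.

```python
# Minimal sketch of the reporter's topology (hostnames, ports, and _id values
# are assumptions, not taken from the ticket).
from pymongo import MongoClient

# Connect directly to the member that already holds the data (mongodb1, priority 2).
client = MongoClient("mongodb1", 27017, directConnection=True)

# Fetch the current replica set config and append the two empty members.
cfg = client.admin.command("replSetGetConfig")["config"]
cfg["version"] += 1
cfg["members"] += [
    {"_id": 1, "host": "mongodb0:27017", "priority": 3},  # steps up to PRIMARY once its sync finishes
    {"_id": 2, "host": "mongodb2:27017", "priority": 1},  # the member that hit the segmentation fault
]
client.admin.command("replSetReconfig", cfg)
```

Once mongodb0 finishes its initial sync, its priority of 3 causes it to call an election and step up to PRIMARY; the reporter observed the mongodb2 crash at exactly that transition, while mongodb2 was still syncing.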
| Comments |
| Comment by Praveen Akinapally [ 10/Jul/15 ] | ||||||||||||
|
Thanks Alexander and Sam. I upgraded my Mongo Cluster to 3.0.4 and it works well now. | ||||||||||||
| Comment by Alexander Gorrod [ 08/Jul/15 ] | ||||||||||||
|
I agree with samk that this should be fixed in 3.0.4, but I think the fix was for a different issue. The particular problem was that WiredTiger had a bug where a page could be evicted from cache at the same time as the collection was being removed, which led to a race condition where the page could be freed by two different threads at the same time. |
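For readers unfamiliar with this failure mode, the sketch below is a toy Python model of the race just described; it is not WiredTiger code, and the class and thread names are illustrative only. An "eviction" thread and a "drop collection" thread both try to release the same cached page, and only an atomic claim keeps the release from happening twice.

```python
# Toy model of the double-free race described above (NOT WiredTiger code).
import threading

class Page:
    """A cached page that must be released exactly once."""
    def __init__(self, page_id):
        self.page_id = page_id
        self._released = False
        self._claim = threading.Lock()

    def release(self, who):
        # Atomically claim the page so only one thread performs the release.
        with self._claim:
            if self._released:
                print(f"{who}: page {self.page_id} already released, skipping")
                return False
            self._released = True
        print(f"{who}: released page {self.page_id}")
        return True

page = Page(42)
threads = [
    threading.Thread(target=page.release, args=("eviction thread",)),
    threading.Thread(target=page.release, args=("drop-collection thread",)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock-protected check, both threads could observe the page as still live and release it twice, which is the double free that the storage-engine fix shipped in 3.0.4 prevents.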
| Comment by Sam Kleinman (Inactive) [ 08/Jul/15 ] | ||||||||||||
|
Thanks for the report, and sorry that you've hit this issue. This looks related to a known issue. Can you upgrade to 3.0.4 and see if this resolves your issue? Regards, |
| Comment by Sam Kleinman (Inactive) [ 08/Jul/15 ] | ||||||||||||
|
full addr2line:
Potentially related to a known issue. Will recommend an upgrade to 3.0.4 in the meantime. |