[SERVER-27277] [rsBackgroundSync] Fatal assertion 18750 UnrecoverableRollbackError on numerous 3.2.10 replica sets Created: 05/Dec/16 Updated: 17/Jul/17 Resolved: 06/Dec/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.2.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Avraham Kalvo | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Operating System: | ALL |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
We had a weekend full of network issues, which caused data replication all over our topology to become stale (slaves were lagging behind masters, etc.). All lags and discrepancies were settled shortly after the network issues were addressed in databases such as MySQL and Cassandra. A couple of our mongo replica sets in production, however, failed to recover from the network outages, as can be seen in the enclosed log. Specifically, for one of them, there appeared to be several attempts at re-electing a primary, followed by attempts to connect to other members, which failed due to network unavailability and triggered a background sync rollback that ultimately failed after several attempts.
As mentioned before, this occurred on a couple of mission-critical replica sets, which failed to recover from it. We upgraded from 3.2.9 to 3.2.10 after some harsh performance-related bugs. We really need those replica sets to work stably on mongo, the same as the other resilient db solutions we have, which recovered from this completely. Thanks for your prompt advice! Avi Kalvo |
| Comments |
| Comment by Kelsey Schubert [ 06/Dec/16 ] |
|
Hi avrahamk, While there have been a number of performance improvements in 3.2.11 that may impact the behavior you are observing, we cannot point to a particular ticket that would resolve the issue described in I've examined the logs in more detail and am confident that you are hitting Thank you, |
| Comment by Avraham Kalvo [ 06/Dec/16 ] |
|
Thanks Ramon, Can you also confirm that 3.2.11 will resolve the performance issues we reported on another ticket with you Thanks, |
| Comment by Ramon Fernandez Marina [ 05/Dec/16 ] |
|
avrahamk, the log line you put in the description seems to indicate that you're running into Thanks, |