[SERVER-22390] RangeDeleter crashes primary node due to lagging secondary Created: 01/Feb/16 Updated: 18/Nov/16 Resolved: 09/Feb/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.2.1 |
| Fix Version/s: | 3.2.3, 3.3.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Yoni Douek | Assignee: | Misha Tyulenev |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-only | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Sprint: | Sharding 10 (02/19/16) |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
Our setup: primary, secondary, arbiter. When we add a new replica set member that is more than one hour behind, the range deleter triggers an assertion on the primary and crashes it. This behavior is too conservative; the condition should be handled differently.
|
| Comments |
| Comment by Dennis Zhuang [ 15/Aug/16 ] | |
|
@Yoni Douek Hi, can we apply this patch to 3.0.12 ourselves? We upgraded our MongoDB deployment to 3.0 last week but can't upgrade to 3.2 or 3.3 immediately. We had a crash today caused by this issue. | |
| Comment by Githook User [ 09/Feb/16 ] | |
|
Author: Misha Tyulenev (mikety) <misha@mongodb.com>
Message: (cherry picked from commit 805b8c51a6103e0482df81b5ef9b3efd448b0806) | |
| Comment by Githook User [ 09/Feb/16 ] | |
|
Author: Misha Tyulenev (mikety) <misha@mongodb.com>
Message: | |
| Comment by Githook User [ 09/Feb/16 ] | |
|
Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>
Message: | |
| Comment by Randolph Tan [ 05/Feb/16 ] | |
|
Another example (remove2.log), with a different status:
Should probably not fassert with this error. | |
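One way to act on this suggestion is to treat a timed-out majority wait as a recoverable condition instead of a fatal one. The following is a minimal Python sketch, not MongoDB's implementation (the server is written in C++, and the actual fix is not shown in this ticket). `process_queue`, `wait_for_majority`, `WaitResult`, and `max_attempts` are hypothetical names, and the deletion step itself is omitted. Rather than aborting the process, as an fassert does, it logs the timeout, re-queues the range, and gives up after a fixed number of attempts.

```python
import logging
from collections import deque
from enum import Enum

log = logging.getLogger("range_deleter_sketch")


class WaitResult(Enum):
    # Hypothetical outcomes of waiting for majority replication.
    OK = "ok"
    TIMED_OUT = "timed out"  # majority was not reached before the deadline


def process_queue(tasks, wait_for_majority, max_attempts=3):
    """Wait for majority replication of each (already deleted) range.

    A timed-out wait is logged and the task is re-queued instead of
    terminating the process; after max_attempts the task is abandoned.
    Returns (completed, abandoned) lists of tasks.
    """
    queue = deque((task, 1) for task in tasks)
    completed, abandoned = [], []
    while queue:
        task, attempt = queue.popleft()
        result = wait_for_majority(task)
        if result is WaitResult.OK:
            completed.append(task)
        elif attempt < max_attempts:
            log.warning("majority wait for %s %s (attempt %d); retrying later",
                        task, result.value, attempt)
            queue.append((task, attempt + 1))
        else:
            log.error("giving up on %s after %d attempts", task, attempt)
            abandoned.append(task)
    return completed, abandoned


# Simulated replica set: range "b" never reaches a majority (lagging member).
def fake_wait(task):
    return WaitResult.TIMED_OUT if task == "b" else WaitResult.OK


print(process_queue(["a", "b", "c"], fake_wait))  # → (['a', 'c'], ['b'])
```

Here range `b` is retried twice and then abandoned while the other ranges complete, and the node keeps running throughout. Whether to retry, skip, or surface the error to an operator is a policy choice this sketch does not settle.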
| Comment by Randolph Tan [ 05/Feb/16 ] | |
|
Attached is a log from a build failure that exhibited the same behavior. | |
| Comment by Yoni Douek [ 04/Feb/16 ] | |
|
All you need to do is add a replica set member that is far behind and therefore can't participate in replicated deletes. In our case, the primary doesn't have a majority when range deleting and crashes after one hour. | |
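The majority argument can be checked with simple arithmetic for the topology in the description (primary, secondary, arbiter) plus one lagging new member. This is a minimal Python sketch, not MongoDB code, assuming that every member has one vote, that a majority write needs acknowledgment from a strict majority of voting members, and that arbiters vote but cannot acknowledge writes. The `Member` class and its `caught_up` flag are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    votes: int = 1          # assumption: every member has the default single vote
    arbiter: bool = False   # arbiters vote but hold no data, so they never acknowledge
    caught_up: bool = True  # hypothetical: has this member applied the delete?


def majority_needed(members):
    """Strict majority of all votes in the set (assumed 'majority' rule)."""
    return sum(m.votes for m in members) // 2 + 1


def acks_available(members):
    """Votes of data-bearing members that have applied the write (primary included)."""
    return sum(m.votes for m in members if not m.arbiter and m.caught_up)


def majority_satisfiable(members):
    return acks_available(members) >= majority_needed(members)


before = [Member("primary"), Member("secondary"), Member("arbiter", arbiter=True)]
after = before + [Member("new-member", caught_up=False)]

print(majority_needed(before), acks_available(before), majority_satisfiable(before))  # → 2 2 True
print(majority_needed(after), acks_available(after), majority_satisfiable(after))     # → 3 2 False
```

With the original three members, two acknowledgments suffice. Adding a fourth voting member that cannot keep up raises the requirement to three while only two data-bearing members can acknowledge, so the wait cannot succeed until the new member catches up, and a bounded wait ends in a timeout instead.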
| Comment by Misha Tyulenev [ 03/Feb/16 ] | |
|
yonido, we need more information to troubleshoot this ticket:
Thanks! | |
| Comment by Ramon Fernandez Marina [ 02/Feb/16 ] | |
|
Thanks, yonido. I've updated the ticket accordingly. One of our engineers is already looking into this. | |
| Comment by Yoni Douek [ 02/Feb/16 ] | |
|
3.2.1 | |
| Comment by Ramon Fernandez Marina [ 02/Feb/16 ] | |
|
yonido, what version(s) of MongoDB are in use in this replica set? |