[SERVER-22822] Prevent mongod step down during moveChunk in balance_repl.js and sharding_rs2.js Created: 23/Feb/16 Updated: 18/Nov/16 Resolved: 26/Feb/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.4, 3.3.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | test-only | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Completed: | |||||||||
| Sprint: | Sharding 11 (03/11/16) | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 0 | ||||||||
| Description |
|
Check whether balance_repl.js or sharding_rs2.js would be affected by making their mongod secondaries unelectable, so as to prevent stepdowns, or perhaps by removing the secondaries if they are unimportant. The mongod stepdown is causing moveChunk to fail, and ultimately the tests to fail, in the sharding_continuous_config_stepdown suite. From the test failures, it appears that the heartbeat is received by the primary mongod but not executed for some time, during which the primary decides to step down because it hasn't heard from anyone. Alternatively, the secondary doesn't hear from anyone and runs for election. |
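As a rough illustration of the "unelectable secondaries" idea above: in a replica set config document, a member with priority 0 cannot be elected primary. The sketch below is plain JavaScript operating on a config-shaped object; the helper name makeSecondariesUnelectable, the set name, and the hosts are hypothetical and are not taken from the ticket or from the actual fix.

```javascript
// Hypothetical helper (an assumption for illustration, not the committed change):
// returns a copy of a replica set config document in which every member except
// the chosen primary has priority 0, so it can never be elected.
function makeSecondariesUnelectable(config, primaryId) {
    const members = config.members.map(function(member) {
        if (member._id === primaryId) {
            return Object.assign({}, member);
        }
        return Object.assign({}, member, {priority: 0});
    });
    // A reconfig needs a config version higher than the current one.
    return Object.assign({}, config, {version: config.version + 1, members: members});
}

// Example config; the set name and hosts are made up.
const exampleConfig = {
    _id: "shard0",
    version: 1,
    members: [
        {_id: 0, host: "localhost:20000"},
        {_id: 1, host: "localhost:20001"},
        {_id: 2, host: "localhost:20002"},
    ],
};

const updatedConfig = makeSecondariesUnelectable(exampleConfig, 0);
console.log(JSON.stringify(updatedConfig.members));
```

In a test, a document like updatedConfig would then be applied through a replica set reconfig on the shard's primary. Note that priority 0 only stops a secondary from running for election; on its own it would not stop a primary from stepping down if it believes it has lost contact with a majority.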
| Comments |
| Comment by Githook User [ 26/Feb/16 ] |
|
Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com> Message: (cherry picked from commit 90b12248502592281f430890c838b2834d11d1b3) |
| Comment by Dianna Hohensee (Inactive) [ 26/Feb/16 ] |
|
This should be backported to v3.2 because the sharding_continuous_config_stepdown suite will eventually be enabled in v3.2. |
| Comment by Githook User [ 26/Feb/16 ] |
|
Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com> Message: |
| Comment by Spencer Brody (Inactive) [ 24/Feb/16 ] |
|
Ah I see, I was thinking it was the config server that stepped down, not the shard. |
| Comment by Andy Schwerin [ 24/Feb/16 ] |
|
The problem is that a shard steps down during moveChunk, but none of the |
| Comment by Spencer Brody (Inactive) [ 23/Feb/16 ] |
|
I'm sure those aren't the only tests that do chunk migrations - are all such tests going to have this problem? Is the real problem a lack of retries when communicating with the config servers during chunk migration? |