[SERVER-22822] Prevent mongod step down during moveChunk in balance_repl.js and sharding_rs2.js
Created: 23/Feb/16  Updated: 18/Nov/16  Resolved: 26/Feb/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.2.4, 3.3.3

Type: Bug
Priority: Major - P3
Reporter: Dianna Hohensee (Inactive)
Assignee: Dianna Hohensee (Inactive)
Resolution: Done
Votes: 0
Labels: test-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-22935 Add JS/client function to retry moveC... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Sharding 11 (03/11/16)
Participants:
Linked BF Score: 0

 Description   

Check whether balance_repl.js or sharding_rs2.js would be affected by making their mongod secondaries unelectable to prevent stepdowns, or perhaps by removing the secondaries if they are unimportant.

The mongod stepdown is causing moveChunk to fail, and ultimately the tests to fail, in the sharding_continuous_config_stepdown suite.

From the test failures, it appears that the primary mongod receives the heartbeat but does not process it for some time, and in the meantime the primary decides to step down because it hasn't heard from anyone. Alternatively, the secondary doesn't hear from anyone and runs for election.
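
A minimal sketch of the first idea above (making the secondaries unelectable), written as plain JavaScript over a replica set config document. The helper name `makeSecondariesUnelectable`, the set name, and the host names are hypothetical and are not taken from the actual commit. Setting `votes: 0` in addition to `priority: 0` is an assumption: it is meant to cover the case where the primary steps down because it cannot see a majority of voting members.

```javascript
// Hypothetical helper (not from the actual commit): returns a copy of a
// replica set config in which only the member at `primaryIndex` is electable.
function makeSecondariesUnelectable(config, primaryIndex = 0) {
    const members = config.members.map((member, i) => {
        if (i === primaryIndex) {
            return Object.assign({}, member);
        }
        // priority 0: this member can never become primary.
        // votes 0: this member does not vote, so the remaining voter
        // forms a majority on its own even if heartbeats are delayed.
        return Object.assign({}, member, {priority: 0, votes: 0});
    });
    return Object.assign({}, config, {members: members});
}

// Example config; the set name and hosts are made up for illustration.
const exampleConfig = {
    _id: "test-rs0",
    members: [
        {_id: 0, host: "localhost:20000"},
        {_id: 1, host: "localhost:20001"},
        {_id: 2, host: "localhost:20002"},
    ],
};
const pinnedConfig = makeSecondariesUnelectable(exampleConfig);
```

The helper only builds the config document. In a real test the equivalent settings would have to be supplied when the shard's replica set is initiated or reconfigured, and a reconfig would also need an incremented config version, which this sketch does not handle.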



 Comments   
Comment by Githook User [ 26/Feb/16 ]

Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>

Message: SERVER-22822 prevent mongod stepdown in balance_repl.js and sharding_rs2.js

(cherry picked from commit 90b12248502592281f430890c838b2834d11d1b3)
Branch: v3.2
https://github.com/mongodb/mongo/commit/4f3f65608ef6223c7b97439f7fcec0e4e864cce0

Comment by Dianna Hohensee (Inactive) [ 26/Feb/16 ]

This should be backported to v3.2 because the sharding_continuous_config_stepdown suite will eventually be enabled in v3.2.

Comment by Githook User [ 26/Feb/16 ]

Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>

Message: SERVER-22822 prevent mongod stepdown in balance_repl.js and sharding_rs2.js
Branch: master
https://github.com/mongodb/mongo/commit/90b12248502592281f430890c838b2834d11d1b3

Comment by Spencer Brody (Inactive) [ 24/Feb/16 ]

Ah I see, I was thinking it was the config server that stepped down, not the shard.

Comment by Andy Schwerin [ 24/Feb/16 ]

The problem is that a shard steps down during moveChunk, but none of the
tests are robust to that. We could investigate making them retry moveChunk
on shard failure, I suppose. We'd have to hit up every test, and I doubt
they're trying to test stepdown of a shard during migration.
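
The retry approach mentioned above could look roughly like the following sketch. The helper `retryMoveChunk` and its result shape are hypothetical; `runMoveChunk` stands in for whatever function a test uses to issue the moveChunk command and is assumed to return a command result with an `ok` field.

```javascript
// Hypothetical helper: calls `runMoveChunk` until it returns a result with
// ok: 1 or the attempts run out. `runMoveChunk` is supplied by the caller,
// for example a wrapper around an admin moveChunk command.
function retryMoveChunk(runMoveChunk, maxAttempts) {
    let lastResult = null;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        lastResult = runMoveChunk(attempt);
        if (lastResult && lastResult.ok === 1) {
            return {result: lastResult, attempts: attempt};
        }
    }
    throw new Error("moveChunk failed after " + maxAttempts +
                    " attempts: " + JSON.stringify(lastResult));
}

// Usage with a stand-in that fails twice (simulating a shard stepdown)
// and then succeeds.
let calls = 0;
const outcome = retryMoveChunk(function () {
    calls++;
    return calls < 3 ? {ok: 0, errmsg: "simulated stepdown"} : {ok: 1};
}, 5);
```

A real version would likely retry only on stepdown-related errors and wait for a new primary between attempts. Both are omitted here.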

Comment by Spencer Brody (Inactive) [ 23/Feb/16 ]

I'm sure those aren't the only tests that do chunk migrations - are all such tests going to have this problem? Is the real problem a lack of retries when communicating with the config servers during chunk migration?

Generated at Thu Feb 08 04:01:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.