[SERVER-26791] move/split/mergeChunk commands do a full metadata refresh on the shard
| Created: 26/Oct/16 | Updated: 01/Nov/17 | Resolved: 23/Jan/17 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.2.10, 3.4.0-rc1 |
| Fix Version/s: | 3.4.2, 3.5.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.4 |
| Sprint: | Sharding 2016-12-12, Sharding 2017-01-02, Sharding 2017-02-13 |
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
The move/split/mergeChunk commands on the shards perform a full metadata refresh on the shard before commencing the operation, instead of an incremental one. For clusters with a large number of chunks this can cause a significant delay. |
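To illustrate why the refresh mode matters, here is a minimal Python sketch, not the server's actual code. It models chunk metadata as an in-memory list; the names `Chunk`, `full_refresh`, and `incremental_refresh` are hypothetical, and the integer `version` is an assumed stand-in for a chunk's version. A full refresh reads every chunk, while an incremental one reads only chunks at or above the cached version.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    min_key: int
    max_key: int
    version: int  # assumed stand-in for the chunk's version


def full_refresh(config_chunks):
    """Load every chunk of the collection, ignoring what is already cached."""
    return list(config_chunks)


def incremental_refresh(config_chunks, cached_version):
    """Load only chunks whose version is at or above the cached version."""
    return [c for c in config_chunks if c.version >= cached_version]


# 100,000 chunks, of which only the last two changed since the cached version.
chunks = [Chunk(i, i + 1, 1) for i in range(100_000)]
chunks[-2] = chunks[-2]._replace(version=5)
chunks[-1] = chunks[-1]._replace(version=6)

full_count = len(full_refresh(chunks))
incremental_count = len(incremental_refresh(chunks, cached_version=5))
```

In this toy setup the full refresh touches all 100,000 chunks and the incremental one touches two, which is the kind of gap the description refers to for clusters with many chunks.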
| Comments |
| Comment by Githook User [ 20/Jan/17 ] |
|
Author: Randolph Tan (renctan, randolph@10gen.com)
Message: (cherry picked from commit 6add2c82bacecd5f54613ebf4be1553f3b046cbc) |
| Comment by Githook User [ 19/Jan/17 ] |
|
Author: Randolph Tan (renctan, randolph@10gen.com)
Message: |
| Comment by Githook User [ 19/Jan/17 ] |
|
Author: Randolph Tan (renctan, randolph@10gen.com)
Message: Revert "Revert " This reverts commit d3e67186d1f9c633e8e69ebb7bf2418d3850688a. |
| Comment by Githook User [ 19/Jan/17 ] |
|
Author: Siyuan Zhou (visualzhou, siyuan.zhou@mongodb.com)
Message: Revert " This reverts commit 6add2c82bacecd5f54613ebf4be1553f3b046cbc. |
| Comment by Githook User [ 19/Jan/17 ] |
|
Author: Randolph Tan (renctan, randolph@10gen.com)
Message: |
| Comment by Kaloian Manassiev [ 06/Dec/16 ] |
|
Yes, I agree. I was just wondering whether it is safe to make every refresh incremental, with a fallback to a full reload on epoch mismatch, or whether there are cases where we must force a complete reload. I think this is safe, as long as we always include the version epoch in the incremental refresh query. This, combined with the check we do on the config.collections entry, will ensure that the refresh never loads chunks belonging to different incarnations (epochs) of the same collection. |
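The scheme discussed in this comment can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not the actual patch: `Chunk`, `CachedMetadata`, and `refresh` are hypothetical names, versions are plain integers, and merging is simplified to overwriting by `min_key` (it ignores ranges removed by merges). The incremental path filters on the cached epoch, and any epoch mismatch falls back to a full reload.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    min_key: int
    epoch: str    # incarnation of the collection this chunk belongs to
    version: int  # assumed integer stand-in for the chunk version


@dataclass
class CachedMetadata:
    epoch: str
    version: int
    chunks: dict = field(default_factory=dict)  # min_key -> Chunk


def refresh(cache, collection_epoch, config_chunks):
    """Return (metadata, mode), where mode is 'incremental' or 'full'."""
    if cache is not None and cache.epoch == collection_epoch:
        # Incremental path: the epoch is part of the filter, so chunks from
        # another incarnation of the collection can never be merged in.
        changed = [c for c in config_chunks
                   if c.epoch == cache.epoch and c.version >= cache.version]
        merged = dict(cache.chunks)
        merged.update({c.min_key: c for c in changed})
        version = max((c.version for c in changed), default=cache.version)
        return CachedMetadata(cache.epoch, version, merged), "incremental"
    # Epoch mismatch or empty cache: discard everything and reload in full.
    loaded = {c.min_key: c for c in config_chunks if c.epoch == collection_epoch}
    version = max((c.version for c in loaded.values()), default=0)
    return CachedMetadata(collection_epoch, version, loaded), "full"


cache = CachedMetadata("e1", 3, {0: Chunk(0, "e1", 3)})
after, mode = refresh(cache, "e1", [Chunk(0, "e1", 3), Chunk(10, "e1", 4)])

# The collection is dropped and recreated, so its epoch changes to "e2".
after2, mode2 = refresh(after, "e2", [Chunk(0, "e2", 1)])
```

The first call takes the incremental path and picks up the new chunk; the second sees a different epoch and reloads from scratch, so no chunk from the old incarnation survives in the cache.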
| Comment by Randolph Tan [ 06/Dec/16 ] |
|
Hm... I think you might be right. I am a little bit concerned that if we don't have the full reload on epoch mismatch, there might be a scenario where there is no other way for a shard to recover. For example, if _recvChunkStart encounters this, then the shard will never accept chunks for this collection until it is restarted. |
| Comment by Kaloian Manassiev [ 06/Dec/16 ] |
|
Do you mean that we should include the retry logic used by onStaleShardVersion in move/split/merge, or make it part of every refresh and have it fall back to a full refresh in this case? For move/split/merge it is fine to fail the operation, since after an epoch mismatch (the collection was dropped and recreated) the operation no longer means anything. |
| Comment by Randolph Tan [ 05/Dec/16 ] |
|
It looks like we should be able to switch to incremental refresh, because the refresh code already checks for epoch mismatch so it should be safe to do so. We might need to add some extra code to handle the case of epoch mismatch. |