[SERVER-20748] cluster find command needs to reload DBConfig on epoch mismatches Created: 02/Oct/15  Updated: 25/Jan/17  Resolved: 16/Oct/15

Status: Closed
Project: Core Server
Component/s: Querying, Sharding
Affects Version/s: None
Fix Version/s: 3.2.0-rc1

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Spencer Brody (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding A (10/09/15), Sharding B (10/30/15)
Participants:

Comments
Comment by Githook User [ 16/Oct/15 ]

Author: Spencer T Brody (stbrody) <spencer@mongodb.com>

Message: SERVER-20748 Handle epoch mismatch explicitly in cluster find command
Branch: master
https://github.com/mongodb/mongo/commit/7cf6b9bf5a47f1446be71105a4186be924e20a85

Comment by Andy Schwerin [ 06/Oct/15 ]

I agree with Spencer's assessment that the code is correct by coincidence. Ideally, we should do a modicum of future-proofing before 3.2.0. I think that means having the cluster find command detect epoch mismatches, as the old find path does, and set the fullReload flag in the chunk reload operation when one is detected, rather than counting on the coincidence described in Spencer's comment.
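
For illustration only, the explicit handling proposed here could be sketched roughly as follows. This is a toy Python model, not the server's C++ code: StaleConfigError, ToyDBConfig, reload_chunks and handle_stale_config are hypothetical names, and the only behavior taken from the comment is "compare epochs, and request a full reload when they differ".

```python
class StaleConfigError(Exception):
    """Hypothetical stale-config error carrying both sides' epochs."""

    def __init__(self, wanted_epoch, received_epoch):
        super().__init__("stale config")
        self.wanted_epoch = wanted_epoch      # epoch the shard reports
        self.received_epoch = received_epoch  # epoch the router sent


class ToyDBConfig:
    """Hypothetical stand-in that only records how it was asked to reload."""

    def __init__(self):
        self.calls = []

    def reload_chunks(self, full_reload):
        # Assumption: a single flag selects between a diff-based refresh
        # and a full reload of the cached metadata.
        self.calls.append(("reload_chunks", full_reload))


def handle_stale_config(config, error):
    """Detect an epoch mismatch explicitly and pass it to the reload."""
    epoch_mismatch = error.wanted_epoch != error.received_epoch
    config.reload_chunks(full_reload=epoch_mismatch)
    return epoch_mismatch


# Same epoch on both sides: an ordinary version bump, so no full reload.
cfg = ToyDBConfig()
handle_stale_config(cfg, StaleConfigError("A", "A"))

# Different epochs: the collection was recreated, so ask for a full reload.
handle_stale_config(cfg, StaleConfigError("B", "A"))
```

The point of the sketch is that the decision is made from the error itself, so it does not depend on how the chunk-diffing code happens to behave.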

Comment by Spencer Brody (Inactive) [ 05/Oct/15 ]

I believe that the cluster find command is actually already doing the right thing.

In the legacy code, refreshing the ChunkManager happens in ParallelSortClusteredCursor::_handleStaleNS. Notice that it always passes "true" for the third argument, the "shouldReload" argument. The second argument, "forceReload", is only set to true if the stale config exception received had an epoch mismatch (determined here).
In the new code, ChunkManager refresh happens in ClusterFind::runQuery, which also always passes "true" for "shouldReload".

So now the question is: what happens if there's an epoch mismatch and we don't pass "true" for "forceReload" to DBConfig::getChunkManagerIfExists? I think the combination of this and this causes this to work out fine in the end. Basically, when we go to refresh the chunks, if the chunks we load have a different epoch than what we remember seeing for this collection, the ChunkDiffer stops. If loadExistingRanges loads no chunks, then getChunkManager will correctly load the DBConfig.
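
To make the chain of events above concrete, here is a minimal toy model in Python. It is not the server's implementation: Chunk, ToyCache and their methods are hypothetical names, and the model assumes only what is described above (the differ stops on an epoch mismatch, and loading no chunks triggers a full reload).

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    epoch: str    # collection epoch this chunk belongs to
    version: int  # simplified stand-in for a chunk version


@dataclass
class ToyCache:
    """Hypothetical stand-in for a router's cached view of one collection."""

    known_epoch: str
    chunks: list = field(default_factory=list)
    full_reloads: int = 0

    def load_existing_ranges(self, config_chunks):
        """Diff-style refresh: accept chunks only while the epoch matches."""
        loaded = []
        for chunk in config_chunks:
            if chunk.epoch != self.known_epoch:
                break  # the differ stops at the first epoch mismatch
            loaded.append(chunk)
        return loaded

    def refresh(self, config_chunks, force_reload=False):
        """Refresh the cache, falling back to a full reload if nothing loads."""
        loaded = [] if force_reload else self.load_existing_ranges(config_chunks)
        if not loaded:
            # Fallback path: rebuild everything from the config data.
            self.full_reloads += 1
            if config_chunks:
                self.known_epoch = config_chunks[0].epoch
            loaded = list(config_chunks)
        self.chunks = loaded
        return self.chunks


# The collection was dropped and recreated, so every chunk has a new epoch.
cache = ToyCache(known_epoch="A", chunks=[Chunk("A", 1)])
recreated = [Chunk("B", 1), Chunk("B", 2)]
cache.refresh(recreated)  # note: force_reload is NOT set
```

In this model the cache ends up on the new epoch even though force_reload was never passed, but only because the diff happened to load nothing. That is the coincidence under discussion.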
