[SERVER-2358] mongos autosplitting does not persist or calculate chunksize correctly Created: 13/Jan/11  Updated: 19/Apr/12  Resolved: 14/Jan/11

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 1.7.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jehiah Czebotar Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

There are several problems with the current strategy for chunk splitting. Chunk splitting is (as of 1.7.4) based solely on the amount of new data written to an existing chunk through a single mongos, and only since that mongos was started. This is problematic when mongos is restarted or when multiple mongos instances are run, because chunks do not get split as expected.

  • _dataWritten (or really currentChunkSize as it's used) should be persisted to the config db and coordinated across multiple mongos instances
  • _dataWritten in the config db should be initialized from getPhysicalSize() when no value is present in the config db. It should never be initialized to a random value that does not represent actual size (https://github.com/mongodb/mongo/blob/r1.7.4/s/chunk.cpp#L58)
  • _dataWritten should not be reset to 0, but should be set appropriately for each post-split chunk (https://github.com/mongodb/mongo/blob/r1.7.4/s/chunk.cpp#L327). If it is zeroed out and the chunk is not split, an additional (maxChunkSize / 2) of data currently has to be written through that specific mongos before the chunk is considered for a split again.
  • post-split (in Chunk::multiSplit) each chunk should have its _dataWritten initialized to getPhysicalSize()
  • _dataWritten is not updated on commands that use the multi=True flag
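
The multi-mongos problem described above can be illustrated with a toy model. This is only a sketch, not the mongos code: the names MongosModel, Chunk and simulate are hypothetical, the 64 MiB maxChunkSize is an assumed value, and the check threshold of maxChunkSize / 2 is taken from the figure quoted in the list above.

```python
from dataclasses import dataclass, field

MIB = 1024 * 1024
MAX_CHUNK_SIZE = 64 * MIB              # assumed maxChunkSize, for illustration only
CHECK_THRESHOLD = MAX_CHUNK_SIZE // 2  # assumed trigger point (maxChunkSize / 2)


@dataclass
class Chunk:
    """The chunk as stored on a shard; physical_size is its true size."""
    physical_size: int = 0


@dataclass
class MongosModel:
    """Toy stand-in for one mongos: its counters are local and start empty."""
    data_written: dict = field(default_factory=dict)
    split_checks: int = 0

    def write(self, chunk_id, chunk, nbytes):
        chunk.physical_size += nbytes
        # Only bytes routed through *this* instance are counted.
        self.data_written[chunk_id] = self.data_written.get(chunk_id, 0) + nbytes
        if self.data_written[chunk_id] > CHECK_THRESHOLD:
            self.split_checks += 1           # a split check would run here
            self.data_written[chunk_id] = 0  # counter is zeroed afterwards


def simulate(num_mongos, total_writes, write_size):
    """Spread writes round-robin over the routers; return (chunk size, checks)."""
    chunk = Chunk()
    routers = [MongosModel() for _ in range(num_mongos)]
    for i in range(total_writes):
        routers[i % num_mongos].write("chunk-0", chunk, write_size)
    return chunk.physical_size, sum(r.split_checks for r in routers)


if __name__ == "__main__":
    for n in (1, 4):
        size, checks = simulate(n, 100, MIB)
        print(f"{n} mongos: chunk is {size // MIB} MiB, split checks run: {checks}")
```

In this model, 100 MiB written through one instance triggers split checks, while the same 100 MiB spread evenly over four instances triggers none, because each instance only sees 25 MiB. A restart behaves like replacing a MongosModel with a fresh one, which discards whatever its counter had accumulated.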


 Comments   
Comment by Alberto Lerner [ 14/Jan/11 ]

_dataWritten's purpose is to trigger the check for whether a split is needed, not the split itself. It need not be accurate; it only has to trigger the check frequently enough. In some of the cases you mentioned, it did not do so in the 1.6 branch, and therefore we made the check mechanism trigger more often.

getPhysicalSize is an extremely expensive call, incidentally.
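
A minimal sketch of that separation, assuming a cheap byte counter that only decides when to run an expensive size measurement, with the split decision based on the measured size. SplitChecker, run and the measure_size callback are hypothetical names used for illustration; they are not part of the MongoDB code base.

```python
class SplitChecker:
    """Cheap counter gates an expensive measurement; the measurement decides."""

    def __init__(self, max_chunk_size, trigger_bytes, measure_size):
        self.max_chunk_size = max_chunk_size
        self.trigger_bytes = trigger_bytes  # how much must be written before checking
        self.measure_size = measure_size    # stand-in for an expensive size call
        self.data_written = 0
        self.measurements = 0

    def on_write(self, nbytes):
        """Return True if the chunk should be split after this write."""
        self.data_written += nbytes
        if self.data_written < self.trigger_bytes:
            return False  # cheap path: no measurement performed
        self.data_written = 0
        self.measurements += 1
        # The decision uses the measured size, not the counter.
        return self.measure_size() >= self.max_chunk_size


def run(trigger_bytes, writes=100, write_size=1, max_chunk_size=50):
    """Return (write number at which a split was requested, measurements made)."""
    actual = {"size": 0}
    checker = SplitChecker(max_chunk_size, trigger_bytes, lambda: actual["size"])
    for n in range(1, writes + 1):
        actual["size"] += write_size
        if checker.on_write(write_size):
            return n, checker.measurements
    return None, checker.measurements


if __name__ == "__main__":
    for trigger in (10, 20, 200):
        print(trigger, run(trigger))
```

With these toy numbers, a smaller trigger detects the oversized chunk sooner at the cost of more measurements, and a trigger that is never reached means the chunk is never examined at all. The counter's accuracy matters only to the extent that it keeps the check running often enough.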

The check and the decision to split are moving completely to the mongod side. 1.7, as far as this specific mechanism goes, is just an intermediate step, and it has shown in our tests and betas to be more reliable than the 1.6 version of the same mechanism. Please feel free to contribute more data about your actual tests, if you have any.
