[SERVER-1545] make single command for size and median Created: 17/Aug/10  Updated: 05/Jun/17  Resolved: 14/Sep/10

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 1.7.0

Type: Improvement Priority: Major - P3
Reporter: Eliot Horowitz (Inactive) Assignee: Alberto Lerner
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-10339 The dataSize doc page does not mentio... Closed
Related
Participants:

 Description   

Want to have one command, "shouldSplitAndMedian??", that figures out whether we should split and, if so, the split point.
While doing this, we also want to change the way we determine whether we should split.

Instead of actually counting data, just walk the index and assume each object is the average object size.
This will be much faster and will not require all the data to fit in RAM.

Also, it should yield, in case it has to page in the index.
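
A minimal sketch of the idea above, not the server implementation: it counts index entries in a range, multiplies by an average object size, and returns the median key when the estimate reaches a threshold. The function name, parameters, and the in-memory list standing in for an index range are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def should_split_and_median(
    index_keys: Sequence[str],   # hypothetical: sorted keys of the chunk's index range
    avg_obj_size: int,           # assumed average object size in bytes
    max_chunk_bytes: int,        # assumed split threshold in bytes
) -> Tuple[bool, Optional[str]]:
    """Estimate chunk size from the key count alone and pick the median key."""
    # Walk the index entries only; no documents are read.
    num_keys = 0
    for _ in index_keys:
        num_keys += 1

    # Assume every object has the average object size.
    estimated_bytes = num_keys * avg_obj_size
    if estimated_bytes < max_chunk_bytes or num_keys < 2:
        return False, None

    # The middle key of the sorted range is the proposed split point.
    return True, index_keys[num_keys // 2]
```

Because only the key count is needed for the size estimate, the documents themselves never have to be paged in.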



 Comments   
Comment by auto [ 14/Sep/10 ]

Author:

{'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'}

Message: dataSize has an estimate option, chunk uses this SERVER-1545
http://github.com/mongodb/mongo/commit/d5793fa9a8281b349e0fc1ddd09acc9d1a084055

Comment by Alberto Lerner [ 03/Sep/10 ]

Right now, we still need to rely on the dataSize and medianKey commands. The former decides whether to split; the latter picks where to split. This ticket made dataSize much faster because it now uses an estimated chunk size rather than computing the size by scanning the mapped files.
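
For illustration, the two command documents might be built as below. The medianKey fields are copied from the log excerpt later in this ticket; the `estimate` field on dataSize follows the commit message, and its exact shape here is an assumption. The helper names are hypothetical.

```python
from typing import Any, Dict


def data_size_cmd(ns: str, key_pattern: Dict[str, Any],
                  lo: Dict[str, Any], hi: Dict[str, Any],
                  estimate: bool = True) -> Dict[str, Any]:
    # The command name goes first; Python dicts preserve insertion order.
    # The "estimate" field is assumed from the commit message.
    return {"dataSize": ns, "keyPattern": key_pattern,
            "min": lo, "max": hi, "estimate": estimate}


def median_key_cmd(ns: str, key_pattern: Dict[str, Any],
                   lo: Dict[str, Any], hi: Dict[str, Any]) -> Dict[str, Any]:
    # Field names taken from the logged medianKey command in this ticket.
    return {"medianKey": ns, "keyPattern": key_pattern, "min": lo, "max": hi}
```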

The attempt to use that estimated size to build a single command (in fact, splitVector) failed. The datasize varies with the extent sizes in a datafile, which grow in increasing strides. Computing split points by assuming each object is datasize/numRecs was very imprecise and led to irregular chunk sizes.
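
To make the imprecision concrete, here is a rough sketch of the rejected approach, not the actual splitVector code. The function name and parameters are hypothetical, and the inflated datasize in the usage note is an assumed illustration of extent preallocation.

```python
from typing import List, Sequence


def split_points_by_average(
    sorted_keys: Sequence[str],  # hypothetical: the chunk's keys in index order
    data_size_bytes: int,        # datasize as reported for the range
    max_chunk_bytes: int,        # assumed target chunk size
) -> List[str]:
    """Pick split points assuming every object is datasize/numRecs bytes."""
    num_recs = len(sorted_keys)
    if num_recs == 0:
        return []
    # Assumed per-object size; any error in datasize lands directly here.
    avg_obj_size = max(1, data_size_bytes // num_recs)
    keys_per_chunk = max(1, max_chunk_bytes // avg_obj_size)
    # Every keys_per_chunk-th key becomes a split point.
    return [sorted_keys[i] for i in range(keys_per_chunk, num_recs, keys_per_chunk)]
```

With ten keys and a 300-byte target, a datasize of 1000 bytes yields three split points, while a datasize overstated as 2000 bytes yields nine, so the resulting chunks hold far less data than intended.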

We have ways to make splitting even faster by keeping a statistical summary of the keys per chunk. But our test results show that the estimated datasize already gives excellent results.

Comment by auto [ 03/Sep/10 ]

Author:

{'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}

Message: SERVER-1545 don't switch shards if current is best (FCoJ)
http://github.com/mongodb/mongo/commit/8b24b1e719fbfe4c7ace9e42f6535fa7fd948462

When inserting into an empty collection, we could see the auto-splitting code switching shards right at the second chunk. That would leave the first chunk in, say, shard0 and the following ones in shard1.

The reason this happened is that Shard::Pick() assumed the best shard was the first one it got from the config DB. If the current one was not first, and was tied with it, Pick() would switch.
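
The fix can be pictured with a small sketch, assuming "best" means the lowest value of some per-shard load figure. The ticket does not say which metric is used, and `pick_shard` and its arguments are hypothetical rather than the real Shard::Pick() signature.

```python
from typing import Dict, Optional


def pick_shard(load_by_shard: Dict[str, int], current: Optional[str] = None) -> str:
    """Return the least-loaded shard, keeping `current` when it ties for best."""
    if not load_by_shard:
        raise ValueError("no shards available")
    best_load = min(load_by_shard.values())
    # Stay on the current shard if it is already as good as any other.
    if current is not None and load_by_shard.get(current) == best_load:
        return current
    # Otherwise take the first best shard in listing order.
    return next(name for name, load in load_by_shard.items() if load == best_load)
```

With two equally loaded shards listed as shard1 then shard0, a caller currently on shard0 stays on shard0 instead of switching to the first entry.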

Comment by auto [ 31/Aug/10 ]

Author:

{'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}

Message: SERVER-1545 when moving a chunk, we dont want to risk getting the same shard
http://github.com/mongodb/mongo/commit/ee65b0cc858ffc1ef68ab1a8850f658602b85926

Comment by Alvin Richards (Inactive) [ 25/Aug/10 ]

From Matt Levy

The master log file showed the following:

Tue Aug 24 17:08:58 [conn10] insert choc.events 233ms
Tue Aug 24 17:08:58 [conn23] insert choc.events 167ms
Tue Aug 24 17:08:58 [conn13] insert choc.events 263ms
Tue Aug 24 17:08:58 [conn13] insert choc.events 270ms
Tue Aug 24 17:08:59 [conn23] insert choc.events 321ms
Tue Aug 24 17:08:59 [conn10] insert choc.events 299ms
Tue Aug 24 17:08:59 [conn13] insert choc.events 387ms
Tue Aug 24 17:09:00 [conn13] insert choc.events 284ms
Tue Aug 24 17:09:06 [conn12] Finding median for index: { _id: 1.0 } between { : "5a135bd6-b074-c44f-e52e-6c4e57ffd7e1" } and { : "5c17839c-2da2-67d9-4eda-7fdec6063f4c" } took 6292 ms.
Tue Aug 24 17:09:06 [conn12] query admin.$cmd ntoreturn:1 command: { medianKey: "choc.events", keyPattern: { _id: 1.0 }, min: { _id: "5a135bd6-b074-c44f-e52e-6c4e57ffd7e1" }, max: { _id: "5c17839c-2da2-67d9-4eda-7fdec6063f4c" } } reslen:112 6585ms

Comment by auto [ 24/Aug/10 ]

Author:

{'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'}

Message: dataSize has an estimate option, chunk uses this SERVER-1545
http://github.com/mongodb/mongo/commit/9a9eb885349ea4644a30adebcb82f995764a9e88

Comment by auto [ 24/Aug/10 ]

Author:

{'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}

Message: SERVER-1545 splitVector now takes ranges.
http://github.com/mongodb/mongo/commit/9e951e98b44c7da200d60cc200735e6213d8ebe8

Comment by auto [ 24/Aug/10 ]

Author:

{'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}

Message: SERVER-1545 Fix test.
http://github.com/mongodb/mongo/commit/cd9d7218227cebb8220c09aa6a78ac4cb0d2fd94

Comment by auto [ 24/Aug/10 ]

Author:

{'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}

Message: SERVER-1545 Leave new chunks half-full instead of at 90%.
http://github.com/mongodb/mongo/commit/2af7fa477c6a0764e0af9def5ed7a2fdd0cdfe47

Comment by auto [ 23/Aug/10 ]

Author:

{'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}

Message: SERVER-1545 Add a fast path to splitVector command.
http://github.com/mongodb/mongo/commit/4bcf64d3d132acaf61580743a5d121bf4d90a002

Generated at Thu Feb 08 02:57:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.