[SERVER-1545] make single command for size and median Created: 17/Aug/10 Updated: 05/Jun/17 Resolved: 14/Sep/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 1.7.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Eliot Horowitz (Inactive) | Assignee: | Alberto Lerner |
| Resolution: | Done | Votes: | 2 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Participants: | |
| Description |
|
Want to have a single command, "shouldSplitAndMedian??", that figures out whether we should split and, if so, the split point. Instead of actually counting data, it should just walk the index and assume each object is the average object size. It should also yield, in case it has to page in the index. |
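A minimal sketch of the idea in the description, with hypothetical names rather than the actual server code: split points are chosen by walking the shard-key index in order and charging every entry the collection's average object size, so no documents are read.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct KeyEntry { std::string key; };  // one shard-key index entry

// collSizeBytes / number of entries gives the assumed average object size.
std::vector<std::string> estimateSplitPoints(const std::vector<KeyEntry>& indexOrder,
                                             int64_t collSizeBytes,
                                             int64_t maxChunkSizeBytes) {
    std::vector<std::string> splitPoints;
    if (indexOrder.empty() || collSizeBytes <= 0 || maxChunkSizeBytes <= 0)
        return splitPoints;

    const int64_t avgObjSize = collSizeBytes / static_cast<int64_t>(indexOrder.size());
    int64_t runningBytes = 0;

    for (const KeyEntry& entry : indexOrder) {   // walk the index in key order
        runningBytes += avgObjSize;              // charge the average size per entry
        if (runningBytes >= maxChunkSizeBytes) { // chunk would exceed the max size here,
            splitPoints.push_back(entry.key);    // so emit this key as a split point
            runningBytes = 0;                    // (a real cursor would also yield
        }                                        //  periodically while paging in the index)
    }
    return splitPoints;
}
```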
| Comments |
| Comment by auto [ 14/Sep/10 ] |
|
Author: {'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'}Message: dataSize has an estimate option, chunk uses this |
| Comment by Alberto Lerner [ 03/Sep/10 ] |
|
Right now, we still need to rely on the dataSize and medianKey commands. The first makes the decision to split; the latter picks where to split. This ticket made dataSize much faster because it now uses an estimated chunk size rather than computing it by scanning the mapped files. The attempt to use that estimated size to create a single command (this command is in fact splitVector) failed. The datasize varies according to the extent sizes in a datafile, which grow in increasing strides. Computing split points by assuming each object is datasize/numRecs was very imprecise and led to irregular chunk sizes. We have ways to make splitting even faster by keeping a statistical summary of the keys per chunk, which would increase speed further. But our testing showed that the estimated datasize already gives excellent results. |
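The resolution keeps dataSize and medianKey but adds an estimate option to dataSize. A rough sketch of the two strategies contrasted above, using illustrative names rather than the actual command implementation:

```cpp
#include <cstdint>
#include <vector>

struct CollectionStats {
    int64_t numRecords;       // record count kept in the collection metadata
    int64_t avgObjSizeBytes;  // running average object size
};

// Estimate path: multiply record count by average object size, so no
// documents (and no mapped files) need to be read or paged in.
int64_t estimatedDataSize(const CollectionStats& stats) {
    return stats.numRecords * stats.avgObjSizeBytes;
}

// Scan path: walk every record and sum the actual sizes, which is the slow
// behaviour the estimate option avoids.
int64_t scannedDataSize(const std::vector<int64_t>& recordSizes) {
    int64_t total = 0;
    for (int64_t size : recordSizes)
        total += size;
    return total;
}
```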
| Comment by auto [ 03/Sep/10 ] |
|
Author: {'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}Message: During an insertion into an empty collection, we could see the auto-splitting code switching shards as early as the second chunk. That would leave the first chunk in, say, shard0 and the following ones in shard1. The reason this happened is that Shard::Pick() assumed the best shard was the first one it got from the config DB. If the current shard was not first but was tied with it, Pick() would switch. |
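A minimal sketch of the tie-breaking fix described in this commit message; the names are illustrative and not the actual Shard::Pick() signature.

```cpp
#include <string>
#include <vector>

struct ShardInfo {
    std::string name;
    long long   dataSizeBytes;  // load metric reported by the config DB
};

ShardInfo pickShard(const std::vector<ShardInfo>& candidates,
                    const ShardInfo&              current) {
    ShardInfo best = current;   // start from the shard that already owns the chunk
    for (const ShardInfo& candidate : candidates) {
        // Only a strictly less-loaded candidate wins; a tie is NOT enough to
        // switch, which is what previously moved the second chunk off the
        // first shard during an insertion into an empty collection.
        if (candidate.dataSizeBytes < best.dataSizeBytes)
            best = candidate;
    }
    return best;
}
```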
| Comment by auto [ 31/Aug/10 ] |
|
Author: {'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}Message: |
| Comment by Alvin Richards (Inactive) [ 25/Aug/10 ] |
|
From Matt Levy: The master log file showed the following: Tue Aug 24 17:08:58 [conn10] insert choc.events 233ms between { : "5a135bd6-b074-c44f-e52e-6c4e57ffd7e1" } and { : "5c17839c-2da2-67d9-4eda-7fdec6063f4c" } took 6292 ms. , min: { _id: "5a135bd6-b074-c44f-e52e-6c4e57ffd7e1" }, max: { _id: "5c17839c-2da2-67d9-4eda-7fdec6063f4c" }} reslen:112 6585ms |
| Comment by auto [ 24/Aug/10 ] |
|
Author: {'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'}Message: dataSize has an estimate option, chunk uses this |
| Comment by auto [ 24/Aug/10 ] |
|
Author: {'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}Message: |
| Comment by auto [ 24/Aug/10 ] |
|
Author: {'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}Message: |
| Comment by auto [ 24/Aug/10 ] |
|
Author: {'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}Message: |
| Comment by auto [ 23/Aug/10 ] |
|
Author: {'login': 'alerner', 'name': 'Alberto Lerner', 'email': 'alerner@10gen.com'}Message: |