[SERVER-1779] "finding median" too slow. Created: 12/Sep/10 Updated: 16/Nov/21 Resolved: 13/Sep/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | 1.6.2, 1.7.0 |
| Fix Version/s: | 1.7.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Sergei Tulentsev | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
ubuntu 9.10 the db is on a RAID0 of 2 disks |
||
| Operating System: | Linux |
| Participants: |
| Description |
|
Here's some excerpt from the logs between { : 24430203 } and { : 25662603 } took 3161 ms. between { : 26814034 } and { : 27937206 } took 14448 ms. between { : 22297203 } and { : 23324407 } took 2012 ms. between { : 60706232 } and { : 62958603 } took 23454 ms. between { : 25662603 } and { : 26814034 } took 3050 ms. between { : 23324407 } and { : 24430203 } took 2804 ms. between { : 37270223 } and { : 38933407 } took 24029 ms. between { : 40766434 } and { : 42625800 } took 13718 ms. between { : 58696810 } and { : 60706232 } took 11311 ms. between { : 30362400 } and { : 31572033 } took 9520 ms. between { : 72665435 } and { : 96292200 } took 7396 ms. between { : 68727613 } and { : 72665435 } took 7880 ms. between { : 38933407 } and { : 40766434 } took 10045 ms. between { : 35630403 } and { : 37270223 } took 7486 ms. between { : 65599205 } and { : 68727613 } took 95402 ms. between { : 62958603 } and { : 65599205 } took 72608 ms. between { : 56850022 } and { : 58696810 } took 58778 ms. between { : 55276200 } and { : 56850022 } took 50957 ms. between { : 45860418 } and { : 47944851 } took 47859 ms. between { : 51170454 } and { : 52417803 } took 38282 ms. between { : 53762457 } and { : 55276200 } took 27610 ms. between { : 42625800 } and { : 44283041 } took 34589 ms. between { : 47944851 } and { : 49876203 } took 17147 ms. between { : 49876203 } and { : 51170454 } took 23866 ms. between { : 44283041 } and { : 45860418 } took 18224 ms. between { : 52417803 } and { : 53762457 } took 21154 ms. The indexes surely fit in memory. The whole DB does! > db.answers.stats() , , > db.stats() , }, |
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 13/Sep/10 ] |
|
They won't block. |
| Comment by Sergei Tulentsev [ 13/Sep/10 ] |
|
Here's a question: after splitVector yielded, what happens to writes that affect records in the key range? Will they wait in the queue, not affecting other queries? Or will they block? |
| Comment by Eliot Horowitz (Inactive) [ 13/Sep/10 ] |
|
Made splitVector yield - so even when it takes a long time - won't block. |
| Comment by auto [ 13/Sep/10 ] |
|
Author: {'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'}Message: yield in splitVector so even if it takes a while won't block |
| Comment by Sergei Tulentsev [ 13/Sep/10 ] |
|
That file doesn't even show up in the project tree. Interesting. And here is the new data from the field. 4 splits in a row. Mon Sep 13 19:17:12 [conn15] Finding the split vector for pravdorub_production.answers over { uiq: 1.0 } keyCount: 845625 numSplits: 4 took 7238 ms. , min: { uiq: 22297203 }, max: { uiq: 22764634 }, maxChunkSizeBytes: 209715200 } reslen:137 7239ms Mon Sep 13 19:17:21 [conn15] Finding the split vector for pravdorub_production.answers over { uiq: 1.0 } keyCount: 845625 numSplits: 4 took 5989 ms. , min: { uiq: 25036348 }, max: { uiq: 25662603 }, maxChunkSizeBytes: 209715200 } reslen:137 5990ms Mon Sep 13 19:17:28 [conn15] Finding the split vector for pravdorub_production.answers over { uiq: 1.0 } keyCount: 845625 numSplits: 4 took 5955 ms. , min: { uiq: 25662603 }, max: { uiq: 26296816 }, maxChunkSizeBytes: 209715200 } reslen:137 5955ms Mon Sep 13 19:17:39 [conn15] Finding the split vector for pravdorub_production.answers over { uiq: 1.0 } keyCount: 845625 numSplits: 4 took 10032 ms. , min: { uiq: 26296816 }, max: { uiq: 26814034 }, maxChunkSizeBytes: 209715200 } reslen:137 10033ms On another shard, however, it's a bit worse. keyCount: 845625 numSplits: 4 took 73671 ms. , min: { uiq: 21323400 }, max: { uiq: 21776428 }, maxChunkSizeBytes: 209715200 } reslen:137 74327ms Just to remind: this is happening under heavy write load. One mongos is mongoimporting data as fast as he can. |
| Comment by Eliot Horowitz (Inactive) [ 13/Sep/10 ] |
|
xcode... erh@erh-tm1 ~/work/mongo -> git grep "doing delete inline" |
| Comment by Sergei Tulentsev [ 13/Sep/10 ] |
|
Nope, mongos didn't get upgraded. By the way, I tried searching in the sources by the message text, like "Finding median" or "doing delete inline" and founc nothing. Either my XCode can't search or these strings are not there. Which is it? |
| Comment by Eliot Horowitz (Inactive) [ 13/Sep/10 ] |
|
Did you upgrade mongos and mongod? |
| Comment by Sergei Tulentsev [ 13/Sep/10 ] |
|
Yes, I saw your commit. sergio@cs2607:~$ mongod --version |
| Comment by Eliot Horowitz (Inactive) [ 13/Sep/10 ] |
|
Can you send the git hash of the version you're running. |
| Comment by Sergei Tulentsev [ 13/Sep/10 ] |
|
There's nothing new in logs from the last nightly. I guess, your change didn't make it in. Mon Sep 13 16:39:57 [conn6] Finding median for index: { uiq: 1.0 } between { : 51170454 } and { : 52417803 } took 29713 ms. , min: { uiq: 51170454 }, max: { uiq: 52417803 } } reslen:75 29743ms When you test it on your laptop, do you have some other activity going on? Like heavy writing? I don't really mind if it takes a minute to find median key. I just hate that it blocks the whole thing. We definitely need some better granularity for locks. |
| Comment by Eliot Horowitz (Inactive) [ 13/Sep/10 ] |
|
Counted b-trees are a possibility - but a long term thing to look at. Your case still doesn't make sense because on my laptop - i can scan the equivalent number of index nodes in < 2 secs consistently. Can you try last night's nightly? The change above should tell us what the problem is even if it doesn't fix it. |
| Comment by Hans Meiser [ 13/Sep/10 ] |
|
I guess the main problem is the search for the media. In d_split a complete index scan is made (exactly 1 + 1/2) to find the median. |
| Comment by auto [ 12/Sep/10 ] |
|
Author: {'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'}Message: use splitVector instead of datasize and mediankey. 1 command vs. 2 and should be faster and more extensible. |