[SERVER-13320] foreground index builds are much slower than background on large collections Created: 23/Mar/14 Updated: 11/Jul/16 Resolved: 03/Dec/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | 2.4.4, 2.6.0-rc1 |
| Fix Version/s: | 2.6.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Nelson Elhage | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Operating System: | ALL | ||||
| Steps To Reproduce: | Clone https://github.com/nelhage/mongod-tests, and run "index.py". This will start a local mongod and begin inserting records into a test collection, stopping periodically to do both fg and bg index builds and report the timing. (Note that this will use unbounded amounts of storage in /tmp – you can redirect it with TMPDIR=... in the environment). Sample output on 2.4.4: https://nelhage.com/paste/2014-03-20EdWTQGGH |
||||
| Participants: | |||||
| Description |
|
Ostensibly, a main reason for separate foreground/background builds is that foreground index builds are faster, at the cost of blocking the server [1]. However, we have observed both in production and in synthetic benchmarks, that foreground index builds are often much slower on large collections. In a synthetic benchmark, building a trivial index on a collection with about 6M records, each about 1k large, I saw the following numbers:
The "fg=" number is seconds to build an index in the foreground, and "bg=" is for a background build. The requested indexes are identical, and are dropped each time. I don't have hard numbers right now, but experience in production suggests that the difference only gets worse as the collection gets even bigger. [1] http://docs.mongodb.org/manual/tutorial/build-indexes-in-the-background/ |
| Comments |
| Comment by Ramon Fernandez Marina [ 03/Dec/14 ] |
|
Hi nelhage, our reading of your comment from March 23 was that you were still seeing background indexes being faster, and while we were not quite able to reproduce in a test setup we initiated a more formal testing process to get to the bottom of the issue – because foreground indexes should ideally be faster than background indexes. That being said, we can indeed close this issue as "Fixed" (issue was present in 2.4 but is not in 2.6), and update the ticket if the results of the formal testing show any problems or discrepancies. Thanks for your patience with this matter, and thanks for using MongoDB. Regards, |
| Comment by Nelson Elhage [ 03/Dec/14 ] |
|
So my latest understanding is that this issue was resolved in 2.6 by the rewrite of the external sort code, which I thought I expressed in my March 23 comment, but I may not have been sufficiently clear. It's still outstanding in 2.4, but probably not worth backporting an invasive fix for, so I suspect this should just be closed. |
| Comment by Ramon Fernandez Marina [ 03/Dec/14 ] |
|
Hi nelhage, just a quick note to let you know that this ticket is still under investigation, and so far we haven't been able to confirm the behavior you describe. We'll keep the ticket open until testing of this issue is complete. I also see that this ticket was opened against a 2.6.0-rc1 release candidate. Have you had a chance to test a full release and see if the issue still persists? The latest stable release in the 2.6 series is 2.6.5, and 2.6.6 is schedule for the coming weeks. Thanks, |
| Comment by Ramon Fernandez Marina [ 05/Sep/14 ] |
|
nelhage, our measurements so far do not provide any evidence that foreground index builds are slower than background index builds, but we're still investigating the issue in depth. We'll update this ticket when we have results on this investigation. Regards, |
| Comment by Victor Hooi [ 26/Mar/14 ] |
|
Hi Nelson, Just letting you know - we're getting some conflicting results, and are still looking into this. Also, I saw your last comment about 2.4.x I'll update this ticket when we have something conclusive. Cheers, |
| Comment by Nelson Elhage [ 24/Mar/14 ] |
|
Ah-ha. Spent a while looking at the source during between the last two comments, now I understand what's going on. On 2.4, the external sort divides the data into a number of chunks, each of which fits in RAM, dumps each to a separate file, and does a naïve merge, which will be O(n²) if the data to be sorted is much larger than RAM. On 2.6, the code has been updated to replace the naïve merge with a min-heap, which should have much better asymptotics. So there's still some question of why my tests show the 2.6 fg sort as slightly slower than bg, but at least the most egregious problem is resolved in 2.6. Feel free to close this as already resolved or wontfix if the small performance difference is considered uninteresting. |
| Comment by Nelson Elhage [ 24/Mar/14 ] |
|
Hm – looking at my results again, I hadn't looked at the 2.6 numbers closely enough – fg is still worse, but not nearly as dramatically. It's possible the really bad root issues are resolved there... |
| Comment by Nelson Elhage [ 24/Mar/14 ] |
|
The test runs I pasted output from are on ec2 m1.large instances, which have 7.5G of RAM and 2x 2.27GHz Xeons. I ran the tests against the instance storage, which is essentially local disk. I saw similar results in tests on my laptop, which is a Thinkpad with an SSD, 8G of RAM, and a recent Intel Core processor (I can be more specific once I'm back in the office). I don't have log files saved from test runs. I'm happy to provide them – which mongo version would be most useful for you? The test environments I'm doing synthetic benchmarks on are not monitored by mms. This investigation was triggered on our end by several index builds that took days on a secondary, and some of those nodes are monitored by mms. For instance, https://mms.mongodb.com/host/detail/4f14ac99dc514c3e10871e49/e9ddc87371c836ac7f78cf923344604f spent most of the time from March 12 to March 18 doing index builds. |
| Comment by Victor Hooi [ 24/Mar/14 ] |
|
Hi Nelson, Thank you for the detailed report, and the Github test script. I'm currently attempting to reproduce this on my system, and would appreciate if you could provide some additional information. Firstly - are you able to provide any more details about your environment? (e.g. CPU, RAM, etc.) And if these machines are monitored on our cloud MMS, are you able to provide a link to them please? Secondly, are you able to provide the logfiles during one of the test runs please? We'd like to check for any MongoDB anomalies during the index builds. Thanks, |