[SERVER-3177] Massive Performance hit on index creation when doing a db.copyDatabase or a mongorestore on a server with a replicated set configuration Created: 01/Jun/11 Updated: 30/Mar/12 Resolved: 08/Jun/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Replication |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Paul D. Walker | Assignee: | Aaron Staple |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | indexing, performance, replication | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Ubuntu server 10.04.2 LTS |
| Operating System: | Linux |
| Participants: |
| Description |
|
When restoring a large collection to a mongodb server with a replica set configuration, index creation becomes incredibly slow. In our case, index creation for the collection could take anywhere from 20 minutes to over half an hour. When the exact same collection is restored to a test mongodb server running inside a virtual machine with much smaller specs, but with no replica set, the same indexes are created in 5 to 6 seconds. Examining the log files, we can see the following:
The log files from the small server (single mongod, no replica set) show the following:
As a test, I shut down the secondary server while the restore was happening, and the index creation time on the primary server went down to a much more acceptable 5 to 6 seconds. More datapoints:
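For reference, a minimal way to time an index build from the mongo shell (a sketch only; the collection and field names below are placeholders, not the ones from this restore):

// Hypothetical sketch: time a foreground index build from the mongo shell.
// ensureIndex() is the 1.8-era shell helper and blocks until the build finishes.
var start = new Date();
db.snapshots_test.ensureIndex({ ts: 1 });  // placeholder collection and field
print("index build took " + (new Date() - start) + " ms");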
|
| Comments |
| Comment by Paul D. Walker [ 08/Jun/11 ] |
|
Hi Aaron, I think I've found the problem. I was restoring about 600 collections, each generally about 70 MB in size, so the total size of the collections >> system memory. When restoring the database with all those collections, the initial collections all fit in RAM, but later on performance started to suffer. I was expecting mongo to free up the memory used by the least-used collections and keep enough to continue fast indexing. I did see a performance increase after shutting down the secondary, but that may have been just coincidence. I would need to test it more fully to be 100% sure that this was really the case, or whether I happened to be indexing one of the smaller collections at that moment. I've set up my replica set configuration again and loaded smaller subsets of the collections (size < system RAM), and there have been no problems with performance. I'm also going to run a test by loading the collections individually until the total collection size is > system RAM and see if my problems return, or if mongodb handles it a little better. Summary: I think it turns out to be my fault. |
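A quick way to compare the restored data size against physical RAM (a sketch using the standard db.stats() shell helper; the MB formatting is just for readability):

// Rough working-set check from the mongo shell: if dataSize + indexSize
// is much larger than physical RAM, index builds will page heavily.
var stats = db.stats();
print("dataSize:  " + (stats.dataSize / 1024 / 1024).toFixed(1) + " MB");
print("indexSize: " + (stats.indexSize / 1024 / 1024).toFixed(1) + " MB");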
| Comment by Aaron Staple [ 08/Jun/11 ] |
|
Hi Paul, We haven't been able to reproduce this issue. Can you supply your mongod logs for the primary and secondary? |
| Comment by Paul D. Walker [ 08/Jun/11 ] |
|
Any other questions or information I can provide? |
| Comment by Paul D. Walker [ 02/Jun/11 ] |
|
The secondary has an identical hardware configuration to the primary. |
| Comment by Paul D. Walker [ 02/Jun/11 ] |
|
additional datapoint: The servers are now running, and new collections added to the primary are being indexed in memory and quickly:
Thu Jun 2 11:47:50 [conn118] building new index on { _id: 1 } for cassia.snapshots_2011060211_03
However, the secondary server is still catching up to the primary. It's taking between 5 and 15 minutes for these indexes to be created. |
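One way to quantify how far the secondary is behind (a sketch using the standard replica set shell helpers):

// Print each member's state and optime, then the per-secondary lag.
// Run from a mongo shell connected to any member of the set.
rs.status().members.forEach(function (m) {
    print(m.name + "  " + m.stateStr + "  optime: " + m.optimeDate);
});
db.printSlaveReplicationInfo();  // reports how far each secondary is behind the primary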
| Comment by Paul D. Walker [ 02/Jun/11 ] |
|
No, the disks are almost completely idle when this is happening. (sda contains the mongo journals, sdb contains the data dir)
mdrop@obdb1:~$ iostat -xm 2
avg-cpu: %user %nice %system %iowait %steal %idle
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
(remaining samples repeated the same headers; the per-device data rows are missing from this export) |
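Since the disks look idle, it may also be worth checking whether mongod itself is stalling on page faults or background flushes (a sketch using db.serverStatus(); extra_info.page_faults is a Linux-only counter):

// Check mongod-side fault and flush counters rather than raw disk throughput.
var s = db.serverStatus();
print("page faults:  " + s.extra_info.page_faults);
print("avg flush ms: " + s.backgroundFlushing.average_ms);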
| Comment by Eliot Horowitz (Inactive) [ 01/Jun/11 ] |
|
Is the disk saturated? |
| Comment by Paul D. Walker [ 01/Jun/11 ] |
|
additional datapoints
|
| Comment by Paul D. Walker [ 01/Jun/11 ] |
|
If you need any more information for debugging purposes, I'll be happy to provide it. |