[SERVER-3177] Massive Performance hit on index creation when doing a db.copyDatabase or a mongorestore on a server with a replicated set configuration Created: 01/Jun/11  Updated: 30/Mar/12  Resolved: 08/Jun/11

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Replication
Affects Version/s: 1.8.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Paul D. Walker Assignee: Aaron Staple
Resolution: Cannot Reproduce Votes: 0
Labels: indexing, performance, replication
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu server 10.04.2 LTS
12 GB ram
2 x raid 1 volumes, data on one volume, journal on another.
2 x 6 core 2.9 GHz cpu with hyperthreading (24 logical cpus)


Operating System: Linux
Participants:

 Description   

When restoring a large collection on a MongoDB server with a replica set configuration, index creation becomes incredibly slow. In our case, index creation for the collection could take anywhere from 20 minutes to over half an hour.

When the exact same collection is restored to a test MongoDB server running inside a virtual machine with much smaller specs, but with no replica set, the same indexes are created in 5 to 6 seconds.
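
For reference, a minimal sketch of the two restore paths involved (mongo shell; the host names and dump path below are placeholders, not values from this report):

// 1) From the OS shell, restoring a dumped database with mongorestore:
//      mongorestore --host primary-host --db cassia /backups/dump/cassia
// 2) From the mongo shell, pulling the database directly from another server
//    (arguments are fromdb, todb, fromhost):
db.copyDatabase("cassia", "cassia", "old-server-host");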

Examining the log files, we can see the following:

// log file from the big server: replica set configured, secondary server up
Wed Jun 1 03:37:31 [conn16] building new index on { apps.aid } for cassia.snapshots_2011053100_01
239000/326021 73%
259900/326021 79%
268900/326021 82%
277600/326021 85%
287400/326021 88%
289600/326021 88%
300000/326021 92%
310000/326021 95%
311900/326021 95%
314200/326021 96%
316500/326021 97%
318400/326021 97%
320500/326021 98%
322600/326021 98%
324900/326021 99%
Wed Jun 1 03:40:28 [conn16] done building bottom layer, going to commit
Wed Jun 1 03:40:37 [conn16] done for 1824 records 185.837secs
Wed Jun 1 03:40:37 [conn16] insert cassia.system.indexes 185837ms
Wed Jun 1 03:40:37 [dur] lsn set 79905899
Wed Jun 1 03:40:37 [conn16] building new index on { cnt } for cassia.snapshots_2011053100_02
Wed Jun 1 03:40:39 [conn16] done for 1823 records 2.376secs
Wed Jun 1 03:40:39 [conn16] insert cassia.system.indexes 2376ms
Wed Jun 1 03:40:39 [conn16] building new index on { apps.aid } for cassia.snapshots_2011053100_02
100/1823 5%
200/1823 10%
300/1823 16%
1000/1823 54%
1100/1823 60%
1200/1823 65%
1300/1823 71%
1800/1823 98%
Wed Jun 1 03:54:09 [conn16] external sort used : 170 files in 809 secs
100/325822 0%
126600/325822 38%
128200/325822 39%
129500/325822 39%
130200/325822 39%
131300/325822 40%
131900/325822 40%
132900/325822 40%
133600/325822 41%
134100/325822 41%
134900/325822 41%
135900/325822 41%
136800/325822 41%
137900/325822 42%
138800/325822 42%
139900/325822 42%
140700/325822 43%
141400/325822 43%
141900/325822 43%
142900/325822 43%
143800/325822 44%
144900/325822 44%
145800/325822 44%
146700/325822 45%
148300/325822 45%
149800/325822 45%
150200/325822 46%
305200/325822 93%
Wed Jun 1 03:59:40 [conn16] done building bottom layer, going to commit
Wed Jun 1 03:59:40 [conn16] done for 1823 records 1140.97secs
Wed Jun 1 03:59:40 [conn16] insert cassia.system.indexes 1140969ms
Wed Jun 1 03:59:40 [conn113] getmore cassia.system.namespaces cid:6678415019539336828 getMore: {} bytes:77862 nreturned:1396 1355345ms
Wed Jun 1 03:59:40 [conn99] query cassia.snapshots_2011060101_02 ntoreturn:1 reslen:44018 nscanned:1 { cnt } nreturned:1 1143466ms
Wed Jun 1 03:59:40 [dur] lsn set 81033909

The log file from the small server (single mongod, no replica set) shows the following:

Wed Jun 1 03:53:35 [conn2] building new index on { apps.aid } for cassia.snapshots_2011053100_02
Wed Jun 1 03:53:41 [conn2] done for 1823 records 5.612secs

As a test, I shut down the secondary server while the restore was happening, and the index creation time on the primary server went down to a much more acceptable 5 to 6 seconds.

More datapoints:

  • using the db.currentOp() command shows the index creation as "external sort used" on the primary server while the secondary server is active (see the sketch after this list)
  • that also appears as "external sort used : 170 files in 809 secs" in the log file
  • after shutting down the secondary server, the "external sort used" goes away
  • "external sorts" never appear in the log files for the same data set on the test server.
  • the slow performance occurs whether I use the mongorestore command or db.copyDatabase("old database", "new database", "old server")
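
A minimal sketch of the db.currentOp() check mentioned in the first bullet; filtering on the system.indexes namespace is an assumption about how index builds appear in currentOp output on 1.8.x, not something confirmed here:

db.currentOp().inprog.forEach(function (op) {
    // index builds appear as inserts into <db>.system.indexes; op.msg carries
    // progress text such as "external sort used" (assumption for 1.8.x)
    if (/\.system\.indexes$/.test(op.ns)) {
        print(op.opid + "  " + op.ns + "  " + (op.msg || ""));
    }
});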


 Comments   
Comment by Paul D. Walker [ 08/Jun/11 ]

Hi Aaron,

I think I've found the problem.

I was restoring about 600 collections, each generally about 70 MB in size, so the total size of the collections was much larger than system memory (>> system RAM). When restoring the database with all those collections, the initial collections all fit in RAM, but later on performance started to suffer. I was expecting mongo to free up the memory of the least-used collections and keep enough free to continue fast indexing.

I did see a performance increase after shutting down the secondary, but that may have been just coincidence. I would need to test it more fully to be 100% sure that this was really the case, or whether I happened to be indexing one of the smaller collections at that moment.

I've re-set up my replica set configuration and loaded smaller subsets of the collections (size < system RAM), and there have been no performance problems.

I'm also going to run a test, loading the collections individually until the total collection size is > system RAM, to see if my problems return or if mongodb "handles" it a little better.
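
A minimal sketch of that size check from the mongo shell (the 12 GB figure comes from the Environment section above; the threshold logic is only illustrative):

var gb = 1024 * 1024 * 1024;
var stats = db.stats(gb);                      // sizes reported in GB
var systemRamGb = 12;                          // from the Environment section
var workingSetGb = stats.dataSize + stats.indexSize;
print("data + indexes: " + workingSetGb.toFixed(2) + " GB");
if (workingSetGb > systemRamGb) {
    print("collections no longer fit in RAM; expect external sorts during index builds");
}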

Summary: I think it turns out to be my fault.

Comment by Aaron Staple [ 08/Jun/11 ]

Hi Paul,

We haven't been able to reproduce this issue. Can you supply your mongod logs for the primary and secondary?

Comment by Paul D. Walker [ 08/Jun/11 ]

Any other questions or information I can provide?

Comment by Paul D. Walker [ 02/Jun/11 ]

The secondary has an identical hardware configuration to the primary.

Comment by Paul D. Walker [ 02/Jun/11 ]

additional datapoint

The servers are now running and new collections added to the primary are being indexed in memory and quickly:

Thu Jun 2 11:47:50 [conn118] building new index on

{ _id: 1 }

for cassia.snapshots_2011060211_03
Thu Jun 2 11:47:50 [conn118] done for 0 records 0secs
Thu Jun 2 11:47:50 [conn118] building new index on

{ cnt: 1, ty: 1, cat: 1 }

for cassia.snapshots_2011060211_03
Thu Jun 2 11:47:50 [conn118] done for 1 records 0secs
Thu Jun 2 11:47:50 [conn118] building new index on

{ apps.aid: 1 }

for cassia.snapshots_2011060211_03
Thu Jun 2 11:47:50 [conn118] done for 1 records 0.001secs

However, the secondary server is still catching up to the primary; it is taking between 5 and 15 minutes for these indexes to be created there.
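
One way to watch how far the secondary is behind while it rebuilds these indexes is rs.status() from the mongo shell (a sketch; field names as documented for replica set status):

rs.status().members.forEach(function (m) {
    // stateStr shows PRIMARY/SECONDARY/RECOVERING; optimeDate indicates how far
    // behind the member's oplog is
    print(m.name + "  state=" + m.stateStr + "  optime=" + m.optimeDate);
});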

Comment by Paul D. Walker [ 02/Jun/11 ]

No, the disks are almost completely idle when this is happening.

(sda contains the mongo journals, sdb contains the data dir)

mdrop@obdb1:~$ iostat -xm 2
Linux 2.6.32-32-generic (obdb2.so1) 06/02/2011 x86_64 (24 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
0.08 0.00 1.86 0.72 0.00 97.33

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.08 0.90 0.22 1.76 0.01 0.31 332.08 0.01 5.93 0.89 0.18
sdb 0.00 81.03 2.04 3.15 0.11 0.33 171.25 0.28 54.50 0.61 0.32

avg-cpu: %user %nice %system %iowait %steal %idle
0.10 0.00 5.04 4.86 0.00 90.00

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 8.00 0.00 3.00 0.00 0.04 0.00 30.67 0.01 5.00 5.00 1.50
sdb 0.00 2.50 2.00 1.00 0.06 0.01 48.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 3.30 3.01 0.00 93.64

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 5.00 1.50 1.00 1.00 0.02 0.01 34.00 0.01 5.00 5.00 1.00
sdb 0.00 8.50 0.00 0.50 0.00 0.04 144.00 0.01 10.00 10.00 0.50

avg-cpu: %user %nice %system %iowait %steal %idle
0.09 0.00 5.79 1.49 0.00 92.63

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 2.00 0.00 0.06 0.00 58.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.02 0.00 4.61 0.00 0.00 95.36

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 1.00 0.00 5.00 0.00 0.02 9.60 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.02 0.00 4.77 0.00 0.00 95.21

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 1.50 2.00 1.00 0.06 0.01 45.33 0.00 0.00 0.00 0.00

Comment by Eliot Horowitz (Inactive) [ 01/Jun/11 ]

Is the disk saturated?
Can you run iostat -x 2 while this is happening?

Comment by Paul D. Walker [ 01/Jun/11 ]

additional datapoints

  • after the database is loaded, new collections of a similar size are indexed quickly in the primary
  • the secondary continues to use external sorts, so it is slow catching up
Comment by Paul D. Walker [ 01/Jun/11 ]

If you need any more information for debugging purposes, I'll be happy to provide it.
