[SERVER-16922] Slow index build times on secondaries with Wired Tiger Created: 17/Jan/15  Updated: 03/Dec/21  Resolved: 22/Jan/15

Status: Closed
Project: Core Server
Component/s: Replication, WiredTiger
Affects Version/s: None
Fix Version/s: 3.0.0-rc6

Type: Bug Priority: Major - P3
Reporter: John Page Assignee: Atul Kachru
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File POCDriver.jar     Java Source File jmongoiibench.java     Text File mongod.log     File run.simple.bash    
Issue Links:
Tested
Backwards Compatibility: Fully Compatible
Participants:

 Description   

Testing migration to WiredTiger from MMAPv1 in 2.6 by adding a WT replica.

Servers:
Data volume: 1 TB
Amazon c4.2xlarge instances
8 x CPU
15 GB RAM
Disks: 2 x 1 TB SSD EBS at 4,000 PIOPS each, in RAID 0 (8,000 PIOPS total)

No load on source (primary) server
522,771,027 Documents in collection

Data Copying Phase:
Began: 08:15 16th Jan
Finished 18:23 16th Jan

Creation of the _id index (_id values are standard OIDs):

2015-01-16T18:23:17.006+0000 I - [rsSync] Index Build: 123300/522771027 0%

Currently

2015-01-17T19:09:58.004+0000 I - [rsSync] Index Build: 150625100/522771027 28%

Not only has it taken over 24 hours to reach 28%, each additional 1% is observably taking about 4 minutes longer than the last.

At 396 minutes for the final 1%, and assuming the per-1% cost ramps up roughly linearly, the average is 396/2 = 198 minutes per 1%, so the predicted total is:

198 * 100 = 19,800 minutes ≈ 330 hours to build one index.
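
A back-of-envelope check of that extrapolation, runnable in the mongo shell (this assumes the per-1% cost ramps linearly from near zero up to the observed 396 minutes for the final 1%):

// Hedged extrapolation: if per-1% build time grows linearly, the
// mean per-1% cost is half the cost of the final 1%.
var avgMinutesPerPercent = 396 / 2;            // = 198
var totalMinutes = avgMinutesPerPercent * 100; // = 19,800
print(totalMinutes / 60 + " hours");           // prints: 330 hours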

By comparison - I tested 2.6 on the same hardware and this step took 7hrs 36 minutes.

Is this the case for all WT index creations? Is build time polynomial in the number of records (and worse than n log n, I suspect)? Does this affect only ever-increasing indexes, or all indexes? If I had 3 or 4 secondary indexes on this collection, would they be built sequentially or in parallel? (A shell snippet for monitoring such a build follows below.)
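
For anyone watching a similar build, one way to poll progress from the mongo shell is via db.currentOp(). This is a sketch only; the exact format of the msg and progress fields varies by server version:

// Print progress of any in-flight index builds (illustrative; the
// "Index Build" message text differs across versions).
db.currentOp().inprog.forEach(function (op) {
    if (op.msg && op.msg.indexOf("Index Build") !== -1) {
        printjson({ opid: op.opid, msg: op.msg, progress: op.progress });
    }
});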



 Comments   
Comment by Mark Callaghan [ 23/Jan/15 ]

I get the problem using these files to load the database and then creating the indexes via the following:

db.purchases_index.ensureIndex({ "price": 1, "customerid": 1 }, { "background": false });
db.purchases_index.ensureIndex({ "cashregisterid": 1, "price": 1, "customerid": 1 }, { "background": false });
db.purchases_index.ensureIndex({ "price": 1, "dateandtime": 1, "customerid": 1 }, { "background": false });

db.purchases_index.ensureIndex({ "price": 1, "customerid": 1 }, { "background": true });
db.purchases_index.ensureIndex({ "cashregisterid": 1, "price": 1, "customerid": 1 }, { "background": true });
db.purchases_index.ensureIndex({ "price": 1, "dateandtime": 1, "customerid": 1 }, { "background": true });

WiredTiger was used via:
--storageEngine wiredTiger --wiredTigerCacheSizeGB 4

And mongo.conf had:

processManagement:
  fork: true
systemLog:
  destination: file
  path: /data/mysql/mongo.280rc4/log
  logAppend: true
storage:
  syncPeriodSecs: 60
  dbPath: /data/mysql/mongo.280rc4/data
  journal:
    enabled: true
  mmapv1:
    journal:
      commitIntervalMs: 100
storage.wiredTiger.collectionConfig.blockCompressor: snappy
storage.wiredTiger.engineConfig.journalCompressor: snappy
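
(Aside: the last two keys above are written in flat dotted form; assuming the intent was the standard WiredTiger options, the nested YAML equivalent would be:)

storage:
  wiredTiger:
    engineConfig:
      journalCompressor: snappy
    collectionConfig:
      blockCompressor: snappy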

Comment by Mark Callaghan [ 23/Jan/15 ]

With rc4 I get good performance for background:false index builds and lousy performance for background:true. With rc5 I get lousy performance for background:false, and it runs for so long that I have yet to try background:true. The call stack below looks familiar. I didn't need a secondary to reproduce it; I just ran with a 4G block cache and a 100G database. The indexes should be ~2G each when done.
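
A minimal way to time the foreground vs. background builds from the shell (a sketch using the same collection and keys as above; timing at the client works on the assumption that ensureIndex does not return until the build completes, which held for both modes in this era):

// Sketch: time one index build; drop non-_id indexes first so each
// run starts cold.
function timeIndexBuild(keys, background) {
    db.purchases_index.dropIndexes();
    var start = new Date();
    db.purchases_index.ensureIndex(keys, { background: background });
    return ((new Date() - start) / 1000) + "s";
}

print("foreground: " + timeIndexBuild({ price: 1, customerid: 1 }, false));
print("background: " + timeIndexBuild({ price: 1, customerid: 1 }, true));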

Comment by Mark Benvenuto [ 22/Jan/15 ]

Fixed with: https://github.com/mongodb/mongo/commit/876f5f475bf9242c098e6de641aa7ee0982803c2

CAP: Do we need a perf test around index builds on secondaries where the secondary is running WiredTiger?

Comment by Mark Benvenuto [ 22/Jan/15 ]

Thanks to john.page for the excellent repro. Following his instructions, I was able to reproduce the issue on an EC2 c4.2xlarge.

Fundamentally, we believe the issue was in how WT handled page splits. The good news is that I can no longer repro this with the latest WT import from yesterday (https://github.com/mongodb/mongo/commit/876f5f475bf9242c098e6de641aa7ee0982803c2). I believe this is the result of a fix by keith.bostic in bt_split.c. In both cases, pre- and post-fix, the maximum tree depth was 3.

Details
This is a perf snapshot against a rc5 build with symbols on Ubuntu 14.04 LTS.

perf record -F 99 --call-graph dwarf -o perf6.data -p `pgrep mongod` -- sleep 30

perf report --stdio
# ========
# captured on: Wed Jan 21 21:53:45 2015
# hostname : ip-10-203-204-146
# os release : 3.13.0-36-generic
# perf version : 3.13.11.6
# arch : x86_64
# nrcpus online : 8
# nrcpus avail : 8
# cpudesc : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
# cpuid : GenuineIntel,6,62,4
# total memory : 15400880 kB
# cmdline : /usr/lib/linux-tools-3.13.0-36/perf record -F 99 --call-graph dwarf -o perf6.data -p 7502 -- sleep 30
# event : name = cpu-clock, type = 1, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0,
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: software = 1, tracepoint = 2, breakpoint = 5
# ========
#
# Samples: 2K of event 'cpu-clock'
# Event count (approx.): 28313131030
#
# Overhead  Command       Shared Object                                                                                                  Symbol
# ........  .......  ..................  ......................................................................................................
#
    98.79%   mongod  mongod              [.] __wt_tree_walk
             |
             --- __wt_tree_walk
                 __wt_btcur_next
                 __curfile_next
                 mongo::WiredTigerRecordStore::Iterator::_getNext()

Comment by John Page [ 21/Jan/15 ]

Insert generator

Comment by John Page [ 21/Jan/15 ]

I tested with 540M and 185M documents, which is not a large set. Any decent loader will hit 60,000 inserts per second on MMAPv1 2.6, so it doesn't take long to build a set like that. I'll send you details of my hardware and loader tomorrow, but Windows 8 won't come close.

Comment by John Page [ 19/Jan/15 ]

I'm seeing nice, fast index build times when testing general index building, so I'm now working on reproducing the original issue.

Comment by John Page [ 17/Jan/15 ]

I have halted the instances but may still have the logs if the root volume is EBS. This is the latest 2.8 RC candidate as of the 16th.

Comment by Scott Hernandez (Inactive) [ 17/Jan/15 ]

Please include the version and logs for the server (which should include the startup stanza).

Comment by John Page [ 17/Jan/15 ]

My bad, was supposed to be a bug
