[SERVER-16379] Chunk moves are slow under WiredTiger Created: 02/Dec/14  Updated: 18/Dec/14  Resolved: 15/Dec/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.8.0-rc1
Fix Version/s: 2.8.0-rc3

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Alexander Gorrod
Resolution: Done Votes: 0
Labels: wiredtiger
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File movechunk-comparison.png, PNG File movechunk-profile.png
Issue Links:
Related
related to SERVER-16247 Oplog declines in performance over ti... Closed
related to SERVER-16296 Producer-consumer use case shows decl... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL

Description

  • Looks like possibly quadratic behavior.
  • Logs and profile show most of the time is spent in deleting the old chunk, so this seems possibly related to SERVER-16247 and SERVER-16296, and may have to do with having to scan past deleted btree entries. However this case is a little different: the profile below shows this is related to deletes from the index, and the test is constructed so that these deletes are not from the beginning of the index btree.
  • The deletes also slow down other operations, specifically lookups of a non-existent _id, and those lookups remain slow after the chunk delete completes.

  • The top graph shows the rate of repeated lookups of a non-existent _id; these lookups were running continually throughout the test.
  • (1) moveChunk begins
  • (2) data has been copied to other shard, now deletion begins on this shard
    • the performance of the lookups of the non-existent document (top graph) begins to decline.
    • (A) deleteRange is spending most of its time in __wt_btcur_next, similar to SERVER-16296, but in this case inside IndexCursor::locate.
    • (B) the lookups of the non-existent _id increasingly also spend their time in __wt_btcur_next as those lookups get slow, also inside IndexCursor::locate.
  • (3) the deletions have finished, but the performance of the lookups (top graph) remains low; a single lookup can be spot-checked as sketched after this list.
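
A single lookup can be spot-checked by timing it directly against the donor shard. This snippet is not part of the original report; it assumes the donor shard from the repro below (localhost:30000) and the non-existent _id:-5 used by the lookup loop.

// Run in a mongo shell connected to the donor shard (localhost:30000).
// Times one count() of the non-existent _id; per observation (3) above,
// this stays slow even after the chunk deletion has finished.
var t0 = new Date()
db.getSiblingDB('test').c.find({_id: -5}).count()
print('lookup ms:', new Date() - t0)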

Instructions to reproduce:

Start mongod/mongos (the dbpath directories ./s0r0, ./s1r0, and db/c must exist before starting):

mongod --dbpath ./s0r0 --logpath ./s0r0.log --logappend --port 30000  --storageEngine wiredtiger --fork
mongod --dbpath ./s1r0 --logpath ./s1r0.log --logappend --port 31000  --storageEngine wiredtiger --fork
mongod --configsvr --dbpath db/c --logpath db/c.log --logappend --fork
mongos --port 27017 --configdb localhost --logpath db/ms0.log --logappend  --noAutoSplit --fork

Set up (run this and the following snippets in a mongo shell connected to the mongos on port 27017):

    sh.addShard('localhost:30000')
    sh.addShard('localhost:31000')
 
    // turn on sharding for collection test.c
    printjson(['enableSharding', sh.enableSharding('test')])
    printjson(['shardCollection', sh.shardCollection('test.c', {_id: 1})])
 
    // we'll do the chunk moving manually
    printjson(['setBalancerState', sh.setBalancerState(false)])
 
    // manually create two chunks: _id<0 and _id>=0
    printjson(['split', db.adminCommand({split: 'test.c', middle: {_id:0}})])
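
To confirm the split took effect before running the test, one option (not part of the original script) is to list the chunk metadata from the same mongos shell:

    // Expect two chunks for test.c, both still on shard0000 at this point:
    // one covering _id < 0 and one covering _id >= 0.
    db.getSiblingDB('config').chunks.find({ns: 'test.c'}).sort({min: 1}).forEach(printjson)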

Start an asynchronous thread that continuously searches for a non-existent document with _id:-5. These lookups will get slower once the deletes for the chunk move start. One way to launch the thread is sketched after the function below.

function find() {
    var every = 1000
    var t = new Date()
    for (var i=0; ; i++) {
        db.c.find({_id:-5}).count() // search for a non-existent document
        if (i%every==0 && i>0) {
            var tt = new Date()
            print(new Date(), 'find', i, Math.floor(every / (tt-t) * 1000))
            t = tt
        }
    }
}
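
The report doesn't show how this thread is launched. One option (an assumption, not from the original instructions) is to paste find() into a second mongo shell connected to the mongos and call it there; another is to start it from the main mongos shell with the shell's startParallelShell helper:

// Assumes find() above is already defined in the current (mongos) shell;
// 27017 is the mongos port from the startup commands above.
// find() loops forever, so the returned join function is never called here.
var joinFind = startParallelShell(find.toString() + '; find();', 27017)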

Here's the main part of the test: insert one document into the chunk _id<0 and lots of documents into the chunk _id>=0, then move the large chunk from shard0000, where it starts out, to shard0001. Inserting a document with _id:-10 ensures that the deletes of the chunk _id>=0 when we move it are not from the beginning of the _id index (nor from the beginning of the collection). The profile above was obtained with a count of 70000.

function test(count) {
 
    // insert one document in chunk _id<0
    // this ensures deletes when we move the other chunk below
    // are not from beginning of _id index, nor beginning of collection
    db.c.insert({_id:-10})
 
    // insert lots of documents in chunk _id>=0
    print('=== chunk insert')
    var every = 10000
    var t = new Date()
    for (var i=0; i<count; ) {
        var bulk = db.c.initializeUnorderedBulkOp();
        for (var j=0; j<every && i<count; j++, i++)
            bulk.insert({_id:i})
        bulk.execute();
        var tt = new Date()
        print(new Date(), 'insert', i, Math.floor(every / (tt-t) * 1000))
        t = tt
    }
    info('test.c')
 
    // move the chunk _id>=0 now from shard0000 (where it starts out) to shard0001
    print('=== chunk move')
    printjson(db.adminCommand({moveChunk:'test.c', find:{_id:0}, to:'shard0001', _waitForDelete:true}))
    info('test.c')
}
 
function info(ns) {
    db.getSiblingDB("config").chunks.find({"ns" : ns}).sort({min:1}).forEach(function (chunk) { 
        var db1 = db.getSiblingDB(chunk.ns.split(".")[0])
        var key = db.getSiblingDB("config").collections.findOne({_id:chunk.ns}).key
        var size = db1.runCommand({datasize:chunk.ns, keyPattern:key, min:chunk.min, max:chunk.max, estimate:true});
        print('id', chunk._id, 'shard', chunk.shard, 'min', chunk.min._id, 'max', chunk.max._id, 'size', size.size, 'objects', size.numObjects)
    })
}
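
With the lookup thread running, the test is then invoked with the count from the description, for example:

// Run from the mongos shell after defining find(), test(), and info() above.
// 70000 is the count the attached profile was obtained with; given the possibly
// quadratic behavior noted above, larger counts should make the slowdown worse.
test(70000)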

