[SERVER-16296] Producer-consumer use case shows declining performance over time under WiredTiger Created: 24/Nov/14  Updated: 18/Dec/14  Resolved: 15/Dec/14

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: 2.8.0-rc1
Fix Version/s: 2.8.0-rc3

Type: Bug
Priority: Major - P3
Reporter: Bruce Lucas (Inactive)
Assignee: Michael Cahill (Inactive)
Resolution: Done
Votes: 0
Labels: wiredtiger
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File producer-consumer-graph.png     PNG File producer-consumer-profile-indexed.png     PNG File producer-consumer-profile-unindexed.png    
Issue Links:
Depends
Related
is related to SERVER-16235 Performance issue in capped collectio... Closed
is related to SERVER-16247 Oplog declines in performance over ti... Closed
is related to SERVER-16379 Chunk moves are slow under WiredTiger Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

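The producer inserts documents; the consumer uses findAndModify to claim a document by setting its "worker" field, then removes it. The producer throttles its insert rate by monitoring the document count: every 1000 inserts it checks the count and sleeps while the collection holds more than 10000 documents. See the code below.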

Performance behavior and profiling show similarities to SERVER-16247 and SERVER-16235, even though no capped collections are involved in this case.

  • Orange graph shows declining performance over time; appears to be headed to 0. Behavior appears similar to SERVER-16235 after the point where the capped collection had wrapped and deletes began occurring.
  • Red graph shows that adding index on the "worker" field that is used in the findAndModify query changes the behavior but does not fix the problem. Behavior appears similar to SERVER-16247.
  • Green graph shows that LSM is worse.

Profiles for both the indexed and unindexed cases are below. As in SERVER-16247 and SERVER-16235, both show a lot of time spent in __wt_btcur_next. The bar graph timeline shows that the time spent there grows as performance declines over the course of the run, confirming that the time spent in __wt_btcur_next is the reason for the performance decline.

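// Inserts documents, pausing whenever the collection grows past `limit`.
// Assumes `count` (total number of inserts) is defined globally in the shell.
function producer() {
    db.c.drop()
    //db.c.ensureIndex({worker:1})  // uncomment for the indexed case
    var limit = 10000   // maximum backlog before the producer pauses
    var every = 1000    // check the backlog every `every` inserts
    for (var i=0; i<count; i++) {
        db.c.insert({_id:i})
        if (i>0 && i%every==0) {
            var c = db.c.count()
            print('pro', i, c)
            while (c>limit) {
                sleep(100)
                c = db.c.count()
            }
        }
    }
}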
 
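// Claims one unclaimed document at a time and removes it, printing the
// rate (iterations per second) every `every` iterations.
// Assumes `count` (total number of iterations) is defined globally in the shell.
function consumer(worker) {
    var every = 1000
    var t = new Date()
    for (var i=0; i<count; i++) {
        // findAndModify returns the document as it was before the update
        var doc = db.c.findAndModify({query: {worker: null}, update: {$set: {worker: worker}}})
        if (doc) {
            db.c.remove(doc)
        }  else {
            sleep(100)
        }
        if (i>0 && i%every==0) {
            var tt = new Date()
            print('worker', worker, i, Math.floor(every/(tt-t)*1000))
            t = tt
        }
    }
}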
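
The number the consumer prints is a rate per second over the last `every` iterations: subtracting two Date objects gives elapsed milliseconds, and the division is scaled by 1000. A minimal sketch of that arithmetic on its own, where ratePerSecond is a hypothetical helper name that does not appear in the script above:

```javascript
// Hypothetical helper: same arithmetic as the consumer's print statement.
// `every` is the batch size, `elapsedMs` the time the batch took in ms.
function ratePerSecond(every, elapsedMs) {
    return Math.floor(every / elapsedMs * 1000);
}

// Subtracting Date objects yields milliseconds, as in the consumer loop.
var t = new Date(1000);
var tt = new Date(1500);
var rate = ratePerSecond(1000, tt - t);   // 1000 iterations in 500 ms
```

With these inputs `rate` is 2000; a batch of 1000 that took 4000 ms would give 250. A falling value from one batch to the next is what the declining curves in the graph reflect.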


Generated at Thu Feb 08 03:40:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.