[SERVER-41824] Collection creation becomes very slow and has extended stalls Created: 19/Jun/19  Updated: 28/Jun/19  Resolved: 26/Jun/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 4.2.0-rc1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 0
Labels: KS, bkp
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File comparison.png    
Issue Links:
Backports
Depends
depends on WT-4882 Improve checkpoint performance when t... Closed
Operating System: ALL
Backport Requested:
v4.2
Sprint: Storage Engines 2019-07-19
Participants:

 Description   

The test creates 15k collections as fast as possible (10 threads, 1,500 collections each). In the attached comparison.png, 4.0.10 is on the left and 4.2.0-rc1 is on the right.

  • 4.0.10 (B-E)
    • overall creating about 150 collections per second
    • C-D: rate slows down during checkpoint, but no stalls
  • 4.2.0-rc1 (F-J)
    • F-G: initial rate is higher
    • G: after a few seconds the collection creation rate drops to ~3/s, accompanied by a high rate of failed evictions and an extremely high rate of bytes written, ~200 MB/s.
    • H: shortly after the checkpoint starts there is an extended stall of ~100 s, during which no FTDC data is collected.
    • I: the stall ends, but the checkpoint continues, and the very low collection creation rate and very high rate of bytes written persist (possibly indefinitely?).

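// Repro: 10 threads each create 1,500 collections (15k in total) by
// inserting one empty document into a new collection name.
// ScopedThread is a mongo shell test helper; if it is undefined, load
// jstests/libs/parallelTester.js from the server repo first.
function repro() {
    var nthreads = 10
    var threads = []
    for (var t = 0; t < nthreads; t++) {
        var thread = new ScopedThread(function(t) {
            for (var i = 0; i < 1500; i++) {
                // Thread 0 prints progress every 100 collections.
                if (t == 0 && i % 100 == 0)
                    print(i)
                // The first insert implicitly creates collection c<t>.<i>.
                db['c' + t + '.' + i].insert({})
            }
        }, t)
        threads.push(thread)
        thread.start()
    }
    // Wait for all threads to finish.
    for (var t = 0; t < nthreads; t++)
        threads[t].join()
}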
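
The rates and stalls described above were read from FTDC charts. As a rough alternative when only the shell is available, progress samples could be turned into per-interval creation rates with a helper like the one below. This is a minimal sketch, not part of the original repro: the function name creationRates, the sample shape {ms, count}, and the stall threshold are all assumptions made for illustration, and the numbers in the example call are illustrative, not measured data.

```javascript
// Hypothetical helper (not from the original ticket): converts progress
// samples into per-interval creation rates and flags slow intervals.
// Each sample is {ms: elapsed milliseconds, count: collections created so far}.
function creationRates(samples, stallThresholdPerSec) {
    var out = [];
    for (var i = 1; i < samples.length; i++) {
        var seconds = (samples[i].ms - samples[i - 1].ms) / 1000;
        var created = samples[i].count - samples[i - 1].count;
        // Guard against a zero-length interval.
        var perSec = seconds > 0 ? created / seconds : 0;
        out.push({
            fromMs: samples[i - 1].ms,
            toMs: samples[i].ms,
            perSec: perSec,
            // An interval counts as stalled when its rate is below the threshold.
            stalled: perSec < stallThresholdPerSec
        });
    }
    return out;
}

// Illustrative input only: 300 collections in the first 2 s,
// then 30 more over the following 10 s.
var exampleRates = creationRates(
    [{ms: 0, count: 0}, {ms: 2000, count: 300}, {ms: 12000, count: 330}],
    10 // assumed threshold: below 10 collections/s is treated as a stall
);
```

Samples could be gathered by recording Date.now() next to the progress counter that thread 0 already prints. Note that thread 0 only sees its own count, so the result approximates one thread's rate rather than the overall rate.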



 Comments   
Comment by Michael Cahill (Inactive) [ 26/Jun/19 ]

The issue here was caused by improvements in MongoDB that allow more concurrency between metadata operations. Under load, multiple threads could attempt to update WiredTiger metadata pages concurrently, which exposed a long-standing issue that prevented metadata pages from being evicted and split normally. That was fixed in WT-4882.
