[SERVER-24935] Fassert due to WT_CACHE_FULL in failed index build cleanup on inMemory engine Created: 07/Jul/16  Updated: 22/Sep/16  Resolved: 14/Jul/16

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Storage
Affects Version/s: None
Fix Version/s: 3.3.10

Type: Bug Priority: Major - P3
Reporter: David Hows Assignee: David Hows
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related:
is related to SERVER-26239 Improve handling of WT_CACHE_FULL for... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:
Linked BF Score: 0

 Description   

In testing we have seen the following failure in the cleanup of a failed index build:

[js_test:inmem_full] 2016-06-30T20:59:37.091+0000 d20510| 2016-06-30T20:59:37.090+0000 I INDEX    [conn1] build index on: test.large properties: { v: 1, key: { a: 1.0 }, name: "a_1", ns: "test.large" }
[js_test:inmem_full] 2016-06-30T20:59:37.091+0000 d20510| 2016-06-30T20:59:37.090+0000 I INDEX    [conn1] 	 building index using bulk method
[js_test:inmem_full] 2016-06-30T20:59:41.289+0000 d20510| 2016-06-30T20:59:41.289+0000 E INDEX    [conn1] Caught exception while cleaning up partially built indexes: -31807: WT_CACHE_FULL: operation would overflow cache
[js_test:inmem_full] 2016-06-30T20:59:41.290+0000 d20510| 2016-06-30T20:59:41.289+0000 I -        [conn1] Fatal Assertion 18644
[js_test:inmem_full] 2016-06-30T20:59:41.290+0000 d20510| 2016-06-30T20:59:41.289+0000 I -        [conn1]
[js_test:inmem_full] 2016-06-30T20:59:41.290+0000 d20510|
[js_test:inmem_full] 2016-06-30T20:59:41.290+0000 d20510| ***aborting after fassert() failure
[js_test:inmem_full] 2016-06-30T20:59:41.291+0000 d20510|
[js_test:inmem_full] 2016-06-30T20:59:41.291+0000 d20510|
[js_test:inmem_full] 2016-06-30T20:59:41.297+0000 d20510| 2016-06-30T20:59:41.297+0000 F -        [conn1] Got signal: 6 (Aborted).

I attempted to reproduce this, but was unable to get the system to fail in this manner again.

Looking at the code, I believe we could handle this error case better by retrying the failed cleanup instead of immediately issuing an fassert.
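As a rough illustration of the proposed handling (this is not the actual server change, which is in C++; the function names and the bounded-retry policy here are hypothetical), the idea is to retry the cleanup while it keeps failing with WT_CACHE_FULL, and only escalate once retries are exhausted:

```javascript
// WiredTiger error code for "operation would overflow cache".
const WT_CACHE_FULL = -31807;

// Hypothetical sketch: retry cleanupFn a bounded number of times while it
// fails with WT_CACHE_FULL; rethrow (letting the caller fassert) only for
// unrelated errors or once the retry budget is exhausted.
function retryOnCacheFull(cleanupFn, maxAttempts) {
    for (let attempt = 1; ; attempt++) {
        try {
            return cleanupFn();
        } catch (e) {
            if (e.code !== WT_CACHE_FULL || attempt >= maxAttempts)
                throw e;  // unrelated failure, or out of retries
        }
    }
}
```

In the sketch, a cleanup that succeeds once the cache has drained no longer takes the server down; the fassert path is reserved for persistent failure.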



 Comments   
Comment by Githook User [ 14/Jul/16 ]

Author:

David Hows (daveh86) <howsdav@gmail.com>

Message: SERVER-24935 - Retry failed index builds upon receiving WT_CACHE_FULL
Branch: master
https://github.com/mongodb/mongo/commit/52a9a5fabf9b70b90f98bc0908d3c04b2b98e059

Comment by David Hows [ 07/Jul/16 ]

Okay, finally got this reproducing.

Run this command:

./mongo --nodb inmem_test.js

with this script (saved as inmem_test.js):

// SERVER-22599 Test behavior of in-memory storage engine with full cache.
(function() {
    'use strict';
 
    Random.setRandomSeed();
 
    // Return array of approximately 1kB worth of random numbers.
    function randomArray() {
        var arr = [];
        for (var j = 0; j < 85; j++)
            arr[j] = Random.rand();
        return arr;
    }
 
    // Return a document of approximately 10kB in size with arrays of random numbers.
    function randomDoc() {
        var doc = {};
        for (var c of "abcdefghij")
            doc[c] = randomArray();
        return doc;
    }
 
    // Return an array of random documents totaling about 1 MB.
    function randomBatch(batchSize) {
        var batch = [];
        for (var j = 0; j < batchSize; j++)
            batch[j] = randomDoc();
        return batch;
    }
 
    const cacheMB = 20;
    const cacheKB = 1024 * cacheMB;
    const docSizeKB = Object.bsonsize(randomDoc()) / 1024;
    const batchSize = 100;
    const batch = randomBatch(batchSize);
    var mongod = MongoRunner.runMongod({
        storageEngine: 'inMemory',
        inMemoryEngineConfigString: 'cache_size=' + cacheMB + "M,",
    });
    assert.neq(null, mongod, "mongod failed to start up with --inMemoryEngineConfigString");
    var db = mongod.getDB("test");
    var t = db.large;
 
    // Insert documents until full.
    var res;
    var count = 0;
    for (var j = 0; j < 1000; j++) {
        res = t.insert(batch);
        assert.gte(res.nInserted, 0, tojson(res));
        count += res.nInserted;
        if (res.hasErrors())
            break;
        assert.eq(res.nInserted, batchSize, tojson(res));
        print("Inserted " + count + " documents");
    }
 
    // Indexes are sufficiently large that it should be impossible to add a new one.
    // Loop indefinitely: without the fix, the cleanup of a failed index build
    // eventually hits WT_CACHE_FULL and mongod fasserts.
    while (true) {
        assert.commandFailedWithCode(t.createIndex({a: 1}), ErrorCodes.ExceededMemoryLimit);
        assert.commandFailedWithCode(t.createIndex({a: 1, _id: 1}), ErrorCodes.ExceededMemoryLimit);
    }
 
}());

Generated at Thu Feb 08 04:07:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.