[SERVER-16717] unexpected WriteConflict exceptions with WiredTiger b-tree Created: 05/Jan/15  Updated: 23/Jan/15  Resolved: 22/Jan/15

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.8.0-rc4
Fix Version/s: 3.0.0-rc6

Type: Bug Priority: Major - P3
Reporter: Mark Callaghan Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 0
Labels: wiredtiger
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Start mongod:
numactl --interleave=all bin/mongod --config /data/mysql/mongo.280rc4/mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 2

Run iibench (https://github.com/tmcallaghan/iibench-mongodb) with a few changes to run.simple.bash:
export MAX_ROWS=400000000
export LENGTH_CHAR_FIELDS=2000

I also edited jmongoiibench.java to increase the client timeout to avoid errors on checkpoint stalls.

I used this mongo.conf:
processManagement:
fork: true
systemLog:
destination: file
path: /data/mysql/mongo.280rc4/log
logAppend: true
storage:
dbPath: /data/mysql/mongo.280rc4/data
journal:
enabled: true
mmapv1:
syncPeriodSecs: 60
journal:
commitIntervalMs: 100
storage.wiredTiger.collectionConfig.blockCompressor: zlib
storage.wiredTiger.engineConfig.journalCompressor: snappy
operationProfiling.slowOpThresholdMs: 2000

Participants:

 Description   

This occurs when running iibench with the WiredTiger b-tree. This does not occur with mmapv1, wiredtiger LSM, tokumx or rocksdb engines for MongoDB. While I won't rule out the client code, it looks like a WT bug to me.

It seems to occur around the time of a checkpoint stall.

The iibench client prints this on an error:
com.mongodb.WriteConcernException:

{ "serverUsed" : "localhost:27017" , "ok" : 1 , "n" : 0 , "err" : "WriteConflict" , "code" : 112}

And the mongod error log has:
2015-01-04T21:49:45.625-0800 I QUERY [conn10] command iibench.$cmd command: insert

{ insert: "purchases_index", ordered: true, documents: 1000 }

keyUpdates:0 reslen:40 12929ms
2015-01-04T21:49:45.654-0800 I QUERY [conn4] command iibench.$cmd command: insert

{ insert: "purchases_index", ordered: true, documents: 1000 }

keyUpdates:0 reslen:40 12894ms
2015-01-04T21:49:45.658-0800 I QUERY [conn2] command iibench.$cmd command: insert

{ insert: "purchases_index", ordered: true, documents: 1000 }

keyUpdates:0 reslen:40 12907ms
2015-01-04T21:49:53.033-0800 I WRITE [conn9] insert iibench.purchases_index query:

{ _id: ObjectId('54aa25fce4b0fe6bcefcf0d2'), dateandtime: 1420436988731, cashregisterid: 819, customerid: 77578, productid: 8110, price: 780.2229508580186, cf1: "qzafxxbyqkjakijsqouvvzrzhxathevjazprnmeofdznmupmifarnmmsipsbrbjjkokfidkivxtxmwvljcxzbgnovrkuxoddsddqocnvfooeugujzfopvcytzkouesykevvfygrpqteftjrzvkgojy..." }

ninserted:0 keyUpdates:0 exception: WriteConflict code:112 4204ms

This also occurs when I use a small cache for WT which should make checkpoint stalls less of an issue.



 Comments   
Comment by Eliot Horowitz (Inactive) [ 22/Jan/15 ]

Cause by SERVER-16337

Comment by Mark Callaghan [ 16/Jan/15 ]

Is this code the source of the duplicate key error?
https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp#L1012

Comment by Mark Callaghan [ 16/Jan/15 ]

The pattern is first I get the bogus write conflict exception and then I get a duplicate key exception. I don't see how the duplicate key exception can occur give that the collection has 3 non-unique secondary indexes and doesn't set the _id on insert. See https://github.com/tmcallaghan/iibench-mongodb for the code.

2015-01-16T09:15:01.099-0800 I WRITE    [conn3] insert iibench.purchases_index query: { _id: ObjectId('54b94710e4b064e9da60b135'), dateandtime: 1421428496073, cashregisterid: 803, customerid: 85356, productid: 7926, price: 856.65128800982, cf1: "yyvtjbtiprhyzyjqpidjduaadthlxgygbxamxshxkbfndglluczjredogtpgcmlvklujzqkdwmmqgbbkdortfkmapzmmrahhobtznefbsdekgfwjlloghyydonkgukvszkxejzneubxdcgbsvbegcf..." } ninserted:0 keyUpdates:0 writeConflicts:0 exception: WriteConflict code:112 numYields:0 5025ms
2015-01-16T09:15:01.099-0800 I COMMAND  [conn3] command iibench.$cmd command: insert { insert: "purchases_index", ordered: true, documents: 10 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:113 5026ms
2015-01-16T09:15:06.265-0800 I WRITE    [conn10] insert iibench.purchases_index query: { _id: ObjectId('54b94710e4b064e9da60b111'), dateandtime: 1421428496073, cashregisterid: 229, customerid: 49757, productid: 8790, price: 498.0298568663678, cf1: "czhpvbxyjxkrfgswnqhkhvktyrvfrtnaitvianaqgnsicaarojqpptdflgqdslrjzkidebirgkljqxdecurxgkdzyvxbhtgqfafwaapyuuyvkaciwgiwmqyvrctqyvxwmtpibenzlqdvzpnbomaljq..." } ninserted:0 keyUpdates:0 writeConflicts:0 exception: WriteConflict code:112 numYields:0 10190ms
2015-01-16T09:15:06.265-0800 I COMMAND  [conn10] command iibench.$cmd command: insert { insert: "purchases_index", ordered: true, documents: 10 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:113 10191ms
2015-01-16T09:15:11.399-0800 I WRITE    [conn11] insert iibench.purchases_index query: { _id: ObjectId('54b94710e4b064e9da60b160'), dateandtime: 1421428496073, cashregisterid: 427, customerid: 46935, productid: 5710, price: 469.487788970053, cf1: "ispucbiumqtsrwkpmybqfaclfibowvxhpegnuemvzedpdtljdryqmhunixkoxstzuqnkjaonbcuaqasyvzrlpgydrvhaskntncosnrrlbnaznvamijcatsdxldkensyoxsdmobazeealxprasbqghq..." } ninserted:0 keyUpdates:0 writeConflicts:0 exception: E11000 duplicate key error dup key: { : ObjectId('54b94710e4b064e9da60b160') } code:11000 numYields:0 15325ms
2015-01-16T09:15:11.399-0800 I COMMAND  [conn11] command iibench.$cmd command: insert { insert: "purchases_index", ordered: true, documents: 10 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:178 15325ms
2015-01-16T09:15:16.528-0800 I WRITE    [conn5] insert iibench.purchases_index query: { _id: ObjectId('54b94710e4b064e9da60b11b'), dateandtime: 1421428496073, cashregisterid: 300, customerid: 51898, productid: 2750, price: 519.987134848926, cf1: "zwvwdarsmytumdbhunuthdhoncxqbthkkdduovisaaatlflsuldpzuvdyshmcutpsgvgjmfvlzluyqshkgmypecxibufstsirjkwoiswaqrrhwalfpmhxjyjgtyvcjotrkjcitzejbhexksngrywuj..." } ninserted:0 keyUpdates:0 writeConflicts:0 exception: E11000 duplicate key error dup key: { : ObjectId('54b94710e4b064e9da60b11b') } code:11000 numYields:0 20454ms
2015-01-16T09:15:16.528-0800 I COMMAND  [conn5] command iibench.$cmd command: insert { insert: "purchases_index", ordered: true, documents: 10 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:178 20455ms

Code that generates attribute values for new documents.

                    for (int i = 0; i < documentsPerInsert; i++) {
                        //id++;
                        int thisCustomerId = rand.nextInt(numCustomers);
                        double thisPrice= ((rand.nextDouble() * maxPrice) + (double) thisCustomerId) / 100.0;
                        BasicDBObject doc = new BasicDBObject();
                        //doc.put("_id",id);
                        doc.put("dateandtime", System.currentTimeMillis());
                        doc.put("cashregisterid", rand.nextInt(numCashRegisters));
                        doc.put("customerid", thisCustomerId);
                        doc.put("productid", rand.nextInt(numProducts));
                        doc.put("price", thisPrice);
                        for (int charField = 1; charField <= numCharFields; charField++) {
                            int startPosition = rand.nextInt(randomStringLength-lengthCharFields);
                            doc.put("cf"+Integer.toString(charField), randomStringHolder.substring(startPosition,startPosition+numUncompressibleCharacters) + compressibleStringHolder.substring(startPosition,startPosition+numCompressibleCharacters));
                        }
                        aDocs[i]=doc;
                    }

Comment by Mark Callaghan [ 16/Jan/15 ]

Can someone label this as a WT bug?

Comment by Michael Cahill (Inactive) [ 15/Jan/15 ]

Hi mdcallag, thanks for the report and sorry to waste your time. We will add logging for this case.

Comment by Mark Callaghan [ 05/Jan/15 ]

Given the comment about this being a pathological case, can you add logging for it and for other pathological cases to avoid the need to attach gdb?

        /*
         * A pathological case: if we're the oldest transaction in the system
         * and the eviction server is stuck trying to find space, abort the
         * transaction to give up all hazard pointers before trying again.
         */

Comment by Mark Callaghan [ 05/Jan/15 ]

Looking at code, WriteConflict is thrown for WT_ROLLBACK, see wiretiger_util.cpp

    Status wtRCToStatus_slow(int retCode, const char* prefix ) {
        if (retCode == 0)
            return Status::OK();
 
        if ( retCode == WT_ROLLBACK ) {
            throw WriteConflictException();
        }

And then in WiredTiger source

__evict_get_ref(
    WT_SESSION_IMPL *session, int is_app, WT_BTREE **btreep, WT_REF **refp)
{
        WT_CACHE *cache;
        WT_EVICT_ENTRY *evict;
        uint32_t candidates;
        WT_DECL_SPINLOCK_ID(id);                        /* Must appear last */
 
        cache = S2C(session)->cache;
        *btreep = NULL;
        *refp = NULL;
 
        /*
         * A pathological case: if we're the oldest transaction in the system
         * and the eviction server is stuck trying to find space, abort the
         * transaction to give up all hazard pointers before trying again.
         */
        if (is_app && F_ISSET(cache, WT_EVICT_STUCK) &&
            __wt_txn_am_oldest(session)) {
                F_CLR(cache, WT_EVICT_STUCK);
                WT_STAT_FAST_CONN_INCR(session, txn_fail_cache);
                return (WT_ROLLBACK);
        }

And from running with gdb attached, that "return(WT_ROLLBACK)" appears to be the cause

1200    src/third_party/wiredtiger/src/evict/evict_lru.c: No such file or directory.
(gdb) where
#0  __evict_get_ref (refp=<synthetic pointer>, btreep=<synthetic pointer>, is_app=1, session=0x7fffecc09600) at src/third_party/wiredtiger/src/evict/evict_lru.c:1200
#1  __wt_evict_lru_page (session=session@entry=0x7fffecc09600, is_app=is_app@entry=1) at src/third_party/wiredtiger/src/evict/evict_lru.c:1286
#2  0x000000000131352d in __wt_cache_full_check (session=0x7fffecc09600) at src/third_party/wiredtiger/src/include/cache.i:133
#3  __cursor_enter (session=0x7fffecc09600) at src/third_party/wiredtiger/src/include/cursor.i:62
#4  __curfile_enter (cbt=<optimized out>, cbt=<optimized out>) at src/third_party/wiredtiger/src/include/cursor.i:97
#5  __cursor_func_init (reenter=1, cbt=0x7fffe2ff3b00) at src/third_party/wiredtiger/src/include/cursor.i:184
#6  __wt_btcur_insert (cbt=cbt@entry=0x7fffe2ff3b00) at src/third_party/wiredtiger/src/btree/bt_cursor.c:447
#7  0x00000000013495d6 in __curfile_insert (cursor=0x7fffe2ff3b00) at src/third_party/wiredtiger/src/cursor/cur_file.c:211
#8  0x0000000000da685f in mongo::WiredTigerIndexStandard::_insert (this=<optimized out>, c=0x7fffe2ff3b00, key=..., loc=..., dupsAllowed=<optimized out>)
    at src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp:1092
#9  0x0000000000da8646 in mongo::WiredTigerIndex::insert (this=0x7fffe4010960, txn=<optimized out>, key=..., loc=..., dupsAllowed=<optimized out>)
    at src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp:366
#10 0x0000000000ad004b in mongo::BtreeBasedAccessMethod::insert (this=0x7fffe41e59d0, txn=0x7ffff6770660, obj=..., loc=..., options=..., numInserted=0x7ffff676d4b8)
    at src/mongo/db/index/btree_based_access_method.cpp:86
#11 0x000000000097105e in mongo::IndexCatalog::_indexRecord (this=this@entry=0x7fffe40118c8, txn=txn@entry=0x7ffff6770660, index=0x7fffe41b5650, obj=..., loc=...)
    at src/mongo/db/catalog/index_catalog.cpp:1093
#12 0x0000000000971376 in mongo::IndexCatalog::indexRecord (this=this@entry=0x7fffe40118c8, txn=txn@entry=0x7ffff6770660, obj=..., loc=...) at src/mongo/db/catalog/index_catalog.cpp:1125
#13 0x000000000095f780 in mongo::Collection::_insertDocument (this=this@entry=0x7fffe4011800, txn=txn@entry=0x7ffff6770660, docToInsert=..., enforceQuota=enforceQuota@entry=true)
    at src/mongo/db/catalog/collection.cpp:243
#14 0x0000000000960220 in mongo::Collection::insertDocument (this=this@entry=0x7fffe4011800, txn=txn@entry=0x7ffff6770660, docToInsert=..., enforceQuota=enforceQuota@entry=true)
    at src/mongo/db/catalog/collection.cpp:195
#15 0x00000000009fc99e in singleInsert (result=0x7ffff676d7f0, collection=0x7fffe4011800, docToInsert=..., txn=0x7ffff6770660) at src/mongo/db/commands/write_commands/batch_executor.cpp:1111
#16 insertOne (result=0x7ffff676d7f0, state=0x7ffff676edf0) at src/mongo/db/commands/write_commands/batch_executor.cpp:1055
#17 mongo::WriteBatchExecutor::execOneInsert (this=this@entry=0x7ffff676f2b0, state=state@entry=0x7ffff676edf0, error=error@entry=0x7ffff676edd0)
    at src/mongo/db/commands/write_commands/batch_executor.cpp:1083
#18 0x00000000009fd4e8 in mongo::WriteBatchExecutor::execInserts (this=this@entry=0x7ffff676f2b0, request=..., errors=errors@entry=0x7ffff676f0c0)
    at src/mongo/db/commands/write_commands/batch_executor.cpp:874
#19 0x00000000009fec34 in mongo::WriteBatchExecutor::bulkExecute (this=this@entry=0x7ffff676f2b0, request=..., upsertedIds=upsertedIds@entry=0x7ffff676f0e0, errors=errors@entry=0x7ffff676f0c0)
    at src/mongo/db/commands/write_commands/batch_executor.cpp:757
#20 0x00000000009ff335 in mongo::WriteBatchExecutor::executeBatch (this=this@entry=0x7ffff676f2b0, request=..., response=response@entry=0x7ffff676f2f0)
    at src/mongo/db/commands/write_commands/batch_executor.cpp:266
#21 0x0000000000a0130d in mongo::WriteCmd::run (this=<optimized out>, txn=0x7ffff6770660, dbName=..., cmdObj=..., options=<optimized out>, errMsg=..., result=..., fromRepl=false)
    at src/mongo/db/commands/write_commands/write_commands.cpp:144
#22 0x0000000000a1d0a4 in mongo::_execCommand (txn=txn@entry=0x7ffff6770660, c=c@entry=0x7ffff5fa65f0, dbname=..., cmdObj=..., queryOptions=queryOptions@entry=0, errmsg=..., result=..., 
    fromRepl=fromRepl@entry=false) at src/mongo/db/dbcommands.cpp:1252
#23 0x0000000000a1dfe3 in mongo::Command::execCommand (txn=txn@entry=0x7ffff6770660, c=c@entry=0x7ffff5fa65f0, queryOptions=queryOptions@entry=0, cmdns=cmdns@entry=0x7ffece1f9014 "iibench.$cmd", 
    cmdObj=..., result=..., fromRepl=fromRepl@entry=false) at src/mongo/db/dbcommands.cpp:1468
#24 0x0000000000a1ea9b in mongo::_runCommands (txn=txn@entry=0x7ffff6770660, ns=0x7ffece1f9014 "iibench.$cmd", _cmdobj=..., b=..., anObjBuilder=..., fromRepl=fromRepl@entry=false, queryOptions=0)
    at src/mongo/db/dbcommands.cpp:1543
#25 0x0000000000be984a in runCommands (fromRepl=false, queryOptions=<optimized out>, anObjBuilder=..., b=..., curop=..., jsobj=..., ns=<optimized out>, txn=0x7ffff6770660)
    at src/mongo/db/query/find.cpp:131
#26 mongo::runQuery (txn=txn@entry=0x7ffff6770660, m=..., q=..., curop=..., result=..., fromDBDirectClient=fromDBDirectClient@entry=false) at src/mongo/db/query/find.cpp:565
#27 0x0000000000afe595 in receivedQuery (fromDBDirectClient=false, m=..., dbresponse=..., c=..., txn=0x7ffff6770660) at src/mongo/db/instance.cpp:224
#28 mongo::assembleResponse (txn=txn@entry=0x7ffff6770660, m=..., dbresponse=..., remote=..., fromDBDirectClient=fromDBDirectClient@entry=false) at src/mongo/db/instance.cpp:394
#29 0x0000000000855f60 in mongo::MyMessageHandler::process (this=<optimized out>, m=..., port=0x7fffeb4b6100, le=0x7fffe4022060) at src/mongo/db/db.cpp:195
#30 0x0000000000f4b421 in mongo::PortMessageServer::handleIncomingMsg (arg=0x7ffff5c69370) at src/mongo/util/net/message_server_port.cpp:234
#31 0x00007ffff6b52fa8 in start_thread () from /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libpthread.so.0
#32 0x00007ffff68755ad in clone () from /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libc.so.6

Comment by Mark Callaghan [ 05/Jan/15 ]

The error is more likely with zlib compression and using a smaller cache (--wiredTigerCacheSizeGB 4) might also make it more likely. The iibench client code does not set the _id field in documents. In a test I just started, there are 4 exceptions 68 seconds into the test.

All my tests run with 10 client threads via: export NUM_LOADER_THREADS=10

Generated at Thu Feb 08 03:42:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.