Core Server / SERVER-18213

Lots of WriteConflict during multi-upsert with WiredTiger storage engine

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.0.2, 3.0.3, 3.1.3
    • Fix Version/s: 3.0.4, 3.1.4
    • Component/s: WiredTiger
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      Linux
    • Sprint:
      Quint Iteration 4

      Description

      Issue Status as of Jun 09, 2015

      ISSUE SUMMARY
      MongoDB with WiredTiger may produce WriteConflict exceptions under the following conditions:

      • when doing bulk or multi updates, and
      • these updates have upsert=true set, and
      • no documents match the update condition, and
      • these operations use a unique index on the updated field

      USER IMPACT
      These exceptions cause the insert phase of the write operation to fail, and an error is returned to the application.

      WORKAROUNDS
      Retry failed write operations as inserts at the application level.
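      A minimal Python sketch of this workaround, assuming a generic retry helper; the callables and the RuntimeError stand-in for the driver's exception are illustrative assumptions, not PyMongo API names:

      ```python
      # Sketch of the suggested workaround: retry the multi-upsert, and
      # fall back to a plain insert if the WriteConflict keeps recurring.
      # `do_update` and `do_insert` are caller-supplied callables wrapping
      # the actual driver operations; RuntimeError stands in for the
      # driver's failure exception (illustrative assumption).
      def retry_update_as_insert(do_update, do_insert, retries=3):
          for _ in range(retries):
              try:
                  return do_update()
              except RuntimeError as exc:
                  if "WriteConflict" not in str(exc):
                      raise  # unrelated failure: propagate
          # The update kept conflicting: perform the insert half directly.
          return do_insert()
      ```

      In the reporter's setup, `do_update` would wrap the MongoEngine update call and `do_insert` a plain insert of the document the upsert was supposed to create.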

      AFFECTED VERSIONS
      MongoDB 3.0.0 to 3.0.3 with WiredTiger.

      FIX VERSION
      The fix is included in the 3.0.4 production release.

      Original description

      Intro

      We have a pretty complex python application running over Apache Storm 0.9.2, MongoEngine 0.8.7, PyMongo 2.8, and MongoDB 3.0.2. Our Mongo setup is a sharding cluster (3 mongos, 3 mongo-config and 5 ReplicaSets) and all ReplicaSets are running with WiredTiger.

      Under a very high load of concurrent updates, we see many exceptions at the application level:

      OperationError: Update failed (exception: WriteConflict)

      And looking into the mongod logs:

      2015-04-22T16:24:02.100+0200 I WRITE    [conn81979] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:04.907+0200 I WRITE    [conn75303] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:18.538+0200 I WRITE    [conn81979] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:27.539+0200 I WRITE    [conn75526] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:34.413+0200 I WRITE    [conn81979] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:34.455+0200 I WRITE    [conn77036] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:46.693+0200 I WRITE    [conn81979] Had WriteConflict during multi update, aborting
      

      I've mitigated the issue with a try/except around the update operation, re-issuing the update in the except block. This works almost all of the time, but occasionally it fails twice in a row.

      I've asked about this same issue on Stack Overflow (http://stackoverflow.com/questions/29800823/mongodb-with-wiredtiger-lots-of-writeconflict) and on the mongodb-user mailing list. There, Asya (thanks!) asked me to open this ticket.

      The update operation that fails with WriteConflict

      With MongoEngine we have created a model with a custom QuerySet that essentially performs a series of calls like:

      self.update(inc__valueX=CONSTANT * ammount, upsert=True)
      

      Where ammount is 99% of the time 1.0.

      What the mongo server actually executes (what this translates to) is:

      db.collection.update({ s: ObjectId('53cd0cdac2f4ed5f76527f9a'), u: ObjectId('553df07f996d7110b3a2843b') }, { $inc: { value: VALUE } }, { multi: true, upsert: true })
      

      Where VALUE is the result of (CONSTANT * ammount).
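      For illustration, the (query, update) pair the driver sends could be assembled like this in plain Python; the string ids stand in for bson ObjectIds so the sketch has no driver dependency, and the function name and CONSTANT value are assumptions taken from the logs below:

      ```python
      # Build the query/update documents for the multi-upsert.
      # In the real application the "s" and "u" values are bson ObjectIds;
      # plain strings are used here to keep the sketch self-contained.
      CONSTANT = 0.0328  # value seen in the logs; an assumption here

      def build_multi_upsert(s_id, u_id, amount):
          query = {"s": s_id, "u": u_id}
          update = {"$inc": {"value": CONSTANT * amount}}
          return query, update
      ```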

      The output of

      db.collection.stats()

      is attached.

      Current setup

      Currently we have 3 mongos, 3 mongo-config servers (running on the same machines as the mongos) and 5 replica sets, each with 3 members. The config servers run MMAPv1 and the replica sets run WiredTiger. All of them are running this MongoDB version:

      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] MongoDB starting : pid=21243 port=27017 dbpath=/opt/mongodb 64-bit host=bb07
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] db version v3.0.2
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] git version: 6201872043ecbbc0a4cc169b5482dcf385fc464f
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] build info: Linux ip-10-229-1-2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] allocator: tcmalloc
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { http: { enabled: false } }, replication: { replSet: "rs01" }, storage: { dbPath: "/opt/mongodb", engine: "wiredTiger" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
      

      > mongos --version
      MongoS version 3.0.2 starting: pid=19084 port=27017 64-bit host=bb01 (--help for usage)
      git version: 6201872043ecbbc0a4cc169b5482dcf385fc464f
      OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
      build sys info: Linux ip-10-229-1-2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
      

      The Apache Storm machines (the ones performing the update() operations) run on different servers with the same OS version. The whole setup runs on top of VMware 5.5.

      Logs

      After running a

      db.adminCommand({setParameter:1, logLevel:5})

      on one of the primaries and waiting for the error to reproduce, this is the log output:

      2015-04-27T10:17:03.359+0200 D COMMAND  [conn63749] run command database.$cmd { update: "collection", upda
      tes: [ { q: { s: ObjectId('53cd0cdac2f4ed5f76527f9a'), u: ObjectId('553df07f996d7110b3a2843b') }
      , u: { $inc: { value: 0.0328 } }, multi: true, upsert: true } ], ordered: true, metadata: { shardName: "b
      brs03", shardVersion: [ Timestamp 856000|4, ObjectId('552f8fbfeaa225ff8d70a7dd') ], session: 0 } }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Tagging the match expression according to cach
      e data: 
      Filter:
      $and
          s == ObjectId('53cd0cdac2f4ed5f76527f9a')
          u == ObjectId('553df07f996d7110b3a2843b')
      Cache data:
      (index-tagged expression tree: tree=Node
      ---Leaf { u: 1, s: 1 }, pos: 1
      ---Leaf { u: 1, s: 1 }, pos: 0
      )
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 0: { _id: 1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 1: { u: 1, s: 1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 2: { u: 1, valueX: -1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 3: { u: 1, valueY: -1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 4: { u: 1, value: -1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 5: { u: 1, valueZ: -1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Tagged tree:
      $and
          u == ObjectId('553df07f996d7110b3a2843b')  || Selected Index #1 pos 0
          s == ObjectId('53cd0cdac2f4ed5f76527f9a')  || Selected Index #1 pos 1
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Planner: solution constructed from the cache:
      FETCH
      ---fetched = 1
      ---sortedByDiskLoc = 1
      ---getSort = [{}, { s: 1 }, { u: 1 }, { u: 1, s: 1 }, ]
      ---Child:
      ------IXSCAN
      ---------keyPattern = { u: 1, s: 1 }
      ---------direction = 1
      ---------bounds = field #0['u']: [ObjectId('553df07f996d7110b3a2843b'), ObjectId('553df07f996d7110b3a2843b')], field #1['s']: [ObjectId('53cd0cdac2f4ed5f76527f9a'), ObjectId('53cd0cdac2f4ed5f76527f9a')]
      ---------fetched = 0
      ---------sortedByDiskLoc = 1
      ---------getSort = [{}, { s: 1 }, { u: 1 }, { u: 1, s: 1 }, ]
       
      2015-04-27T10:17:03.359+0200 D STORAGE  [conn63749] WT begin_transaction
      2015-04-27T10:17:03.359+0200 D WRITE    [conn63749] update validate options --  updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
      2015-04-27T10:17:03.359+0200 D STORAGE  [conn63749] WT rollback_transaction
      2015-04-27T10:17:03.359+0200 D WRITE    [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 2 retrying
      2015-04-27T10:17:03.359+0200 D STORAGE  [conn63749] WT begin_transaction
      2015-04-27T10:17:03.359+0200 D WRITE    [conn63749] update validate options --  updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
      2015-04-27T10:17:03.359+0200 D STORAGE  [conn63749] WT rollback_transaction
      2015-04-27T10:17:03.359+0200 D WRITE    [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 3 retrying
      ...
      2015-04-27T10:17:03.359+0200 D STORAGE  [conn63749] WT begin_transaction
      2015-04-27T10:17:03.360+0200 D WRITE    [conn63749] update validate options --  updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
      2015-04-27T10:17:03.360+0200 D STORAGE  [conn63749] WT rollback_transaction
      2015-04-27T10:17:03.360+0200 D WRITE    [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 4 retrying
      ...
      

      Final note

      I modified some names (databases, collections, fields and indexes). If you want, I can upload the full logs or configuration (via SCP) with all the original names. If necessary, we could even schedule a full restart of all the processes (that would require some planning).

      Attachments

      1. dbstats.txt (28 kB) - Alex Sobrino
      2. mongodb.conf (0.7 kB) - Julien Thebault
      3. mongod-log.txt (263 kB) - Alex Sobrino

          Activity

          Asya Kamsky added a comment -

          Alex,

          The following run in mongo shell will usually reproduce it:

          db.foo.drop();
          db.foo.ensureIndex({x:1},{unique:true});
          for (i=0; i<10000; i++) { db.foo.insert({x:i, y:0}); }
          db.foo.remove({});
          op={ op: "update", ns: "test.foo", query: { y: { '#RAND_INT': [0, 5 ] } }, update:{ $set: { x: { '#RAND_INT': [0, 10 ] }} }, multi: true, upsert: true, writeCmd: true, safe: false }
          benchRun( { parallel: 16, seconds: 16, ops:[op], host:db._mongo.host } )

          You may need a different value for the number of threads (parallel) depending on how many cores you have available.

          This uses the built-in 'benchRun' utility.

          Asya

          Alex Sobrino added a comment -

          Thanks Asya Kamsky

          Githook User added a comment -

          Author:

          {u'username': u'GeertBosch', u'name': u'Geert Bosch', u'email': u'geert@mongodb.com'}

          Message: SERVER-18213: Retry insert in case of a write conflict during multi-upsert
          Branch: master
          https://github.com/mongodb/mongo/commit/054c874f25a91121ed16cedbc44b7d4b919a430b

          Julien Thebault added a comment -

          Thanks for the quick response!

          Githook User added a comment -

          Author:

          {u'username': u'GeertBosch', u'name': u'Geert Bosch', u'email': u'geert@mongodb.com'}

          Message: SERVER-18213: Retry insert in case of a write conflict during multi-upsert

          (cherry picked from commit 054c874f25a91121ed16cedbc44b7d4b919a430b)

          Conflicts:
          src/mongo/db/exec/update.cpp
          Branch: v3.0
          https://github.com/mongodb/mongo/commit/bfdf548bc4c882ab944e0cf2f1e6015752fc05c2


            People

            • Votes: 7
            • Watchers: 27
