Core Server / SERVER-18213

Lots of WriteConflict during multi-upsert with WiredTiger storage engine

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.0.2, 3.0.3, 3.1.3
    • Fix Version/s: 3.0.4, 3.1.4
    • Component/s: WiredTiger
    • Backwards Compatibility: Fully Compatible
    • Operating System: Linux
    • Sprint: Quint Iteration 4

      Description

      Issue Status as of Jun 09, 2015

      ISSUE SUMMARY
      MongoDB with WiredTiger may produce WriteConflict exceptions under the following conditions:

      • when doing bulk or multi updates, and
      • these updates have upsert=true set, and
      • no documents match the update condition, and
      • these operations use a unique index on the updated field
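
      The interaction can be pictured as a read-then-insert race: each concurrent upsert first finds no matching document, so each takes the insert leg, and the losing insert then violates the unique index, which surfaces as a WriteConflict. A minimal in-memory sketch of that race (plain Python, no MongoDB involved; the dict stands in for a collection with a unique index):

```python
import threading

# In-memory stand-in for a collection with a unique index: key -> document.
# This illustrates the race only; it is not MongoDB code.
index = {}
lock = threading.Lock()
conflicts = []
barrier = threading.Barrier(2)  # force both upserts past the read before any write

def upsert(key, inc):
    found = key in index              # query leg: neither thread sees a match
    barrier.wait()                    # both reads complete before either write
    if found:
        with lock:
            index[key]["value"] += inc        # ordinary update path
    else:
        with lock:
            if key in index:
                conflicts.append(key)         # unique-index violation -> "WriteConflict"
            else:
                index[key] = {"value": inc}   # insert leg of the upsert

threads = [threading.Thread(target=upsert, args=("k", 1.0)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one thread inserts; the other hits the simulated conflict.
```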

      USER IMPACT
      These exceptions cause the insert part of the write operation to fail, and an error is returned to the application.

      WORKAROUNDS
      Retry failed write operations as inserts at the application level.
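
      A sketch of that workaround, with the driver calls stubbed out as callables (no live server; `WriteConflictError`, `do_update`, and `do_insert` are hypothetical placeholders for the driver's exception and calls):

```python
class WriteConflictError(Exception):
    """Stand-in for the driver error raised on a WriteConflict."""

def upsert_with_insert_fallback(do_update, do_insert):
    """Run the multi-upsert; if it fails with a WriteConflict, the update
    leg matched nothing, so retry the operation as a plain insert."""
    try:
        return do_update()
    except WriteConflictError:
        return do_insert()

# Example with stubs: the upsert fails, the insert fallback succeeds.
def failing_update():
    raise WriteConflictError("exception: WriteConflict")

result = upsert_with_insert_fallback(failing_update, lambda: "inserted")
```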

      AFFECTED VERSIONS
      MongoDB 3.0.0 to 3.0.3 with WiredTiger.

      FIX VERSION
      The fix is included in the 3.0.4 production release.

      Original description

      Intro

      We have a pretty complex Python application running over Apache Storm 0.9.2, MongoEngine 0.8.7, PyMongo 2.8, and MongoDB 3.0.2. Our Mongo setup is a sharded cluster (3 mongos, 3 mongo-config servers, and 5 replica sets), and all replica sets run WiredTiger.

      With a very high load of concurrent updates, I get a lot of exceptions at the application level:

      OperationError: Update failed (exception: WriteConflict)

      And looking into the mongod logs:

      2015-04-22T16:24:02.100+0200 I WRITE    [conn81979] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:04.907+0200 I WRITE    [conn75303] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:18.538+0200 I WRITE    [conn81979] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:27.539+0200 I WRITE    [conn75526] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:34.413+0200 I WRITE    [conn81979] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:34.455+0200 I WRITE    [conn77036] Had WriteConflict during multi update, aborting
      2015-04-22T16:24:46.693+0200 I WRITE    [conn81979] Had WriteConflict during multi update, aborting
      

      I've mitigated the issue with a try/except around the update operation, re-issuing it in the except block. This works almost all the time, but occasionally it fails twice in a row.
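
      The mitigation described above can be sketched as a bounded retry loop (the failing call and its exception are stubbed; in the real application the call is a MongoEngine `update(..., upsert=True)` and the exception is `OperationError`):

```python
class OperationError(Exception):
    """Stand-in for MongoEngine's OperationError (exception: WriteConflict)."""

def update_with_retry(do_update, max_attempts=3):
    """Retry the update a few times; a single retry usually suffices,
    but the operation can occasionally fail twice in a row."""
    for attempt in range(1, max_attempts + 1):
        try:
            return do_update()
        except OperationError:
            if attempt == max_attempts:
                raise

# Stub that fails twice before succeeding, mimicking the observed behaviour.
calls = {"n": 0}
def flaky_update():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise OperationError("Update failed (exception: WriteConflict)")
    return "ok"

result = update_with_retry(flaky_update)
```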

      I've asked about this same issue on StackOverflow (http://stackoverflow.com/questions/29800823/mongodb-with-wiredtiger-lots-of-writeconflict) and on the mongodb-user mailing list. There, Asya (thanks!) asked me to open this ticket.

      The update operation that fails with WriteConflict

      With MongoEngine we have created a model with a custom QuerySet that repeatedly does:

      self.update(inc__valueX=CONSTANT * ammount, upsert=True)
      

      Where ammount is 99% of the time 1.0.

      What the mongo server actually runs (what this translates to) is:

      db.collection.update({ s: ObjectId('53cd0cdac2f4ed5f76527f9a'), u: ObjectId('553df07f996d7110b3a2843b') }, { $inc: { value: VALUE } }, { multi: true, upsert: true })
      

      Where VALUE is the result of (CONSTANT * ammount).

      The output of

      db.collection.stats()

      is attached.

      Current setup

      Currently we have 3 mongos, 3 mongo-config servers (running on the same hosts as the mongos), and 5 replica sets, each with 3 members. The config servers run MMAPv1 and the replica sets run WiredTiger. All of them are running this MongoDB version:

      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] MongoDB starting : pid=21243 port=27017 dbpath=/opt/mongodb 64-bit host=bb07
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] db version v3.0.2
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] git version: 6201872043ecbbc0a4cc169b5482dcf385fc464f
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] build info: Linux ip-10-229-1-2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] allocator: tcmalloc
      2015-04-10T10:34:06.213+0200 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { http: { enabled: false } }, replication: { replSet: "rs01" }, storage: { dbPath: "/opt/mongodb", engine: "wiredTiger" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
      

      > mongos --version
      MongoS version 3.0.2 starting: pid=19084 port=27017 64-bit host=bb01 (--help for usage)
      git version: 6201872043ecbbc0a4cc169b5482dcf385fc464f
      OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
      build sys info: Linux ip-10-229-1-2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
      

      The Apache Storm machines (the ones performing the update() operations) run on different servers with the same OS version. The whole setup runs on top of VMware 5.5.

      Logs

      After running a

      db.adminCommand({setParameter:1, logLevel:5})

      in one of the primaries and waiting for the error to reproduce, this is the log output:

      2015-04-27T10:17:03.359+0200 D COMMAND  [conn63749] run command database.$cmd { update: "collection", upda
      tes: [ { q: { s: ObjectId('53cd0cdac2f4ed5f76527f9a'), u: ObjectId('553df07f996d7110b3a2843b') }
      , u: { $inc: { value: 0.0328 } }, multi: true, upsert: true } ], ordered: true, metadata: { shardName: "b
      brs03", shardVersion: [ Timestamp 856000|4, ObjectId('552f8fbfeaa225ff8d70a7dd') ], session: 0 } }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Tagging the match expression according to cach
      e data: 
      Filter:
      $and
          s == ObjectId('53cd0cdac2f4ed5f76527f9a')
          u == ObjectId('553df07f996d7110b3a2843b')
      Cache data:
      (index-tagged expression tree: tree=Node
      ---Leaf { u: 1, s: 1 }, pos: 1
      ---Leaf { u: 1, s: 1 }, pos: 0
      )
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 0: { _id: 1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 1: { u: 1, s: 1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 2: { u: 1, valueX: -1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 3: { u: 1, valueY: -1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 4: { u: 1, value: -1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Index 5: { u: 1, valueZ: -1 }
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Tagged tree:
      $and
          u == ObjectId('553df07f996d7110b3a2843b')  || Selected Index #1 pos 0
          s == ObjectId('53cd0cdac2f4ed5f76527f9a')  || Selected Index #1 pos 1
      2015-04-27T10:17:03.359+0200 D QUERY    [conn63749] [QLOG] Planner: solution constructed from the cache:
      FETCH
      ---fetched = 1
      ---sortedByDiskLoc = 1
      ---getSort = [{}, { s: 1 }, { u: 1 }, { u: 1, s: 1 }, ]
      ---Child:
      ------IXSCAN
      ---------keyPattern = { u: 1, s: 1 }
      ---------direction = 1
      ---------bounds = field #0['u']: [ObjectId('553df07f996d7110b3a2843b'), ObjectId('553df07f996d7110b3a2843b')], field #1['s']: [ObjectId('53cd0cdac2f4ed5f76527f9a'), ObjectId('53cd0cdac2f4ed5f76527f9a')]
      ---------fetched = 0
      ---------sortedByDiskLoc = 1
      ---------getSort = [{}, { s: 1 }, { u: 1 }, { u: 1, s: 1 }, ]
       
      2015-04-27T10:17:03.359+0200 D STORAGE  [conn63749] WT begin_transaction
      2015-04-27T10:17:03.359+0200 D WRITE    [conn63749] update validate options --  updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
      2015-04-27T10:17:03.359+0200 D STORAGE  [conn63749] WT rollback_transaction
      2015-04-27T10:17:03.359+0200 D WRITE    [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 2 retrying
      2015-04-27T10:17:03.359+0200 D STORAGE  [conn63749] WT begin_transaction
      2015-04-27T10:17:03.359+0200 D WRITE    [conn63749] update validate options --  updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
      2015-04-27T10:17:03.359+0200 D STORAGE  [conn63749] WT rollback_transaction
      2015-04-27T10:17:03.359+0200 D WRITE    [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 3 retrying
      ...
      2015-04-27T10:17:03.359+0200 D STORAGE  [conn63749] WT begin_transaction
      2015-04-27T10:17:03.360+0200 D WRITE    [conn63749] update validate options --  updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
      2015-04-27T10:17:03.360+0200 D STORAGE  [conn63749] WT rollback_transaction
      2015-04-27T10:17:03.360+0200 D WRITE    [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 4 retrying
      ...
      

      Final note

      I modified some names (databases, collections, fields, and indexes). If you want, I can upload the full logs or configuration (via SCP) with all the original names. If needed, we could even plan a full restart of all the processes (that would require some planning).

        Attachments

        • dbstats.txt (28 kB)
        • mongodb.conf (0.7 kB)
        • mongod-log.txt (263 kB)

        People

        • Votes: 7
        • Watchers: 27