  1. Core Server
  2. SERVER-20158

findAndModify errors on MongoDB 2.4.3 when both upsert and new are true


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Done
    • Affects Version/s: 2.4.3
    • Fix Version/s: None
    • Component/s: Querying, Sharding, Write Ops
    • Labels:
    • Environment:
      CentOS 6.2 with 24 CPUs
      mongos started with numactl --interleave=all
    • Operating System:
      ALL

      Description

      Hi, guys
      We’re using a mongos instance that manages 2 shards, running MongoDB 2.4.3. The collection ‘latest_id’ in db ‘msg_id’ is sharded on the ‘jid’ field with a hashed index, and there is also a unique index on the ‘jid’ field.
      We use the command below to maintain an incrementing counter:

      {findAndModify:"latest_id", query: {jid:"6003571078"}, update: {$inc:{latest_im:2}, $set:{latest_modified:new Date()}}, new: true, upsert:true}
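
For reference, the command above can be written as a plain document and passed to `db.runCommand` in the mongo shell. This is only a minimal sketch: the helper name `buildCounterCommand` and its `step` parameter are hypothetical, introduced here for illustration, and are not part of any MongoDB API.

```javascript
// Hypothetical helper: builds the findAndModify command document from this report.
// The command name must be the first key, which JavaScript object literals preserve.
function buildCounterCommand(jid, step) {
  return {
    findAndModify: "latest_id",
    query: { jid: jid },
    update: {
      $inc: { latest_im: step },
      $set: { latest_modified: new Date() }
    },
    new: true,    // return the document as it looks after the update
    upsert: true  // insert the document if the query matches nothing
  };
}

// In the mongo shell, connected to mongos with db msg_id selected,
// the command would be run as:
//   db.runCommand(buildCounterCommand("6003571078", 2))
```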
      

      It works fine most of the time; however, we have encountered two issues during stress testing:

      Issue #1 (this seems to be a known issue):

      The findAndModify command doesn’t return any error information if it hits an error in the insert/update phase, and it always returns the latest record if new is set. DOCS-861 doesn’t describe how to tell that a findAndModify failed due to a unique index constraint violation, etc.
      As a result, two threads might obtain the same id, and one of them will later fail with a duplicate error.

      Issue #2:

      Sometimes the findAndModify command doesn’t update the existing document in collection ‘latest_id’ but inserts a new document instead. mongos can only query the most recently inserted document, although all the documents actually exist on different shards.
      I have included the logs below:
      First insert:

      2015-08-26 18:34:12: storeIMMsg: fAndM's return value: [{ value: { _id: ObjectId('55dd96247879652ab5c33fc1'), jid: "6004697765", latest_im: 2, latest_modified: new Date(1440585252063) }, lastErrorObject: { updatedExisting: false, n: 1, upserted: ObjectId('55dd96247879652ab5c33fc1') }, ok: 1.0 }]
      

      Second insert:

      2015-08-26 18:47:37: storeIMMsg: fAndM's return value: [{ value: { _id: ObjectId('55dd9949b1edec4d664e35c7'), jid: "6004697765", latest_im: 2, latest_modified: new Date(1440586057285) }, lastErrorObject: { updatedExisting: false, n: 1, upserted: ObjectId('55dd9949b1edec4d664e35c7') }, ok: 1.0 }]
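
Both replies above report `updatedExisting: false` together with an `upserted` id for the same jid, which is what marks them as inserts rather than updates. A small sketch of how a client could flag this from collected replies follows. The function name `findRepeatedUpserts` is hypothetical, and ObjectIds are represented as plain strings so the snippet runs outside the mongo shell.

```javascript
// Returns an object mapping each jid to its upserted ids, but only for jids
// where more than one reply reports an insert (upsert) instead of an update.
function findRepeatedUpserts(replies) {
  const upsertsByJid = new Map();
  for (const reply of replies) {
    const info = reply.lastErrorObject;
    // updatedExisting === false plus an `upserted` id means a new document was inserted.
    const inserted = Boolean(info) && info.updatedExisting === false && info.upserted !== undefined;
    if (!inserted || !reply.value) continue;
    const jid = reply.value.jid;
    if (!upsertsByJid.has(jid)) upsertsByJid.set(jid, []);
    upsertsByJid.get(jid).push(info.upserted);
  }
  const repeated = {};
  for (const [jid, ids] of upsertsByJid) {
    if (ids.length > 1) repeated[jid] = ids;
  }
  return repeated;
}

// The two replies from the logs above, with ObjectIds written as strings.
const loggedReplies = [
  {
    value: { _id: "55dd96247879652ab5c33fc1", jid: "6004697765", latest_im: 2 },
    lastErrorObject: { updatedExisting: false, n: 1, upserted: "55dd96247879652ab5c33fc1" },
    ok: 1
  },
  {
    value: { _id: "55dd9949b1edec4d664e35c7", jid: "6004697765", latest_im: 2 },
    lastErrorObject: { updatedExisting: false, n: 1, upserted: "55dd9949b1edec4d664e35c7" },
    ok: 1
  }
];

// Reports jid "6004697765" with both upserted ids.
console.log(findRepeatedUpserts(loggedReplies));
```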
      

      I checked the data via mongos and found only the document with _id: ObjectId('55dd9949b1edec4d664e35c7'). I am sure that our code does not perform any delete operation on the ‘latest_id’ collection. I then connected to the shards directly and found a surprising situation: both documents exist, one on each shard.

      [longjun@dbag07 ~]$ /usr/local/mongodb/bin/mongo mdb03:16316/msg_id
      MongoDB shell version: 2.4.3
      connecting to: mdb03:16316/msg_id
      txl_peer_rep12:PRIMARY> db.latest_id.find({jid:"6004697765"})
      { "_id" : ObjectId("55dd9949b1edec4d664e35c7"), "jid" : "6004697765", "latest_im" : 2, "latest_modified" : ISODate("2015-08-26T10:47:37.285Z") }
      txl_peer_rep12:PRIMARY> exit
      bye
      [longjun@dbag07 ~]$ /usr/local/mongodb/bin/mongo mdb03:16311/msg_id
      MongoDB shell version: 2.4.3
      connecting to: mdb03:16311/msg_id
      txl_peer_rep2:PRIMARY> db.latest_id.find({jid:"6004697765"})
      { "_id" : ObjectId("55dd96247879652ab5c33fc1"), "jid" : "6004697765", "latest_im" : 2, "latest_modified" : ISODate("2015-08-26T10:34:12.063Z") }
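
The manual check above (querying each shard primary directly and comparing results) can be sketched as a small function. This is an illustration only: `findCrossShardDuplicates` is a hypothetical name, the input is assumed to be the documents already fetched from each shard, and ObjectIds are written as strings.

```javascript
// resultsByShard maps a replica set name to the documents returned by running
// the find directly against that shard's primary. Returns, for each key value
// found on more than one shard, the sorted list of shards holding it.
function findCrossShardDuplicates(resultsByShard, keyField) {
  const shardsByKey = new Map();
  for (const [shard, docs] of Object.entries(resultsByShard)) {
    for (const doc of docs) {
      const key = doc[keyField];
      if (!shardsByKey.has(key)) shardsByKey.set(key, new Set());
      shardsByKey.get(key).add(shard);
    }
  }
  const duplicates = {};
  for (const [key, shards] of shardsByKey) {
    if (shards.size > 1) duplicates[key] = Array.from(shards).sort();
  }
  return duplicates;
}

// The documents found on the two shards in the shell session above.
const perShardResults = {
  txl_peer_rep12: [{ _id: "55dd9949b1edec4d664e35c7", jid: "6004697765", latest_im: 2 }],
  txl_peer_rep2: [{ _id: "55dd96247879652ab5c33fc1", jid: "6004697765", latest_im: 2 }]
};

// Reports jid "6004697765" as present on both replica sets.
console.log(findCrossShardDuplicates(perShardResults, "jid"));
```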
      

      As I understand it, because ‘jid’ is the shard key field and has a unique index, a given ‘jid’ value should have only one document stored in the db, located on one shard. Even if mongo stored the document on the wrong shard for performance reasons, it should still move the data to the correct shard in time to avoid this issue. However, mongo inserted two documents with the same value for the identifying shard key field on two shards. It seems mongo 2.4.3 has an insert/find issue in a sharded environment.

      More information:
      OS: CentOS 6.2, with the Mongo C++ driver

      --- Sharding Status --- 
        sharding version: {
              "_id" : 1,
              "version" : 3,
              "minCompatibleVersion" : 3,
              "currentVersion" : 4,
              "clusterId" : ObjectId("55dd93a2814bc83d0c8eeb11")
      }
        shards:
              {  "_id" : "txl_peer_sh12",  "host" : "txl_peer_rep12/mdb03:16316,mdb04:16317" }
              {  "_id" : "txl_peer_sh2",  "host" : "txl_peer_rep2/mdb03:16311,mdb04:16312" }
        databases:
              {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
              {  "_id" : "msg_id",  "partitioned" : true,  "primary" : "txl_peer_sh2" }
                      msg_id.latest_id
                              shard key: { "jid" : "hashed" }
                              chunks:
                                      txl_peer_sh2    4
                                      txl_peer_sh12   4
                              { "jid" : { "$minKey" : 1 } } -->> { "jid" : NumberLong("-7113991756202827270") } on : txl_peer_sh2 { "t" : 2, "i" : 8 } 
                              { "jid" : NumberLong("-7113991756202827270") } -->> { "jid" : NumberLong("-4611686018427387902") } on : txl_peer_sh2 { "t" : 2, "i" : 9 } 
                              { "jid" : NumberLong("-4611686018427387902") } -->> { "jid" : NumberLong("-2507537502000818829") } on : txl_peer_sh2 { "t" : 2, "i" : 10 } 
                              { "jid" : NumberLong("-2507537502000818829") } -->> { "jid" : NumberLong(0) } on : txl_peer_sh2 { "t" : 2, "i" : 11 } 
                              { "jid" : NumberLong(0) } -->> { "jid" : NumberLong("2095359270393918835") } on : txl_peer_sh12 { "t" : 2, "i" : 12 } 
                              { "jid" : NumberLong("2095359270393918835") } -->> { "jid" : NumberLong("4611686018427387902") } on : txl_peer_sh12 { "t" : 2, "i" : 13 } 
                              { "jid" : NumberLong("4611686018427387902") } -->> { "jid" : NumberLong("6735830371781635195") } on : txl_peer_sh12 { "t" : 2, "i" : 6 } 
                              { "jid" : NumberLong("6735830371781635195") } -->> { "jid" : { "$maxKey" : 1 } } on : txl_peer_sh12 { "t" : 2, "i" : 7 }
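
To make the chunk table above concrete: each hashed ‘jid’ value falls into exactly one range (lower bound inclusive, upper bound exclusive), so a given jid should map to exactly one shard. The sketch below only does the range lookup; it does not compute MongoDB's hash of a jid, so the hash values passed in are assumed inputs, and `shardForHash` is a hypothetical name.

```javascript
// Chunk table copied from the sharding status above. Bounds are hashed jid values;
// null stands for $minKey / $maxKey. BigInt is used because the bounds exceed 2^53.
const chunks = [
  { min: null, max: -7113991756202827270n, shard: "txl_peer_sh2" },
  { min: -7113991756202827270n, max: -4611686018427387902n, shard: "txl_peer_sh2" },
  { min: -4611686018427387902n, max: -2507537502000818829n, shard: "txl_peer_sh2" },
  { min: -2507537502000818829n, max: 0n, shard: "txl_peer_sh2" },
  { min: 0n, max: 2095359270393918835n, shard: "txl_peer_sh12" },
  { min: 2095359270393918835n, max: 4611686018427387902n, shard: "txl_peer_sh12" },
  { min: 4611686018427387902n, max: 6735830371781635195n, shard: "txl_peer_sh12" },
  { min: 6735830371781635195n, max: null, shard: "txl_peer_sh12" }
];

// Returns the shard owning a hashed key value, or null if no chunk covers it.
function shardForHash(hash, chunkTable) {
  for (const chunk of chunkTable) {
    const atOrAboveMin = chunk.min === null || hash >= chunk.min;
    const belowMax = chunk.max === null || hash < chunk.max;
    if (atOrAboveMin && belowMax) return chunk.shard;
  }
  return null;
}

console.log(shardForHash(-1n, chunks)); // txl_peer_sh2
console.log(shardForHash(0n, chunks));  // txl_peer_sh12
```

With this layout every negative hash belongs to txl_peer_sh2 and every non-negative hash to txl_peer_sh12, which is why finding the same jid on both shards is unexpected.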
      

              People

              Assignee:
              Unassigned
              Reporter:
              lucifinil Lucifinil Long