[SERVER-20158] Errors during findAndModify runs on Mongo2.4.3 when both upsert and new are true Created: 27/Aug/15  Updated: 06/Jan/16  Resolved: 10/Oct/15

Status: Closed
Project: Core Server
Component/s: Querying, Sharding, Write Ops
Affects Version/s: 2.4.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Lucifinil Long Assignee: Unassigned
Resolution: Done Votes: 0
Labels: findAndModify
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS6.2 with 24 CPU
mongos started with numactl --interleave=all


Issue Links:
Duplicate
duplicates SERVER-14322 Retry on predicate unique index viola... Closed
Related
related to DOCS-861 Clarify behavior of findAndModify wit... Closed
Operating System: ALL
Participants:

 Description   

Hi, guys
We’re using a mongos instance that manages 2 shards, running MongoDB 2.4.3, as our DB server. The db ‘msg_id’ has a collection ‘latest_id’ whose shard key is the ‘jid’ field with a hashed index; there is also a unique index on the ‘jid’ field.
We use the command below to maintain an incrementing counter:

{findAndModify:"latest_id", query: {jid:"6003571078"}, update: {$inc:{latest_im:2}, $set:{latest_modified:new Date()}}, new: true, upsert:true}

It works fine most of the time; however, we encountered two issues during stress testing:

Issue #1 (it seems a known issue):

The findAndModify command doesn’t return any error information if it hits an error in the insert/update phase, and it always returns the latest record if new is set! DOCS-861 doesn’t describe how to tell that a findAndModify failed due to a unique index constraint violation, etc.
As a result, two threads might hold the same id, and one of them will fail with a duplicate error.
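
One common client-side workaround for the duplicate error described above is to retry the upsert when it fails with a duplicate key error. The following is only a minimal sketch under stated assumptions: it assumes the failure reaches the caller as a thrown error carrying MongoDB's duplicate key code 11000, and `upsertWithRetry`, `runUpsert`, and `fakeUpsert` are hypothetical names for illustration, not part of any driver. How the C++ driver used here actually reports the failure is not shown in this ticket.

```javascript
// MongoDB's error code for a duplicate key violation (E11000).
const DUPLICATE_KEY_CODE = 11000;

// Hypothetical helper: call runUpsert() and retry only on duplicate key errors.
// runUpsert stands in for whatever function issues the findAndModify command.
function upsertWithRetry(runUpsert, maxAttempts) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return runUpsert();
    } catch (err) {
      // Any other error is not a lost upsert race, so rethrow it unchanged.
      if (err.code !== DUPLICATE_KEY_CODE) throw err;
      lastError = err;
    }
  }
  throw lastError;
}

// Stand-in for a real call: fails once with a duplicate key error, then succeeds.
let calls = 0;
function fakeUpsert() {
  calls += 1;
  if (calls === 1) {
    const err = new Error("E11000 duplicate key error");
    err.code = DUPLICATE_KEY_CODE;
    throw err;
  }
  return { jid: "6003571078", latest_im: 2 };
}

const result = upsertWithRetry(fakeUpsert, 3);
console.log(calls, result.jid);
```

On the retry, the document inserted by the competing thread already exists, so the command is expected to take the update path instead of the insert path.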

Issue #2:

Sometimes the findAndModify command doesn’t update the existing document in collection ‘latest_id’ but inserts a new document instead. mongos can then only query the document inserted last, although all the documents actually exist on different shards.
I have printed the logs below:
First insert:

2015-08-26 18:34:12: storeIMMsg: fAndM's return value: [{ value: { _id: ObjectId('55dd96247879652ab5c33fc1'), jid: "6004697765", latest_im: 2, latest_modified: new Date(1440585252063) }, lastErrorObject: { updatedExisting: false, n: 1, upserted: ObjectId('55dd96247879652ab5c33fc1') }, ok: 1.0 }]

Second insert:

2015-08-26 18:47:37: storeIMMsg: fAndM's return value: [{ value: { _id: ObjectId('55dd9949b1edec4d664e35c7'), jid: "6004697765", latest_im: 2, latest_modified: new Date(1440586057285) }, lastErrorObject: { updatedExisting: false, n: 1, upserted: ObjectId('55dd9949b1edec4d664e35c7') }, ok: 1.0 }]
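
Both replies above report updatedExisting: false together with an upserted id, which means both calls took the insert path. A minimal sketch of how a client might classify a reply from these fields follows. The field names come from the logged replies; the function name `classifyFindAndModifyReply` and the returned labels are assumptions for illustration, and ObjectId values are written as plain strings so the snippet runs outside the mongo shell.

```javascript
// Classify a findAndModify reply by inspecting its lastErrorObject.
// Returns "updated", "inserted", "failed", or "unknown" (labels are illustrative).
function classifyFindAndModifyReply(reply) {
  if (!reply || reply.ok !== 1) {
    return "failed";
  }
  const leo = reply.lastErrorObject || {};
  if (leo.updatedExisting === true) {
    return "updated";
  }
  if (leo.upserted !== undefined && leo.upserted !== null) {
    return "inserted";
  }
  return "unknown";
}

// The second reply from the log, with ObjectId values as plain strings.
const secondReply = {
  value: { _id: "55dd9949b1edec4d664e35c7", jid: "6004697765", latest_im: 2 },
  lastErrorObject: { updatedExisting: false, n: 1, upserted: "55dd9949b1edec4d664e35c7" },
  ok: 1.0,
};

console.log(classifyFindAndModifyReply(secondReply)); // prints "inserted"
```

For a counter that is expected to exist already, an "inserted" result for a known jid is a signal worth logging, since it indicates that a second document was created.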

I checked the data via mongos and found the document with _id: ObjectId('55dd9949b1edec4d664e35c7'). I am sure that our code has no delete operation on the ‘latest_id’ collection. I then connected to the shards directly and found a surprising situation: both documents exist, one on each shard.

[longjun@dbag07 ~]$ /usr/local/mongodb/bin/mongo mdb03:16316/msg_id
MongoDB shell version: 2.4.3
connecting to: mdb03:16316/msg_id
txl_peer_rep12:PRIMARY> db.latest_id.find({jid:"6004697765"})
{ "_id" : ObjectId("55dd9949b1edec4d664e35c7"), "jid" : "6004697765", "latest_im" : 2, "latest_modified" : ISODate("2015-08-26T10:47:37.285Z") }
txl_peer_rep12:PRIMARY> exit
bye
[longjun@dbag07 ~]$ /usr/local/mongodb/bin/mongo mdb03:16311/msg_id
MongoDB shell version: 2.4.3
connecting to: mdb03:16311/msg_id
txl_peer_rep2:PRIMARY> db.latest_id.find({jid:"6004697765"})
{ "_id" : ObjectId("55dd96247879652ab5c33fc1"), "jid" : "6004697765", "latest_im" : 2, "latest_modified" : ISODate("2015-08-26T10:34:12.063Z") }

As I understand it, because ‘jid’ is the shard key field and has a unique index, a given ‘jid’ value should have only one document stored in the db, located on one shard. Even if mongo stored the document on the wrong shard for performance reasons, it should still move the data to the correct shard in time to avoid this issue. However, mongo inserted two documents with the same value for the identifying shard key field, one on each of the two shards. It seems mongo 2.4.3 has an insert/find issue in a sharded environment.
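
The per-shard check done by hand above can be sketched as a small helper that takes the documents fetched directly from each shard and reports key values present on more than one shard. This is only an illustration: `findCrossShardDuplicates` and the `docsByShard` layout are assumptions, the sample data is copied from the two shell sessions above (replica sets mapped to shard ids per the sharding status), and ObjectId values are written as plain strings.

```javascript
// Report values of keyField that appear on more than one shard.
// docsByShard maps a shard id to the documents read directly from that shard.
function findCrossShardDuplicates(docsByShard, keyField) {
  const shardsByKey = new Map();
  for (const [shardName, docs] of Object.entries(docsByShard)) {
    for (const doc of docs) {
      const key = doc[keyField];
      if (!shardsByKey.has(key)) shardsByKey.set(key, new Set());
      shardsByKey.get(key).add(shardName);
    }
  }
  const duplicates = [];
  for (const [key, shards] of shardsByKey) {
    if (shards.size > 1) duplicates.push({ key, shards: [...shards].sort() });
  }
  return duplicates;
}

// Documents as seen on each shard in the shell sessions above.
const docsByShard = {
  txl_peer_sh12: [{ _id: "55dd9949b1edec4d664e35c7", jid: "6004697765", latest_im: 2 }],
  txl_peer_sh2: [{ _id: "55dd96247879652ab5c33fc1", jid: "6004697765", latest_im: 2 }],
};

const duplicates = findCrossShardDuplicates(docsByShard, "jid");
console.log(JSON.stringify(duplicates));
```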

More information:
OS: CentOS6.2, with the Mongo C++ driver

--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "version" : 3,
        "minCompatibleVersion" : 3,
        "currentVersion" : 4,
        "clusterId" : ObjectId("55dd93a2814bc83d0c8eeb11")
}
  shards:
        {  "_id" : "txl_peer_sh12",  "host" : "txl_peer_rep12/mdb03:16316,mdb04:16317" }
        {  "_id" : "txl_peer_sh2",  "host" : "txl_peer_rep2/mdb03:16311,mdb04:16312" }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "msg_id",  "partitioned" : true,  "primary" : "txl_peer_sh2" }
                msg_id.latest_id
                        shard key: { "jid" : "hashed" }
                        chunks:
                                txl_peer_sh2    4
                                txl_peer_sh12   4
                        { "jid" : { "$minKey" : 1 } } -->> { "jid" : NumberLong("-7113991756202827270") } on : txl_peer_sh2 { "t" : 2, "i" : 8 } 
                        { "jid" : NumberLong("-7113991756202827270") } -->> { "jid" : NumberLong("-4611686018427387902") } on : txl_peer_sh2 { "t" : 2, "i" : 9 } 
                        { "jid" : NumberLong("-4611686018427387902") } -->> { "jid" : NumberLong("-2507537502000818829") } on : txl_peer_sh2 { "t" : 2, "i" : 10 } 
                        { "jid" : NumberLong("-2507537502000818829") } -->> { "jid" : NumberLong(0) } on : txl_peer_sh2 { "t" : 2, "i" : 11 } 
                        { "jid" : NumberLong(0) } -->> { "jid" : NumberLong("2095359270393918835") } on : txl_peer_sh12 { "t" : 2, "i" : 12 } 
                        { "jid" : NumberLong("2095359270393918835") } -->> { "jid" : NumberLong("4611686018427387902") } on : txl_peer_sh12 { "t" : 2, "i" : 13 } 
                        { "jid" : NumberLong("4611686018427387902") } -->> { "jid" : NumberLong("6735830371781635195") } on : txl_peer_sh12 { "t" : 2, "i" : 6 } 
                        { "jid" : NumberLong("6735830371781635195") } -->> { "jid" : { "$maxKey" : 1 } } on : txl_peer_sh12 { "t" : 2, "i" : 7 }



 Comments   
Comment by Ramon Fernandez Marina [ 10/Oct/15 ]

lucifinil, apologies for the long delay in getting back to you. The behavior you're observing is expected and documented both for update and for findAndModify (these are links to the 2.6 docs, which I think are clearer than the 2.4 version).

There's ongoing discussion on SERVER-14322 to consider modifying this behavior in the future, but so far there's no consensus. Feel free to comment on SERVER-14322 and watch it for updates.

Regards,
Ramón.
