-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
None
-
Affects Version/s: 2.4.3
-
Environment: CentOS 6.2 with 24 CPUs
mongos started with numactl --interleave=all
-
ALL
Hi guys,
We are running a mongos instance that manages 2 shards, with MongoDB 2.4.3 as our DB server. The collection ‘latest_id’ in db ‘msg_id’ is sharded on the ‘jid’ field with a hashed index, and there is also a unique index on the ‘jid’ field.
We use the command below to implement an incrementing counter:
{findAndModify:"latest_id", query: {jid:"6003571078"}, update: {$inc:{latest_im:2}, $set:{latest_modified:new Date()}}, new: true, upsert:true}
It works fine most of the time; however, we encountered two issues during stress testing:
Issue #1 (it seems a known issue):
The findAndModify command doesn’t return any error information if it hits an error in the insert/update phase, and it always returns the latest record if new is set. DOCS-861 doesn’t describe how to identify that a findAndModify failed due to a unique index constraint violation, etc.
As a result, two threads may end up holding the same id, and one of them will then fail with a duplicate error.
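Since the command reports no error, one client-side option is to inspect the lastErrorObject in each reply and flag replies that inserted a document where an update was expected. The following is only a minimal sketch under that assumption: the function name classifyFindAndModifyReply is hypothetical, the reply is treated as a plain JavaScript object, and the only fields used are those visible in the replies logged in this report (ok, lastErrorObject.updatedExisting, lastErrorObject.upserted).

```javascript
// Hypothetical helper, not part of any MongoDB driver.
// Classifies a findAndModify reply using only fields seen in the logged replies.
function classifyFindAndModifyReply(reply) {
  // A missing reply or ok != 1 is treated as a failed command.
  if (!reply || reply.ok !== 1) {
    return "failed";
  }
  const lastError = reply.lastErrorObject || {};
  // updatedExisting: true means an existing document matched the query.
  if (lastError.updatedExisting === true) {
    return "updated";
  }
  // An upserted _id means the upsert path inserted a new document.
  if (lastError.upserted !== undefined && lastError.upserted !== null) {
    return "inserted";
  }
  return "unknown";
}

// Example shaped like the replies in the logs (ObjectId shown as a string).
const exampleReply = {
  value: { jid: "6004697765", latest_im: 2 },
  lastErrorObject: { updatedExisting: false, n: 1, upserted: "55dd96247879652ab5c33fc1" },
  ok: 1.0
};
console.log(classifyFindAndModifyReply(exampleReply));
```

A caller that sees "inserted" more than once for the same jid could treat that as a sign of the problem described in Issue #2. This does not fix the missing error reporting; it only makes the unexpected insert visible to the application.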
Issue #2:
Sometimes the findAndModify command doesn’t update the existing document in collection ‘latest_id’ but inserts a new document instead. mongos can then only query the most recently inserted document, although all of the documents actually exist on different shards.
I have printed the logs below:
First insert:
2015-08-26 18:34:12: storeIMMsg: fAndM's return value: [{ value: { _id: ObjectId('55dd96247879652ab5c33fc1'), jid: "6004697765", latest_im: 2, latest_modified: new Date(1440585252063) }, lastErrorObject: { updatedExisting: false, n: 1, upserted: ObjectId('55dd96247879652ab5c33fc1') }, ok: 1.0 }]
Second insert:
2015-08-26 18:47:37: storeIMMsg: fAndM's return value: [{ value: { _id: ObjectId('55dd9949b1edec4d664e35c7'), jid: "6004697765", latest_im: 2, latest_modified: new Date(1440586057285) }, lastErrorObject: { updatedExisting: false, n: 1, upserted: ObjectId('55dd9949b1edec4d664e35c7') }, ok: 1.0 }]
I checked the data via mongos and found only the document with _id: ObjectId('55dd9949b1edec4d664e35c7'). I am sure that our code doesn’t perform any delete operation on the ‘latest_id’ collection. I then connected to the shards directly and found a surprising situation: both documents exist in the db, one on each shard.
[longjun@dbag07 ~]$ /usr/local/mongodb/bin/mongo mdb03:16316/msg_id
MongoDB shell version: 2.4.3
connecting to: mdb03:16316/msg_id
txl_peer_rep12:PRIMARY> db.latest_id.find({jid:"6004697765"})
{ "_id" : ObjectId("55dd9949b1edec4d664e35c7"), "jid" : "6004697765", "latest_im" : 2, "latest_modified" : ISODate("2015-08-26T10:47:37.285Z") }
txl_peer_rep12:PRIMARY> exit
bye
[longjun@dbag07 ~]$ /usr/local/mongodb/bin/mongo mdb03:16311/msg_id
MongoDB shell version: 2.4.3
connecting to: mdb03:16311/msg_id
txl_peer_rep2:PRIMARY> db.latest_id.find({jid:"6004697765"})
{ "_id" : ObjectId("55dd96247879652ab5c33fc1"), "jid" : "6004697765", "latest_im" : 2, "latest_modified" : ISODate("2015-08-26T10:34:12.063Z") }
As I understand it, because ‘jid’ is the shard key field and has a unique index, a given ‘jid’ value should have only one document stored in the db, located on one shard. Even if mongo stored the document on the wrong shard for performance reasons, it should still move the data to the correct shard in time to avoid this issue. However, mongo inserted two documents with the same value of the identifying shard key field into two shards. It seems mongo 2.4.3 has an insert/find issue in a sharded environment.
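The per-shard check done by hand above can be sketched as a small script. This is only an illustration under stated assumptions: the function name findCrossShardDuplicates is hypothetical, and the input is assumed to be the documents already fetched from each shard's primary (for example by running db.latest_id.find() on each one), passed in as plain JavaScript objects keyed by replica set name.

```javascript
// Hypothetical helper: given the documents read directly from each shard,
// report every jid that appears on more than one shard.
function findCrossShardDuplicates(docsByShard) {
  const shardsByJid = new Map();
  for (const [shard, docs] of Object.entries(docsByShard)) {
    for (const doc of docs) {
      if (!shardsByJid.has(doc.jid)) {
        shardsByJid.set(doc.jid, new Set());
      }
      shardsByJid.get(doc.jid).add(shard);
    }
  }
  // Keep only the jid values that were seen on two or more shards.
  const duplicates = {};
  for (const [jid, shards] of shardsByJid) {
    if (shards.size > 1) {
      duplicates[jid] = [...shards].sort();
    }
  }
  return duplicates;
}

// Data taken from the two shell sessions above (ObjectIds shown as strings).
const observed = {
  txl_peer_rep12: [{ _id: "55dd9949b1edec4d664e35c7", jid: "6004697765", latest_im: 2 }],
  txl_peer_rep2: [{ _id: "55dd96247879652ab5c33fc1", jid: "6004697765", latest_im: 2 }]
};
console.log(findCrossShardDuplicates(observed));
```

With a hashed shard key each jid should map to exactly one chunk, so any non-empty result indicates documents that are stored on a shard that does not own them.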
More information:
OS: CentOS 6.2, with the Mongo C++ driver
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3, "minCompatibleVersion" : 3, "currentVersion" : 4, "clusterId" : ObjectId("55dd93a2814bc83d0c8eeb11") }
  shards:
    { "_id" : "txl_peer_sh12", "host" : "txl_peer_rep12/mdb03:16316,mdb04:16317" }
    { "_id" : "txl_peer_sh2", "host" : "txl_peer_rep2/mdb03:16311,mdb04:16312" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "msg_id", "partitioned" : true, "primary" : "txl_peer_sh2" }
      msg_id.latest_id
        shard key: { "jid" : "hashed" }
        chunks:
          txl_peer_sh2 4
          txl_peer_sh12 4
        { "jid" : { "$minKey" : 1 } } -->> { "jid" : NumberLong("-7113991756202827270") } on : txl_peer_sh2 { "t" : 2, "i" : 8 }
        { "jid" : NumberLong("-7113991756202827270") } -->> { "jid" : NumberLong("-4611686018427387902") } on : txl_peer_sh2 { "t" : 2, "i" : 9 }
        { "jid" : NumberLong("-4611686018427387902") } -->> { "jid" : NumberLong("-2507537502000818829") } on : txl_peer_sh2 { "t" : 2, "i" : 10 }
        { "jid" : NumberLong("-2507537502000818829") } -->> { "jid" : NumberLong(0) } on : txl_peer_sh2 { "t" : 2, "i" : 11 }
        { "jid" : NumberLong(0) } -->> { "jid" : NumberLong("2095359270393918835") } on : txl_peer_sh12 { "t" : 2, "i" : 12 }
        { "jid" : NumberLong("2095359270393918835") } -->> { "jid" : NumberLong("4611686018427387902") } on : txl_peer_sh12 { "t" : 2, "i" : 13 }
        { "jid" : NumberLong("4611686018427387902") } -->> { "jid" : NumberLong("6735830371781635195") } on : txl_peer_sh12 { "t" : 2, "i" : 6 }
        { "jid" : NumberLong("6735830371781635195") } -->> { "jid" : { "$maxKey" : 1 } } on : txl_peer_sh12 { "t" : 2, "i" : 7 }
duplicates: SERVER-14322 Retry on predicate unique index violations of update + upsert -> insert when possible (Closed)