[SERVER-20634] Mongo 3.0.5 might lose update operation on record if update operations occur during shard data migration Created: 25/Sep/15  Updated: 10/Nov/15  Resolved: 10/Nov/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.5
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Lucifinil Long Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

A mongos manages two shards, version 3.0.5. Replica set consist of 3 mongod instances, one of them is arbiter. Both mongod instances started with --storageEngine wiredTiger parameter.


Issue Links:
Duplicate
duplicates SERVER-20407 findAndModify on mongoS upserts to th... Closed
Operating System: Linux
Steps To Reproduce:

Use below command to update record with about 50000 requests per second:
{findAndModify:"im_latest_id", query:

{jid:"xxxxxxxxxxxx"}

, update: {$inc:

{latest:2}

, $set:{latest_modified:new Date()}}, new: true }
if the record of owner isn't existed, insert it.
im_latest_id collection's shard key is ‘jid’ field with hash index, and there is a unique index on ‘jid’ field also.

Participants:

 Description   

Hi, guys
We found that Mongo 3.0.5 might lose update operation on record if update operations occur during shard data migration.
Environment:
A mongos manages two shards, version 3.0.5. Replica set consist of 3 mongod instances, one of them is arbiter. Both mongod instances started with --storageEngine wiredTiger parameter.

We’re using a mongos instance that manages 2 shards with Mongo version 2.4.3 as our DB server. There is a collection ‘latest_id’ in db ‘msg_id’ on mongos, shard key is ‘jid’ field with hash index, and there is a unique index on ‘jid’ field also.
We use bellow command to make an increased counter:

{findAndModify:"im_latest_id", query: {jid:"6019806037"}, update: {$inc:{latest:2}, $set:{latest_modified:new Date()}}, new: true }

It works fine most time, however, we have encountered an issue during pressure test: A successful update operation on “latest” field has lost!
From our debug log, we got bellow update operation logs:

2015-09-21 14:35:40 [96] [INFO] 139687556163328: getLastId: updated record for owner [6019806037], fAndM command returned value: [{ value: { _id: ObjectId('55ffa53ce64830c63e9297ff'), jid: "6019806037", latest: 2, latest_modified: new Date(1442817340484) }, lastErrorObject: { updatedExisting: true, n: 1 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1442817340|400, electionId: ObjectId('55ff7e55621c89462342f3ea') } }]
2015-09-21 14:36:20 [96] [INFO] 140236372915968: getLastId: updated record for owner [6019806037], fAndM command returned value: [{ value: { _id: ObjectId('55ffa53ce64830c63e9297ff'), jid: "6019806037", latest: 4, latest_modified: new Date(1442817380253) }, lastErrorObject: { updatedExisting: true, n: 1 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1442817380|631, electionId: ObjectId('55ff7e55621c89462342f3ea') } }]

Above two lines show that two findAndModify operation were successful and new data have been returned, however, from mongo shell, the data is below:

mongos> db.im_latest_id.find({jid: "6019806037"})
{ "_id" : ObjectId("55ffa53ce64830c63e9297ff"), "jid" : "6019806037", "latest" : 2, "latest_modified" : ISODate("2015-09-21T06:35:40.484Z") }

Therefore, I also checked the data change history from local db in shards:
/usr/local/mongodb30/bin/mongo localhost:6311/local

rep9:PRIMARY> db.oplog.rs.find({$or:[{"o._id":ObjectId('55ffa53ce64830c63e9297ff')}, {"o2._id":ObjectId('55ffa53ce64830c63e9297ff')}]})
{ "ts" : Timestamp(1442817354, 1379), "h" : NumberLong("1160217985212942599"), "v" : 2, "op" : "i", "ns" : "msg_imid.im_latest_id", "fromMigrate" : true, "o" : { "_id" : ObjectId("55ffa53ce64830c63e9297ff"), "jid" : "6019806037", "latest" : 2, "latest_modified" : ISODate("2015-09-21T06:35:40.484Z") } }

/usr/local/mongodb30/bin/mongo localhost:6316/local

rep19:PRIMARY> db.oplog.rs.find({$or:[{"o._id":ObjectId('55ffa53ce64830c63e9297ff')}, {"o2._id":ObjectId('55ffa53ce64830c63e9297ff')}]})
{ "ts" : Timestamp(1442817340, 398), "h" : NumberLong("8374787035435596051"), "v" : 2, "op" : "i", "ns" : "msg_imid.im_latest_id", "o" : { "_id" : ObjectId("55ffa53ce64830c63e9297ff"), "jid" : "6019806037", "latest" : 0, "latest_modified" : ISODate("2015-09-21T06:35:40.484Z") } }
{ "ts" : Timestamp(1442817340, 400), "h" : NumberLong("-7736351314285439896"), "v" : 2, "op" : "u", "ns" : "msg_imid.im_latest_id", "o2" : { "_id" : ObjectId("55ffa53ce64830c63e9297ff") }, "o" : { "$set" : { "latest" : 2 } } }
{ "ts" : Timestamp(1442817380, 631), "h" : NumberLong("6375165799258425064"), "v" : 2, "op" : "u", "ns" : "msg_imid.im_latest_id", "o2" : { "_id" : ObjectId("55ffa53ce64830c63e9297ff") }, "o" : { "$set" : { "latest" : 4, "latest_modified" : ISODate("2015-09-21T06:36:20.253Z") } } }
{ "ts" : Timestamp(1442817391, 1797), "h" : NumberLong("-9215893867258519166"), "v" : 2, "op" : "d", "ns" : "msg_imid.im_latest_id", "fromMigrate" : true, "o" : { "_id" : ObjectId("55ffa53ce64830c63e9297ff") } }

From above change history, we can see that the record has been copied from replica set “rep19” to “rep9” at Timestamp(1442817354, 1379); then a new update operation still has occurred on replica set “rep19” at Timestamp(1442817380, 631); finally, replica set “rep19” performed a delete operation at Timestamp(1442817391, 1797) to delete the updated record.



 Comments   
Comment by Ramon Fernandez Marina [ 10/Nov/15 ]

lucifinil, as you can see in SERVER-20407 the fix was backported to the v3.0 branch, and will be part of 3.0.8 which is scheduled for the coming weeks. If this is a pressing issue for you you may one to consider two options: one is to build the existing v3.0 branch from sources; a better alternative is to download the 3.2.0-rc2 release candidate, which also contains the fix and will evolve into 3.2.0 in the same time frame.

Regards,
Ramón.

Comment by Lucifinil Long [ 10/Nov/15 ]

Hi, Ramon Fernandez
Thanks a lot. It seems that this is same to SERVER-20407, please duplicate this to SERVER-20407.
I am going to waiting the fixed version Mongo. Thanks a lot.

Comment by Ramon Fernandez Marina [ 22/Oct/15 ]

lucifinil, apologies for the delay in getting back to you. You may be running into SERVER-20407, which was recently fixed in the development version and where a backport to v3.0 is being evaluated.

Comment by Lucifinil Long [ 10/Oct/15 ]

More information: I did not set any write concern for db during I was deploying the Mongo environment.
db.getWriteConcern() returned null, just FYI.

Comment by Lucifinil Long [ 09/Oct/15 ]

Hi, Ramon Fernandez
Sorry for my mistake. all versions are Mongo 3.0.5

You can send email to longjun@360.cn if need, I will response it as soon as possible. Thanks a lot.

Comment by Ramon Fernandez Marina [ 25/Sep/15 ]

lucifinil, the description above reads:

A mongos manages two shards, version 3.0.5.

and then:

We’re using a mongos instance that manages 2 shards with Mongo version 2.4.3 as our DB server.

Can you please clarify which exact versions of mongod and mongos are you running on each node?

Thanks,
Ramón.

Generated at Thu Feb 08 03:54:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.