[SERVER-17326] Data got removed for wrong date range Created: 19/Feb/15  Updated: 14/Apr/16  Resolved: 20/Feb/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.1
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Himanshu Jain Assignee: Asya Kamsky
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File shardA.log     Text File shardB.log    
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

We have a sharded cluster with below config -

2 Sharded servers (shard-db1, shard-db2)
3 Config servers (on servers shard-db1, shard-db2 and app1)
2 MongoS servers (app1, app2)

and collection position is sharded across shard-db1 and shard-db2

Week back. a very strange thing happened:

As per line no, 379 (shardA.log) - a command was fired for deleting records for some range, as:

2015-02-11T04:56:35.564-0500 [conn52] remove ds-db.position query: { dt: { $gte: new Date(1412136000000), $lte: new Date(1424840400000) } } keyUpdates:0 numYields:0 locks(micros) w:169 234ms

But as per line - 603 (shardA.log), records were deleted for much larger date range, as

2015-02-11T05:27:15.556-0500 [conn43] remove ds-db.position query: { dt: { $gte: new Date(1364702400000), $lte: new Date(1424840400000) } } ndeleted:2375989 keyUpdates:0 numYields:37161 locks(micros) w:3561770402 2200618ms

Also, in ShardB.log
where this command took approx 90 mins.

2015-02-11T06:30:11.176-0500 [conn45] remove ds-db.position query: { dt: { $gte: new Date(1364702400000), $lte: new Date(1424840400000) } } ndeleted:5822749 keyUpdates:0 numYields:110266 locks(micros) w:6417752235 5976240ms

So no clue how does the delete command picked start date as: 1364702400000 instead of 1412136000000

Thanks



 Comments   
Comment by Asya Kamsky [ 20/Feb/15 ]

There is not two different outputs per command for this operation, the mongod only logs an operation when it finishes (and records how long the operation took, so you can determine when it was issued by subtracting duration from the timestamp of the log line). If you do the calculation, the command that finished at 5:27 was started around 4:51 (i.e. five minutes before the command at 4:56, which took less than a second).

Comment by Himanshu Jain [ 20/Feb/15 ]

Thanks for prompt reply.

So you mean to say two log statements in file - shardA.log (04:56 and 05:27) are outputs of two different commands.

Because I assumed 04:56 log says which command was issued
and 05:27 log says what was the output of command ?

If that's not the case, Is there also a log of command issued just like its there for command outputs ?

Thanks.

Comment by Asya Kamsky [ 19/Feb/15 ]

himanshu.j89@gmail.com the logs show two separate commands issues. First one was issued at 2015-02-11T04:51 or so and ran for 99 minutes on one shard and 37 minutes on the other, but the second command was issued at 2015-02-11T04:56 and that one finished quickly (because there were no matching documents by that time).

Generated at Thu Feb 08 03:44:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.