[SERVER-22958] Duplicate key error during mongorestore from v2.6 dump to v3.2 Created: 01/Mar/16  Updated: 18/Mar/16  Resolved: 18/Mar/16

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.6.12, 3.2.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bijith Kumar Assignee: Kelsey Schubert
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Linux
Operating System: ALL
Participants:

 Description   

I am trying to mongorestore a dump (created by v2.6 mongodump) to a v3.2 mongod and getting a lot of "E11000 duplicate key error" messages. However, after the restore, the new collection seems to have all records.
The same dump restores without error to a v2.6 mongod.
Since the dump was created by mongodump and restores without error to a 2.6 mongod, I believe there are no duplicate records in the dump, and I am wondering if these are false errors.

2016-03-01T00:56:47.260+0000 error: multiple errors in bulk operation:

  • E11000 duplicate key error collection: Reports.RestoredCollection index: _id_ dup key: { : { T: new Date(1456358400000), AD: 130318, PB: 10264, CG: 307 } }

By the way, I tried the --drop option too, just in case, but as expected it didn't help.
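For reference, the workflow that produces the errors looks roughly like this (hosts and paths below are placeholders, not the exact commands used):

mongodump --host <source-host> -d Reports -o /backup/temp/2016-02-26/                  # v2.6 tools
mongorestore --host <v3.2-host> --drop -d Reports /backup/temp/2016-02-26/Reports/     # emits E11000 errors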



 Comments   
Comment by Kelsey Schubert [ 18/Mar/16 ]

Hi bkumar@nativo.net,

Thank you for uploading your data. I've verified that restoring your data to either MongoDB 2.6 or 3.2 using the 2.6 tools does not produce any duplicate key errors. However, as you have observed, using the 3.2 tools produces these errors against either MongoDB 2.6 or 3.2.

In summary, this issue is isolated to the 2.6 tools, and we have only been able to reproduce it using the data you provided. While the errors are not logged, the data is inserted correctly and matches the end result of restoring with the 3.2 tools.

Since this issue has minimal impact and is only observed with an old version of the tools that has since been rewritten, we will not be correcting this behavior. Going forward, I would recommend using the 3.2 tools.

Thank you for your help investigating this issue,
Thomas

Comment by Bijith Kumar [ 11/Mar/16 ]

Hi Thomas,
I have uploaded the dump to the secure portal. Please note that it might contain sensitive information and should not be shared with anyone other than those debugging this issue. Thanks.

Comment by Kelsey Schubert [ 10/Mar/16 ]

Hi bkumar@nativo.net,

Can you please upload the dump that is causing this behavior? I have created a secure upload portal here.

Please note that files uploaded to this portal will only be visible to MongoDB employees investigating this issue and are routinely deleted after some time.

Thank you,
Thomas

Comment by Gabriel Russell (Inactive) [ 03/Mar/16 ]

It is completely normal to have duplicate keys in dumps, because data can change while the dump is occurring. For this reason, we suggest using the --oplog option of mongodump together with the --oplogReplay option of mongorestore as a way to get a point-in-time restore. But that doesn't help you with this problem.
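For completeness, a point-in-time dump and restore looks roughly like this (hosts and paths are placeholders; note that --oplog must be run against a replica set member directly, not through mongos):

mongodump --host <replset-member> --oplog --out /backup/pit/
mongorestore --host <target-host> --oplogReplay /backup/pit/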

It was suggested that the 2.6 database or tools may simply not output the duplicate key messages, but I've reproduced this and found that all combinations of the 2.6 and 3.2 server and mongorestore produce messages about duplicate keys.

I think it's clear that what's happening is not happening in the tools (dump or restore), so I'm going to move this ticket over to the server project, where a server team member can work out why your 2.6 server is not emitting duplicate key errors.

Comment by Bijith Kumar [ 02/Mar/16 ]

Thanks Gabriel. I did more testing and I think the issue has something to do with active chunks and chunk skips; details at the end.
Before that, please find the details you asked for below.

v3.2
=====
-mongorestore result
finished restoring Reports.AdPublicationGeoAgg (860982 documents)

-result from restored collection
> db.AdPublicationGeoAgg.count()
860929

-error during restore (sample)
E11000 duplicate key error collection: Reports.AdPublicationGeoAgg index: _id_ dup key: { : { T: new Date(1456444800000), AD: 130950, PB: 10529, CG: 178 } }

-result from bsondump
bsondump /backup/temp/2016-02-26/Reports/AdPublicationGeoAgg.bson > AdPublicationGeoAgg_26.json
2016-03-02T19:09:11.321+0000 860982 objects found

v2.6
====
-mongorestore result
860982 objects found

-result from restored collection
> db.AdPublicationGeoAgg.count()
860929

-result from bsondump
bsondump /backup/temp/2016-02-26/Reports/AdPublicationGeoAgg.bson > AdPublicationGeoAgg_26.json
860982 objects found

-result from original collection from which dump was created
mongos> db.AdPublicationGeoAgg.count({ "_id.T" : { "$gte" : new Date("2016-02-26T00:00:00.000Z") , "$lte" : new Date("2016-02-26T23:00:00.000Z")}})
860982

-result during dump
mongodump -h localhost -d 'Reports' --query '{ "_id.T" : { "$gte" : new Date(1456444800000) , "$lte" : new Date(1456531140000)}}' --out /tmp1/26/
860982 documents

From the test above, the number of records in the original collection and in the dump is the same (860982).
However, the count in the restored collection in both v2.6 and v3.2 doesn't match that original count (both the 2.6 and 3.2 collections have 860929 records).
diff = 860982 - 860929 = 53

I queried the original collection with one of the duplicate _ids from the error message and got back one document. However, the count query returned 2!

mongos> db.AdPublicationGeoAgg.find({ "_id.T": new Date(1456444800000), "_id.AD": 130950, "_id.PB": 10529, "_id.CG": 178 })
{ "_id" : { "T" : ISODate("2016-02-26T00:00:00Z"), "AD" : 130950, "PB" : 10529, "CG" : 178 }, "CN" : 1, "A" : 5513, "C" : 109284, "MP" : 1, "B" : 112710, "PN" : 1783, "P" : 4045, "d" : [
    { "CY" : 239, "RG" : 4223, "M" : 133, "B100" : 140000, "AT1" : 1, "AT46" : 1, "AT47" : 1, "S500" : 50 },
    { "CY" : 239, "RG" : 4224, "M" : 100, "AT46" : 1, "S500" : 50 },
    { "CY" : 239, "RG" : 4195, "M" : 88, "AT46" : 1, "S500" : 50 },
    { "CY" : 239, "RG" : 4232, "M" : 187, "AT47" : 1, "AT1" : 1, "AT46" : 1, "B100" : 140000, "S500" : 50 }
] }

mongos> db.AdPublicationGeoAgg.find({ "_id.T": new Date(1456444800000), "_id.AD": 130950, "_id.PB": 10529, "_id.CG": 178 }).count()
2

mongos> db.AdPublicationGeoAgg.count({ "_id.T": new Date(1456444800000), "_id.AD": 130950, "_id.PB": 10529, "_id.CG": 178 })
2

I ran explain() on the find query and found that there is one chunk being skipped:
"cursor" : "BtreeCursor _id.T_1_CN_1",
"n" : 1,
"nChunkSkips" : 1,
"nYields" : 272,
"nscanned" : 39364,
"nscannedAllPlans" : 39364,
"nscannedObjects" : 39364,
"nscannedObjectsAllPlans" : 39364,
"millisShardTotal" : 45,
"millisShardAvg" : 0,
"numQueries" : 96,
"numShards" : 96,
"millis" : 53

So it seems that mongodump (and count()) doesn't skip documents in chunks that are being migrated, the way find() does (note nChunkSkips above), which leads to duplicate records in the dump file.
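One way to check this directly, reusing the bsondump output from earlier in this comment (a sketch; the exact field spacing in the JSON can differ between tools versions, so the pattern may need adjusting):

# count occurrences of one of the reported duplicate keys in the dump
bsondump /backup/temp/2016-02-26/Reports/AdPublicationGeoAgg.bson > AdPublicationGeoAgg_26.json
grep -c '"AD" : 130950, "PB" : 10529, "CG" : 178' AdPublicationGeoAgg_26.json
# a count of 2 here would confirm the duplicate is in the dump file itself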

Comment by Gabriel Russell (Inactive) [ 02/Mar/16 ]

Bijith,

Thank you for this bug report.

Indeed this looks strange. The source of the duplicate key errors is in fact the server, so I'd be surprised if the errors were false. There are a few things that we could do to try to run this down.

First off, can you run bsondump on the collection's .bson file in the dump directory, something like:

bsondump dump/Reports/RestoredCollection.bson > RestoredCollection.json

it should output something like:

2016-03-02T11:11:09.602-0500    #### objects found

which should let you know how many documents are in the bson file.

Then can you count the freshly restored collection, with a shell command like:

db.getSiblingDB("Reports").RestoredCollection.count()

Then let me know the difference between these numbers.

Also, can you open that RestoredCollection.json and look for the documents matching the keys from the duplicate key errors, and see if there is more than one of each? They may not necessarily be near each other in the file.
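As a shortcut, something like the following should surface any repeated _id values in the whole file (a sketch; it assumes bsondump emitted one document per line and that the _id subdocument contains no nested braces):

# print each _id value that appears more than once, with its count
grep -o '"_id" : {[^}]*}' RestoredCollection.json | sort | uniq -cd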

I have some other things to do to hunt this down, but I'd need to see the results of this work first.

Thank you,
Gabriel
