Details
Type: Bug
Resolution: Cannot Reproduce
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.4.10
Component/s: None
Operating System: ALL
Description
The mongoimport tool, when applied to a large file of JSON records (say 200k+ records) in which the _id field is populated with a user-defined (unique) value, appears to arbitrarily skip records without reporting any error.
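For reference, the import was invoked along the following lines; the database, collection, and file names here are placeholders rather than the actual ones used:

    mongoimport --db mydb --collection mycoll --file original.json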
I'm not the only one to have experienced this issue:
http://wi2ki.blogspot.com/2012/11/mongodb-user-mongoimport-did-not-load.html
http://wi2ki.blogspot.com/2012/11/mongodb-user-re-mongoimport-did-not.html
Apparently the import works correctly if the _id field is left unpopulated and MongoDB is allowed to generate the _id by itself.
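To illustrate the difference (these records are invented for illustration, not taken from our data), a line that triggers the problem supplies its own _id, while a line that imports cleanly omits it and lets the server assign an ObjectId:

    {"_id": "order-000123", "status": "shipped", "qty": 2}
    {"status": "shipped", "qty": 2}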
I suspect the user-defined _id is the cause. I now have a script that goes through the original imported JSON file and checks each _id against the db; if a record is missing, it dumps the record into a new file, then imports that file (a sketch of the script is below). That works fine, presumably due to the smaller number of records in the new file. mongoimport appears to skip on the order of 1% of the records in the original file, i.e. a few thousand out of a few hundred thousand.
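A minimal sketch of that workaround script, assuming pymongo, one JSON document per line (the default format mongoimport consumes), and placeholder connection, database, collection, and file names:

    import json
    from pymongo import MongoClient

    # Placeholder connection, database, and collection names.
    coll = MongoClient("mongodb://localhost:27017")["mydb"]["mycoll"]

    with open("original.json") as src, open("missing.json", "w") as out:
        for line in src:
            line = line.strip()
            if not line:
                continue
            doc = json.loads(line)
            # If this _id never made it into the collection, queue the
            # record for re-import.
            if coll.find_one({"_id": doc["_id"]}, {"_id": 1}) is None:
                out.write(line + "\n")

The resulting file is then re-imported with mongoimport --db mydb --collection mycoll --file missing.json, which completes without skipping anything.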
I'm marking this as Major. Though there is a workaround, this issue caused us no end of headaches around data integrity, which is pretty important to most projects.