mongoimport skips records when _id field is custom-populated


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • Affects Version/s: 2.4.10
    • Component/s: Tools
    • Operating System: ALL

      Unfortunately, I don't have a sample dataset on me right now that I can share publicly. If this bug is not completely shot down, I will generate one and attach it to this bug report.

      To reproduce, try importing a large file of JSON records with the _id field populated into an empty collection, e.g.

      /usr/bin/mongoimport -d mydb -c mycollection --stopOnError --file largefile.json

      Then count the records in the collection and check whether the count matches the number of records in the file.
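      The count comparison above can be sketched as follows. This is a minimal illustration, not part of the report: it assumes the import file is in mongoimport's default format (one JSON document per line, so the line count equals the expected document count), and the names `mydb`/`mycollection` are the report's own placeholders. The actual collection count would come from a running mongod.

      ```python
      import json

      def expected_count(path):
          """Count the documents mongoimport should insert from a
          newline-delimited JSON file (its default input format)."""
          with open(path) as f:
              # Parse each non-empty line to make sure it is one valid document.
              return sum(1 for line in f if line.strip() and json.loads(line))

      # With pymongo and a running server, the comparison would then be
      # (assumed usage, mirroring the report's manual check):
      #   actual = MongoClient().mydb.mycollection.count_documents({})
      #   assert expected_count("largefile.json") == actual
      ```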


      The mongoimport tool, when applied to a large file of JSON records (say 200k+ records) where the _id field is populated with a user-defined (unique) value, appears to skip records arbitrarily, without reporting any error.

      I'm not the only one to have experienced this issue:

      http://wi2ki.blogspot.com/2012/11/mongodb-user-mongoimport-did-not-load.html
      http://wi2ki.blogspot.com/2012/11/mongodb-user-re-mongoimport-did-not.html

      Apparently the import works correctly if the _id field is not populated and MongoDB is left to generate the _id by itself.

      I suspect the bug is in mongoimport, because I now have a workaround script that goes through the original imported JSON file and checks each _id against the db. If a record is missing, the script dumps it into a new file, which is then imported. That works fine, presumably due to the smaller number of records in the new file. The original import appears to skip on the order of 1% of the records, so a few thousand out of a few hundred thousand.
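      A hedged sketch of the recovery logic the workaround describes (the function name and the `db_ids` parameter are hypothetical; in practice the set of _ids already in the db would come from a query such as pymongo's `collection.distinct("_id")`):

      ```python
      import json

      def find_missing_records(source_lines, db_ids):
          """Return records from the original import file whose _id is
          absent from the database.

          source_lines: iterable of newline-delimited JSON documents
                        (mongoimport's default input format).
          db_ids:       set of _id values already present in the collection
                        (hypothetical; would be fetched from the db).
          """
          missing = []
          for line in source_lines:
              if not line.strip():
                  continue
              record = json.loads(line)
              if record["_id"] not in db_ids:
                  missing.append(record)
          return missing

      # The missing records would then be written to a new file and
      # re-imported with mongoimport, as the workaround describes:
      #   with open("missing.json", "w") as f:
      #       for rec in missing:
      #           f.write(json.dumps(rec) + "\n")
      ```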

      I'm marking this as Major. Though there is a workaround, this issue caused us no end of headaches around data integrity, which is pretty important to most projects.

            Assignee:
            Ramon Fernandez
            Reporter:
            Clark Freifeld
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: