[SERVER-14666] mongoimport skips records when _id field is custom-populated
Created: 23/Jul/14 Updated: 10/Dec/14 Resolved: 24/Jul/14
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Tools |
| Affects Version/s: | 2.4.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Clark Freifeld | Assignee: | Ramon Fernandez Marina |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Steps To Reproduce: | Unfortunately, I don't have a sample dataset on hand right now that I can share publicly. If this bug is not dismissed outright, I will generate one and attach it to this report. To reproduce, import a large file of JSON records with _id populated into an empty collection, e.g. run /usr/bin/mongoimport -d mydb -c mycollection --stopOnError --file largefile.json and then go into the db, count the records in the collection, and check whether that number matches the number of records in the file. |
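The reproduction steps above can be sketched in Python. This is a minimal, hypothetical generator, not the reporter's dataset: it assumes mongoimport's default input of one JSON document per line, and the record shape (a string _id such as "rec-000001" plus a "value" field) and the function names write_records and count_records are illustrative only.

```python
import json
import os
import tempfile


def write_records(path, n):
    """Write n JSON documents, one per line, each with a custom unique _id.

    The record shape is hypothetical; only the user-populated _id matters here.
    """
    with open(path, "w", encoding="utf-8") as f:
        for i in range(n):
            doc = {"_id": "rec-%06d" % i, "value": i}
            f.write(json.dumps(doc) + "\n")


def count_records(path):
    """Count non-blank lines, i.e. the number of documents in the file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


if __name__ == "__main__":
    # 250k records matches the "200k+" scale described in the report.
    path = os.path.join(tempfile.gettempdir(), "largefile.json")
    write_records(path, 250_000)
    print(count_records(path))
```

After importing the generated file with the mongoimport command above, db.mycollection.count() in the mongo shell should equal the printed record count if nothing was skipped.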
| Description |
The mongoimport tool, when applied to a large file of JSON records (say 200k+ records) where the _id field is populated with a user-defined (unique) value, appears to skip records arbitrarily, without reporting any error. I'm not the only one to have experienced this issue (see http://wi2ki.blogspot.com/2012/11/mongodb-user-mongoimport-did-not-load.html). Apparently the import works correctly if the _id field is not populated and mongo is left to generate the _id by itself.

I suspect this is a bug because I now have a script that goes through the original JSON file and checks each _id against the db. If a record is missing, the script dumps it into a new file, which I then import. That works fine, presumably because the new file contains far fewer records. The original import appears to skip on the order of 1% of the records, so a few thousand out of a few hundred thousand.

I'm marking this as Major. Though there is a workaround, this issue caused us no end of headaches around data integrity, which is pretty important to most projects. |
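The workaround script described above (check each _id in the file against the db, dump the missing records into a new file) might look roughly like the following. This is a hypothetical sketch, not the reporter's script: it assumes one JSON document per line and scalar (hashable) _id values, and it takes the set of _ids already in the collection as an argument, which could for example be collected with pymongo's collection.find({}, {"_id": 1}). The function name dump_missing is illustrative.

```python
import json


def dump_missing(in_path, out_path, existing_ids):
    """Copy every record whose _id is not in existing_ids to out_path.

    existing_ids is assumed to be a set of _id values already present in
    the collection. Returns the number of records written.
    """
    written = 0
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            doc = json.loads(line)
            if doc["_id"] not in existing_ids:
                # Keep one document per line even if the input's last line
                # has no trailing newline.
                dst.write(line if line.endswith("\n") else line + "\n")
                written += 1
    return written
```

The resulting file can then be passed to mongoimport with --file, as in the original workaround.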
| Comments |
| Comment by Ramon Fernandez Marina [ 24/Jul/14 ] |
clarkfreifeld, I'm not able to reproduce this problem. Since you didn't provide a dataset, I first created one and exported it to a file:
Then I imported this dataset with mongoimport as you describe:
Have you checked that your _id fields are indeed unique? If they're not, chances are you're running into …

I'm going to resolve this as "Can't reproduce", but if someone can upload a dataset that reproduces the problem, I'll be happy to re-open the ticket and investigate further.

Regards, |
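The uniqueness question raised in the comment can be checked before importing. Since a collection holds at most one document per _id value, every repeated _id in the file leaves the collection one document short of the file's record count. A minimal sketch, assuming one JSON document per line and scalar (hashable) _id values; the function name duplicate_ids is hypothetical:

```python
import json
from collections import Counter


def duplicate_ids(path):
    """Return {_id: count} for every _id that appears more than once in path."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # ignore blank lines
                counts[json.loads(line)["_id"]] += 1
    return {k: c for k, c in counts.items() if c > 1}
```

An empty result means every _id in the file is distinct; otherwise the sum of (count - 1) over the result is the number of records that cannot end up as separate documents.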