Core Server / SERVER-14666

mongoimport skips records when _id field is custom-populated


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 2.4.10
    • Component/s: Tools
    • Labels: None
    • Operating System: ALL

      Unfortunately, I don't have a sample dataset on me right now that I can share publicly. If this bug is not completely shot down, I will generate one and attach it to this bug report.

      To reproduce, try importing a large file of JSON records with _id populated into an empty collection, e.g.

      /usr/bin/mongoimport -d mydb -c mycollection --stopOnError --file largefile.json

      Then go into the db and count the records in the collection and see if it matches the number of records in the file.
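
      For that count check, a sketch along these lines would do (a hypothetical helper, not part of mongoimport; it assumes pymongo, a mongod on the default local port, one JSON document per line, and the db/collection/file names from the command above):

          from pymongo import MongoClient

          # Records in the source file (assumes one JSON document per line).
          with open("largefile.json") as f:
              file_count = sum(1 for line in f if line.strip())

          # Documents that actually made it into the collection.
          client = MongoClient()  # localhost:27017 by default
          db_count = client["mydb"]["mycollection"].count_documents({})
          # (older pymongo releases, current when this issue was filed, used .count())

          print("records in file:", file_count)
          print("docs in collection:", db_count)
          print("missing:", file_count - db_count)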


    Description

      When applied to a large file of JSON records (say 200k+), where the _id field is populated with a user-defined (unique) value, the mongoimport tool appears to arbitrarily skip records without reporting any error.

      I'm not the only one to have experienced this issue:

      http://wi2ki.blogspot.com/2012/11/mongodb-user-mongoimport-did-not-load.html
      http://wi2ki.blogspot.com/2012/11/mongodb-user-re-mongoimport-did-not.html

      Apparently the import works correctly if the _id field is not populated and mongo is left to generate the _id by itself.

      I suspect the custom _id is what triggers the bug, because I now have a script that goes through the original JSON file and checks each _id against the db; if a record is missing, it dumps that record into a new file, and I then import that file. That works fine, presumably due to the smaller number of records in the new file. The original import appears to skip on the order of 1% of the records, i.e. a few thousand out of a few hundred thousand.
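
      The actual script isn't attached, but a minimal sketch of that kind of check (assuming pymongo, one JSON document per line, plain custom _id values rather than extended-JSON ObjectIds, and the db/collection/file names used above) could look like this:

          import json
          from pymongo import MongoClient

          coll = MongoClient()["mydb"]["mycollection"]

          # Walk the original file and dump any record whose _id is absent from the db.
          missing = 0
          with open("largefile.json") as src, open("missing.json", "w") as out:
              for line in src:
                  line = line.strip()
                  if not line:
                      continue
                  doc = json.loads(line)
                  if coll.find_one({"_id": doc["_id"]}, {"_id": 1}) is None:
                      out.write(line + "\n")
                      missing += 1

          print(missing, "records missing; re-import missing.json with mongoimport")

      The resulting missing.json can then be re-imported with the same mongoimport invocation shown in the steps above.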

      I'm marking this as Major. Though there is a workaround, this issue caused us no end of headaches around data integrity, which is pretty important to most projects.


          People

            Assignee: Ramon Fernandez Marina (ramon.fernandez@mongodb.com)
            Reporter: Clark Freifeld (clarkfreifeld)
            Votes: 0
            Watchers: 3
