mongoimport imports CSV incorrectly in the presence of an even number of backslash-escaped quotes


    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Tools

      Create a file called bad.csv

      bad.csv
      "first","last"
      "joe","smith"
      "bad","guy\""
      "evil","monster"
      "sam\"","mill"
      

      Now import it

      desktop:MongoDB aje$ mongoimport --type csv -c bad --drop --headerline < bad.csv
      connected to: 127.0.0.1
      2014-08-15T11:09:02.225-0400 dropping: test.bad
      2014-08-15T11:09:02.254-0400 imported 2 objects

      Now let's look at the collection:

      m101:PRIMARY> db.bad.find().pretty()
      {
      	"_id" : ObjectId("53ee228e35d4ea0c46429cac"),
      	"first" : "joe",
      	"last" : "smith"
      }
      {
      	"_id" : ObjectId("53ee228e35d4ea0c46429cad"),
      	"first" : "bad",
      	"last" : "guy\\\"\n",
      	"field2" : ",",
      	"field3" : "\n",
      	"field4" : "",
      	"field5" : "mill"
      }
      m101:PRIMARY> 
      

      There should be four documents, but there are only two.


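      When a CSV file contains an even number of quotes escaped as \", the parser gets confused and reads across line endings, coalescing multiple lines into a single document. The backslash is not treated as an escape character, so in guy\"" the backslash is kept as data and the final "" is read as one literal quote (hence "guy\\\"\n" in the output). The quoted field therefore never closes at the end of the line.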
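
      The same failure can be reproduced outside mongoimport. Below is a minimal sketch using Python's standard csv module, whose default dialect (like RFC 4180) escapes a quote by doubling it and treats a backslash as ordinary data. This is not mongoimport's parser, so the exact field values differ from the mongo output, but the merged lines have the same cause. BAD_CSV and parse_rfc4180 are illustrative names, not part of any tool.

```python
import csv
import io

# The contents of bad.csv from this report; "\\" in a Python literal is one backslash.
BAD_CSV = (
    '"first","last"\n'
    '"joe","smith"\n'
    '"bad","guy\\""\n'
    '"evil","monster"\n'
    '"sam\\"","mill"\n'
)


def parse_rfc4180(text, strict=False):
    """Parse with Python's default dialect: "" escapes a quote, \\ is plain data."""
    return list(csv.reader(io.StringIO(text), strict=strict))


rows = parse_rfc4180(BAD_CSV)
# In guy\"" the backslash is kept as data and "" becomes one literal quote,
# so the quoted field is still open at the newline and runs into the next line.
merged = next(row for row in rows if row and row[0] == "bad")
print(len(rows) - 1, "data records instead of 4")
print(repr(merged))

# strict=True rejects the malformed quoting instead of merging lines silently.
try:
    parse_rfc4180(BAD_CSV, strict=True)
except csv.Error as exc:
    print("strict mode error:", exc)
```

      The strict=True run shows the "at least throw an error" behaviour: the parser raises csv.Error on the unexpected character after a closing quote rather than producing merged records.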

      Wikipedia says that embedded quotes must be encoded as a doubled quote (""), so arguably the CSV file does not conform to Jimmy Wales's view of CSV. However, backslash escaping (\") is the default used by MySQL, so we probably need it to work, or at least to throw an error instead of silently merging records.
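
      Supporting backslash escaping means treating \ as an escape character and no longer treating "" as an escaped quote. A minimal sketch with Python's csv module (doublequote=False, escapechar="\\") recovers all four records; the hypothetical helper to_rfc4180 then re-emits them with doubled quotes. Using this as a pre-import workaround assumes mongoimport handles the doubled-quote form correctly, which this report does not test.

```python
import csv
import io

# The contents of bad.csv from this report, with quotes escaped as \".
BAD_CSV = (
    '"first","last"\n'
    '"joe","smith"\n'
    '"bad","guy\\""\n'
    '"evil","monster"\n'
    '"sam\\"","mill"\n'
)


def read_backslash_escaped(text):
    """Parse CSV whose embedded quotes are written as \\" (hypothetical helper)."""
    # doublequote=False: "" is not an escaped quote.
    # escapechar="\\": a backslash makes the next character literal.
    return list(csv.reader(io.StringIO(text), doublequote=False, escapechar="\\"))


def to_rfc4180(rows):
    """Re-emit rows with every field quoted and embedded quotes doubled ("")."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


rows = read_backslash_escaped(BAD_CSV)
print(len(rows) - 1, "data records")  # header row excluded
print(to_rfc4180(rows), end="")
```

      Writing the re-emitted text to a new file and importing it with the same mongoimport command is one possible workaround; the design choice is to normalise the escaping before import rather than rely on the importer to guess the convention.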

            Assignee:
            Matt Kangas (Inactive)
            Reporter:
            Andrew Erlichson (Inactive)
            Votes:
            0
            Watchers:
            2
