[SERVER-2379] mongoimport + csv with commas in values: extraneous data
Created: 19/Jan/11  Updated: 12/Jul/16  Resolved: 16/Aug/11

Status: Closed
Project: Core Server
Component/s: Tools
Affects Version/s: 1.6.5
Fix Version/s: 1.9.2

Type: Bug
Priority: Major - P3
Reporter: Chris Gill
Assignee: Spencer Brody (Inactive)
Resolution: Done
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

10.6.0 Darwin Kernel Version 10.6.0: Wed Nov 10 18:13:17 PST 2010; root:xnu-1504.9.26~3/RELEASE_I386 i386


Issue Links:
Related
is related to SERVER-1097 mongoimport / export should adhere to... Closed
Operating System: ALL

 Description   

$ cat blah.csv
name,location
"Bob","Somewhere, Out There"
"Chris","Right here"
$ mongoimport -h localhost -d testing -c testing --headerline -type csv --file blah.csv
connected to: localhost
imported 3 objects
$ mongo testing
MongoDB shell version: 1.6.5
connecting to: testing
> db.testing.find()

{ "_id" : ObjectId("4d372b87a0a15c6c1ae2f8ea"), "name" : "Bob", "location" : "Somewhere, Out There" }
{ "_id" : ObjectId("4d372b87a0a15c6c1ae2f8eb"), "name" : "Chris", "location" : "Right here", "field2" : " There\"" }

Note the garbage "field2" field in the second record. Its value appears to be the tail of the comma-containing value from the first record.
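
For comparison, here is a minimal sketch of the result one would expect from a parser that treats a comma inside double quotes as part of the value. It uses Python's standard `csv` module rather than mongoimport itself; the `SAMPLE` constant and the `parse_csv_with_header` helper are names made up for this illustration, and the data is copied from the transcript above.

```python
import csv
import io

# Sample data copied from the report; SAMPLE is a name invented for this sketch.
SAMPLE = 'name,location\n"Bob","Somewhere, Out There"\n"Chris","Right here"\n'


def parse_csv_with_header(text):
    """Parse CSV text whose first line is a header row.

    Returns one dict per data row. A comma inside a double-quoted value
    stays part of that value instead of starting a new field.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


if __name__ == "__main__":
    for doc in parse_csv_with_header(SAMPLE):
        # Each record should carry only the "name" and "location" keys.
        print(doc)
```

Under this reading the file yields two records, each with only the `name` and `location` keys, so the extra "field2" key in the imported document is not something the input asks for.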



 Comments   
Comment by Spencer Brody (Inactive) [ 16/Aug/11 ]

This code was majorly reworked in fixing SERVER-1097, and I suspect that this bug was fixed in the process.
Can you test this again with the 1.9.2 release that just went out and see if the problem persists?

Comment by Sidney San Martín [ 05/Jul/11 ]

Pull request: https://github.com/mongodb/mongo/pull/30

Comment by Lance Lovette [ 21/Feb/11 ]

I just tested the nightly build (mongodb-linux-i686-2011-02-21) and the issue persists.

Comment by Lance Lovette [ 21/Feb/11 ]

I'm on CentOS 5.5 with mongod 1.6.5 (from RPM) and have this same issue. Text from longer records is carried over into a bogus field in future records.

$ cat blah.csv
1,"This is a really long sentence"
2,"My name is Joe"
3,"I like chicken"
4,"It is hot outside"

$ mongoimport --db blah -c blah --file blah.csv --fields a,b --type csv --drop -vvv
Mon Feb 21 21:22:46 creating new connection to:127.0.0.1
connected to: 127.0.0.1
Mon Feb 21 21:22:46 ns: blah.blah
dropping: blah.blah
Mon Feb 21 21:22:46 filesize: 95
Mon Feb 21 21:22:46 got line:1,"This is a really long sentence"
Mon Feb 21 21:22:46 got line:2,"My name is Joe"
Mon Feb 21 21:22:46 got line:3,"I like chicken"
Mon Feb 21 21:22:46 got line:4,"It is hot outside"
Mon Feb 21 21:22:46 got line:
imported 4 objects

$ mongo blah
MongoDB shell version: 1.6.5
connecting to: blah
> db.blah.find()

{ "_id" : ObjectId("4d62d7a676a8a7bc0085ba5a"), "a" : 1, "b" : "This is a really long sentence" }
{ "_id" : ObjectId("4d62d7a676a8a7bc0085ba5b"), "a" : 2, "b" : "My name is Joe", "field2" : " long sentence\"" }
{ "_id" : ObjectId("4d62d7a676a8a7bc0085ba5c"), "a" : 3, "b" : "I like chicken", "field2" : " long sentence\"" }
{ "_id" : ObjectId("4d62d7a676a8a7bc0085ba5d"), "a" : 4, "b" : "It is hot outside", "field2" : "ng sentence\"" }

The import is correct if I remove the quotes altogether, or replace the comma with a TAB (although that includes the quotes in the field value).
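
The two workarounds in the previous sentence can be combined by converting the file to unquoted, TAB-separated text before importing. The sketch below does this with Python's standard `csv` module. The `csv_to_tsv` helper is a name made up for this illustration, and the approach assumes that no value contains a TAB or a line break and that the installed mongoimport accepts TAB-separated input (for example via `--type tsv`), which has not been checked against 1.6.5 here.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert quoted CSV text to unquoted, TAB-separated text.

    Hypothetical helper for this sketch. Quotes around values are removed
    by the CSV reader, so commas inside values survive as plain text.
    Raises ValueError if a value contains a TAB or line break, because
    such a value cannot be written without quoting.
    """
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        for field in row:
            if "\t" in field or "\n" in field or "\r" in field:
                raise ValueError("value contains a TAB or line break: %r" % field)
        lines.append("\t".join(row))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = '1,"This is a really long sentence"\n2,"My name is Joe"\n'
    # Prints the two rows with a TAB between the columns and no quotes.
    print(csv_to_tsv(sample), end="")
```

The conversion only sidesteps the quoted-field parsing. It does not address the underlying bug.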

Comment by Guanqun Lu [ 27/Jan/11 ]

I tried this on Mac OS X with the latest source and could not reproduce the problem.

Comment by Chris Gill [ 19/Jan/11 ]

Sorry, the title is not very specific and I can't seem to change it. It should be more like "mongoimport for csv with commas in values results in extraneous data during import".

Generated at Thu Feb 08 02:59:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.