[SERVER-14913] mongoimport imports csv incorrectly when in the presence of even number of escaped quotes Created: 15/Aug/14 Updated: 10/Dec/14 Resolved: 28/Aug/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Tools |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andrew Erlichson | Assignee: | Matt Kangas |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Steps To Reproduce: | Create a file called bad.csv
Now import it
Now let's look at the collection:
There should be four documents, but there are only two. |
| Description |
|
When a CSV file contains an even number of escaped quotes written as \", the parser gets confused and reads across line endings, coalescing multiple lines into a single document. Wikipedia says that embedded quotes must be encoded as "", so arguably the CSV file did not conform to Jimmy Wales's view of CSV. But this particular encoding is the default used by MySQL, so we probably need it to work, or at least to throw an error. |
| Comments |
| Comment by Matt Kangas [ 28/Aug/14 ] |
|
mongoimport's CSV parser conforms to RFC 4180, which specifies:
Backslash is not a valid escape character per the RFC. Of course, the root problem is that CSV was poorly specified for a long time, so implementations differ considerably. If we should add an option to mongoimport to support MySQL's variant of CSV, please let me and mpobrien know. |
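
Until such an option exists, one possible workaround is to rewrite the file before importing it, turning each backslash-escaped quote (\") into the doubled form ("") that RFC 4180 expects. The following is a minimal sketch using Python's standard csv module, not anything shipped with mongoimport. The function name mysql_csv_to_rfc4180 is hypothetical, and the sketch assumes comma-delimited input with double-quoted fields and backslash as the only escape character.

```python
import csv
import io


def mysql_csv_to_rfc4180(text: str) -> str:
    """Re-encode backslash-escaped CSV text using RFC 4180 doubled quotes.

    Assumption: fields are comma-separated, optionally enclosed in double
    quotes, and a backslash escapes the character that follows it.
    """
    # Read side: treat backslash as the escape character and do not
    # interpret "" as an embedded quote.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        escapechar="\\",
        doublequote=False,
    )

    # Write side: the writer's defaults (doublequote=True, no escapechar)
    # emit embedded quotes as "".
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    sample = 'id,quote\n1,"he said \\"hi\\" ok"\n2,plain\n'
    print(mysql_csv_to_rfc4180(sample), end="")
```

The converted text can then be written to a new file and passed to mongoimport in place of the original. Because the conversion goes through a real CSV parser rather than a text substitution, embedded commas and newlines inside quoted fields stay within their fields.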