[SERVER-8232] Mongoexport should not interpret '\n' within a field as an actual newline. Created: 18/Jan/13  Updated: 24/Mar/14  Resolved: 23/Mar/14

Status: Closed
Project: Core Server
Component/s: Tools
Affects Version/s: 2.0.4
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Gerric Chaplin
Assignee: Stennie Steneker (Inactive)
Resolution: Done
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Create a JSON document that includes '\n' in the middle of a string, then export it to CSV: the string is split across multiple lines in the output.

 Description   

Hi,

Hopefully this is not a duplicate. I did not manage to find any similar tickets.

I know there is no clear guidance on this within RFC 4180, but to me it makes sense that a '\n' within a CSV field should not be interpreted as an actual new line when writing out to a CSV file. Perhaps converting it to '\\n' instead of creating a new line. Have you guys had any discussion about this previously?

JSON:
"app": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n<plist version=\"1.0\">\n<string>Sprint Zone</string>\n</plist>\nSprint Zone",
 
CSV:
"<?xml version=""1.0"" encoding=""UTF-8""?>
<!DOCTYPE plist PUBLIC ""-//Apple//DTD PLIST 1.0//EN"" ""http://www.apple.com/DTDs/PropertyList-1.0.dtd"">
<plist v..."

My export command.

mongoexport -h ${hostname} -d ${database} -c ${collection} -u ${mongo_user} -p ${mongo_password} --csv --out ${filename}-${current}.csv --fieldFile fieldFile --query "{REMOVED}"

mongoexport version 2.0.4



 Comments   
Comment by Gerric Chaplin [ 24/Mar/14 ]

Hi Stephen,

Thanks again for the update. I think you are correct with regard to my interpretation of the RFC.
After reading the lines you specified it does look like it is compliant.
Sorry about troubling you with this.

Best Regards,
Gerric Chaplin

Comment by Stennie Steneker (Inactive) [ 24/Mar/14 ]

Hi Gerric,

I tried with the RFC sample first, and since it returned results as expected I tried your example:

db.server8232.insert({
    "app": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n<plist version=\"1.0\">\n<string>Sprint Zone</string>\n</plist>\nSprint Zone",
})

with: mongoexport -d test -c server8232 --csv -f app

app
"<?xml version=""1.0"" encoding=""UTF-8""?>
<!DOCTYPE plist PUBLIC ""-//Apple//DTD PLIST 1.0//EN"" ""http://www.apple.com/DTDs/PropertyList-1.0.dtd"">
<plist version=""1.0"">
<string>Sprint Zone</string>
</plist>
Sprint Zone"

This is similar to what you posted in the ticket description.

I think your expectations on RFC-4180 might be different as this output appears to be compliant with the specific RFC points of:

  • 6) Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
  • 7) If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.

You should expect to see the entire field enclosed by quotes, not the individual newlines.
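
For anyone who wants to check this locally, here is a minimal sketch using Python's standard csv module (not mongoexport itself). It assumes the "app" value from the ticket, writes it with RFC 4180 style quoting, and reads it back to show that the multi-line output is still a single field. The helper names (to_csv, from_csv, escape_newlines) are made up for this illustration, and escape_newlines is only one possible workaround for line-oriented tools, along the lines of the '\\n' idea in the description.

```python
import csv
import io

# Value of the "app" field from the ticket, with real newline characters.
APP = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" '
    '"http://www.apple.com/DTDs/PropertyList-1.0.dtd">\n'
    '<plist version="1.0">\n'
    '<string>Sprint Zone</string>\n'
    '</plist>\n'
    'Sprint Zone'
)


def to_csv(rows):
    """Serialize rows with the csv module's default quoting rules."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text; newline='' keeps embedded line breaks intact."""
    return list(csv.reader(io.StringIO(text, newline="")))


def escape_newlines(rows):
    """Hypothetical workaround: turn real line breaks into a literal \\n."""
    return [
        [field.replace("\r\n", "\\n").replace("\n", "\\n") for field in row]
        for row in rows
    ]


text = to_csv([["app"], [APP]])
rows = from_csv(text)

# The quoted field spans several physical lines but is one logical record.
print("physical lines:", len(text.splitlines()))
print("logical records:", len(rows))
print("round trip intact:", rows[1][0] == APP)

# One physical line per record, for tools that split input on newlines.
flat = to_csv(escape_newlines(rows))
print("physical lines after escaping:", len(flat.splitlines()))
```

A CSV-aware reader gets the original string back unchanged. The escaping step is only needed when a downstream tool splits on line breaks without honouring quotes, and whatever consumes the file then has to undo it.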

Regards,
Stephen

Comment by Gerric Chaplin [ 24/Mar/14 ]

Hi Stephen,

Thanks for having a look at this issue. I just have a question: what exactly did you use for your test data? You can clearly see from the output in my example that the \n has not been encapsulated, while quotes have been enclosed by double quotes, which is what you would expect according to RFC-4180. I don't see quotes where I expect the \n to be, which would at least show that it was attempting to enclose the \n. I'll give it another test in 2.4.9.

Thanks for your help with this.
Best,
Gerric Chaplin

Comment by Stennie Steneker (Inactive) [ 23/Mar/14 ]

Tested with mongoexport 2.0.4 (and 2.4.9); newlines are enclosed in double quotes as expected by RFC-4180.

Comment by Stennie Steneker (Inactive) [ 23/Mar/14 ]

RFC-4180 does have a formatting suggestion:

6. Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:

"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
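
As a quick illustration of how a conforming parser treats that sample, the sketch below feeds it to Python's standard csv module, with CRLF written out as "\r\n". The variable names are placeholders chosen for this example.

```python
import csv
import io

# The RFC 4180 rule 6 sample, with CRLF written out explicitly.
RFC_SAMPLE = '"aaa","b\r\nbb","ccc"\r\nzzz,yyy,xxx'

# newline="" stops the text layer from translating line endings, so the
# CRLF inside the quoted field reaches the csv reader unchanged.
records = list(csv.reader(io.StringIO(RFC_SAMPLE, newline="")))

for record in records:
    print(record)
```

The sample parses as two records of three fields each, and the line break inside the quotes stays part of the second field of the first record.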
