[DOCS-10577] Document that mongoimport only supports UTF-8 character encoding Created: 25/Jul/17  Updated: 30/Oct/23  Resolved: 26/Jul/17

Status: Closed
Project: Documentation
Component/s: tools
Affects Version/s: None
Fix Version/s: Server_Docs_20231030

Type: Bug
Priority: Major - P3
Reporter: Samaresh Singh
Assignee: Allison Reinheimer Moore
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File CARA_daily_unsubscribe_update_20170402_sample.csv    
Participants:
Days since reply: 6 years, 29 weeks ago

 Description   

mongoimport 3.4.2 failed to import a CSV file encoded as little-endian UTF-16. The MongoDB version was also 3.4.2.

```
$ file CARA_daily_unsubscribe_update_20170402_sample.csv
CARA_daily_unsubscribe_update_20170402_sample.csv: Little-endian UTF-16 Unicode text, with very long lines, with CRLF, CR line terminators

$ mongoimport -v --host 10.5.16.135 --port 27017 --db test --collection foo4 --type csv --headerline --file ./CARA_daily_unsubscribe_update_20170402_sample.csv
2017-07-24T18:01:46.417-0700 filesize: 1016 bytes
2017-07-24T18:01:46.417-0700 using fields: ��email,BMT_subscribed,BMT_unsubscribe_date,PDM_subscribed,PDM_unsubscribe_date,ESM_subscribed,ESM_unsubscribe_date,PFM_subscribed,PFM_unsubscribe_date,HAR_subscribed,HAR_unsubscribe_date,KEL_subscribed,KEL_unsubscribe_date,MIL_subscribed,MIL_unsubscribe_date,SWI_subscribed,SWI__unsubscribe_date,MON_subscribed,MON_unsubscribe_date,UDC_subscribed,UDC_unsubscribe_date
2017-07-24T18:01:46.422-0700 connected to: 10.5.16.135:27017
2017-07-24T18:01:46.422-0700 ns: test.foo4
2017-07-24T18:01:46.422-0700 connected to node type: standalone
2017-07-24T18:01:46.423-0700 using write concern: w='1', j=false, fsync=false, wtimeout=0
2017-07-24T18:01:46.423-0700 using write concern: w='1', j=false, fsync=false, wtimeout=0
2017-07-24T18:01:46.423-0700 num failures: 3
2017-07-24T18:01:46.423-0700 Failed: lost connection to server
2017-07-24T18:01:46.423-0700 imported 0 documents
```
I am attaching the file that reproduces the error. After the file was converted to UTF-8, `mongoimport` imported it successfully.
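
The conversion step mentioned above could look roughly like the following. This is only a sketch of one way to re-encode the file, not necessarily the method that was actually used; the function name `convert_utf16_to_utf8` and the output filename in the usage note are hypothetical.

```python
from pathlib import Path


def convert_utf16_to_utf8(src: str, dst: str) -> None:
    """Re-encode a UTF-16 text file (with a BOM) as UTF-8 without a BOM.

    Hypothetical helper for illustration; not part of the MongoDB tools.
    """
    # The "utf-16" codec reads the byte order mark to pick the endianness
    # and strips it, so the first header field carries no stray bytes.
    text = Path(src).read_bytes().decode("utf-16")
    # Working on bytes keeps the original CRLF/CR line terminators as-is.
    # "utf-8" (rather than "utf-8-sig") writes no BOM to the output.
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, `convert_utf16_to_utf8("CARA_daily_unsubscribe_update_20170402_sample.csv", "sample_utf8.csv")` would write a UTF-8 copy (the name `sample_utf8.csv` is made up here), which could then be passed to `mongoimport` with `--file`.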

Sincerely
Samaresh Singh



 Comments   
Comment by Githook User [ 26/Jul/17 ]

Author: Allison Moore (schmalliso) <allison.moore@10gen.com>

Message: DOCS-10577: mongoimport requires utf8 encoded files
Branch: v3.2
https://github.com/mongodb/docs/commit/b428c2661cfd6d0dc66ffe67c9758d93c756b4cd

Comment by Githook User [ 26/Jul/17 ]

Author: Allison Moore (schmalliso) <allison.moore@10gen.com>

Message: DOCS-10577: mongoimport requires utf8 encoded files
Branch: v3.4
https://github.com/mongodb/docs/commit/d55b4542f105cab703f696f4085581f4e69e590b

Comment by Githook User [ 26/Jul/17 ]

Author: Allison Moore (schmalliso) <allison.moore@10gen.com>

Message: DOCS-10577: mongoimport requires utf8 encoded files
Branch: master
https://github.com/mongodb/docs/commit/dce88ebdca6a013db22e757b94eadfa31a726146

Comment by David Golden [ 25/Jul/17 ]

Thanks for the report! mongoimport only supports UTF-8. I'm going to move this ticket to the Documentation project so this limitation gets documented.
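
Given that limitation, a pre-import check along these lines might catch a wrongly encoded file before `mongoimport` runs. It is a heuristic sketch, not something the tool provides; the function name `looks_like_utf8` is hypothetical, and treating any NUL byte as a sign of UTF-16 is an assumption that suits plain CSV or JSON text.

```python
import codecs


def looks_like_utf8(path: str) -> bool:
    """Heuristically check whether a file is plausibly UTF-8 text.

    Hypothetical helper for illustration; not part of the MongoDB tools.
    """
    with open(path, "rb") as f:
        data = f.read()
    # A UTF-16 byte order mark means the file is not UTF-8.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return False
    # Assumption: text exports contain no NUL bytes, while BOM-less UTF-16
    # of mostly ASCII text has one in every other byte.
    if b"\x00" in data:
        return False
    try:
        # Strict decoding rejects byte sequences that are not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A file that fails this check would need to be re-encoded as UTF-8 before import.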
