[TOOLS-67] Give mongoimport of CSV/TSVs a way to specify import types Created: 31/Aug/11  Updated: 22/Dec/16  Resolved: 22/Jun/16

Status: Closed
Project: MongoDB Command Line Tools
Component/s: mongoimport
Affects Version/s: None
Fix Version/s: 3.3.11

Type: New Feature Priority: Minor - P4
Reporter: Spencer T Brody Assignee: Zach Snow
Resolution: Fixed Votes: 20
Labels: 2016_intern_idea, csv, import

Issue Links:
Depends
Documented
is documented by DOCS-8094 mongoimport typed fields Closed
Duplicate
is duplicated by SERVER-9015 Mongoimport for TSVs does not underst... Closed
is duplicated by TOOLS-1410 data && datatype have benn changed by... Closed
Related
Documentation Changes: Needed

 Description   

When using mongoimport to import a CSV, the field "0123" will get imported as the number 123 no matter what; there is no way to get mongoimport to treat it as a string.

It could be nice to have an option to specify what type each field should be interpreted as. This could also be useful if we start supporting importing types beyond just strings and numbers (dates, for example).



 Comments   
Comment by Scott Hernandez [ 31/Aug/11 ]

This sounds like a great place to write your own import script (like with python/ruby/perl/etc) instead of the tool doing it.

Comment by Alan Gruskoff [ 21/Oct/11 ]

Same for me.
Picking up the leading zero in Zip Codes is often a problem. In
trying to get mongoimport to ingest a CSV file of Zip Codes, I found
that the -type csv mode does not respect double quotes around a string
like "01001", instead posting 1001 as a number.

CSV Import of: "01001","Agawam","MA"
results in this:

{ "_id" : 1001, "city" : "Agawam", "state" : "MA" }

We need a way to specify that an imported field is a String or a Decimal with a given number of decimal places. Perhaps a Field : Type map the user could specify.
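A field-to-type map of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not anything mongoimport provides: the `FIELD_TYPES` map, the `typed_rows` helper, and the extra `tax_rate` column with its sample value are all made up for the example.

```python
import csv
import io
from decimal import Decimal

# Hypothetical field -> converter map; none of these names come from mongoimport.
FIELD_TYPES = {
    "_id": str,    # keep the leading zero of the Zip Code
    "city": str,
    "state": str,
    # Illustrative extra column: a Decimal rounded to two decimal places.
    "tax_rate": lambda s: Decimal(s).quantize(Decimal("0.01")),
}


def typed_rows(text, fieldnames, field_types):
    """Yield one dict per CSV row, converting each field with its mapped type."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=fieldnames)
    for row in reader:
        # Fields without an entry in the map are left as strings.
        yield {name: field_types.get(name, str)(value) for name, value in row.items()}


docs = list(
    typed_rows(
        '"01001","Agawam","MA","6.25"\n',
        ["_id", "city", "state", "tax_rate"],
        FIELD_TYPES,
    )
)
print(docs[0])
```

The resulting dictionaries could then be handed to whatever driver call performs the insert; that step is left out here so the sketch stays runnable without a database.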

Comment by Graham Hargreaves [ 09/Dec/11 ]

Same here. We have seen an issue where, if the field in the CSV is null, the type gets set to 6, which is not even defined in the docs, e.g.:

> db.messagingProfile.findOne({inDate: {$type: 18},inDate: null}, {inDate:1})

{ "_id" : ObjectId("4edc1d3ac3e40219507a5ec2"), "inDate" : null }

> db.messagingProfile.findOne({inDate: {$type: 6},inDate: null}, {inDate:1})

{ "_id" : ObjectId("4edc1d3ac3e40219507a5ec2"), "inDate" : null }

As type 6 isn't defined, from what I can see, I am going to raise a separate case.

Comment by Pawel Krupinski [ 25/Apr/12 ]

Same here.
I needed to update some rows with a new property, but since my ids were strings Mongo created new records instead of updating them.
CSV is a common enough format that mongo should allow easy import from it.

Comment by Deyna Cegielski [ 02/May/12 ]

Is there any workaround for this?

I've tried wrapping numerical string values in quotes and escaping them but they just end up as a string containing the quotes e.g. ""010"".

Comment by Spencer T Brody [ 02/May/12 ]

If you need different behavior around the handling of types than mongoimport provides, the best workaround is to write a script to do the import yourself. CSV is pretty straightforward to parse and there are many CSV parsing implementations available online. mongoimport is only designed to be used in the very simple, straightforward cases where no special handling of types is required.

Comment by Rob Marscher [ 16/Oct/13 ]

Sorry to comment on an old issue, but I agree that a CSV value of "001" should be imported as "001" and not converted to an integer with a value of 1. Feels like a bug to me and not a case of needing "special handling" of the type. Thanks.

Comment by Mark Clancy [ 27/May/14 ]

+1. Agree that the import doesn't need to be all-singing/dancing; however, the basics should be there — especially this change and proper mapping of date fields on import.

Comment by Joseph E Banks [ 08/Dec/14 ]

+1 Seems like a pretty common use case for folks exporting from relational stores and importing to mongo. Surrounding something in quotes should indicate it's a string in a csv. Don't see that as a special case.

Comment by qginformatique [ 25/Nov/15 ]

+1.
Almost one year later and nothing new for this issue.
I agree too that a CSV value of "001" should be imported as "001" and not converted to an integer with a value of 1.
If there is a text delimiter like ", it should be imported as text.
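For comparison, Python's standard `csv` module offers a reader mode along these lines. The sketch below is only meant to show that "quoted means text" is a workable rule; it says nothing about how mongoimport parses CSV, and the `read_quoted_as_text` helper name is made up for the example.

```python
import csv
import io


def read_quoted_as_text(text):
    """Parse CSV so quoted fields stay strings and unquoted fields become floats."""
    # csv.QUOTE_NONNUMERIC tells the reader to convert every unquoted field to float.
    return list(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC))


rows = read_quoted_as_text('"001",42,"MA"\n')
print(rows)
```

One caveat of this mode is that an unquoted field that is not numeric raises a ValueError, so it only suits files that quote every text value.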

Comment by Francisco Guimaraes [ 20/Jan/16 ]

+1
Devs, we need it and we know it is quite simple to do, right?

Comment by Dmitry Romanenko [ 15/Mar/16 ]

Really need that. One-command load with official mongotools is one of the best ways to deal with ETL from any other database dump.
I can't believe that the previous related ticket got instantly closed without any discussion.
https://jira.mongodb.org/browse/TOOLS-745

Comment by Gabriel Russell [ 15/Mar/16 ]

Dmitry,

One-command load with type preservation is already supported with mongoimport if you just use the json format. If the source of your data is mongodb, then you should just use json format with export and import, or better yet, just use dump and restore. If the source of the data is something else, then where will you get the type information from?

Gabriel

Comment by Dmitry Romanenko [ 15/Mar/16 ]

Gabriel, any relational database that cannot give output in JSON format.
Dumping the contents of a table to CSV is pretty trivial on any database nowadays. JSON is not really the same story. In my specific case the database just cannot give JSON output (not MongoDB).

Mongoimport is doing its job very well, and I'm not saying anything against it, but having one or two simple features to pre-define conversion, or at least load longs as strings (like the ticket I linked in my previous comment), would make data more consistent or at least more flexible. (Yeah, I googled and read about the CSV mongoimport caveats, but as far as I heard it's only one single line of code in the source.)

Comment by Gabriel Russell [ 15/Mar/16 ]

Dmitry,

All data in CSV or TSV files are text data. The automatic conversion that we do currently is possibly too much. The problem with doing a more full-featured data conversion is not that it's hard to implement, but that it's hard to come up with an API for specifying the conversions that serves even most needs. The very moment you've taken a single step down that path, you've started to implement an ETL tool. If one is going to make an ETL tool, then they should set out at the beginning to build one, and not start from some other tool that is almost one.

How would you, or anyone else who sees this thread, like to specify the types that mongoimport should import the data as?

Gabriel
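One possible shape for an answer, sketched in Python purely for discussion, is to carry the type in the header row itself. The `name.type()` syntax, the type names in `CONVERTERS`, the helper functions, and the sample values below are all assumptions made up for this illustration; they are not taken from mongoimport.

```python
import re

# Hypothetical converters keyed by type name; the names are illustrative only.
CONVERTERS = {"string": str, "int": int, "double": float}

# Matches a header cell such as "zip.string()".
HEADER_RE = re.compile(r"^(?P<name>[^.]+)\.(?P<type>\w+)\(\)$")


def parse_typed_header(header):
    """Split headers such as 'zip.string()' into (field name, converter) pairs."""
    columns = []
    for cell in header.split(","):
        match = HEADER_RE.match(cell.strip())
        if match is None:
            raise ValueError(f"header {cell!r} is not of the form name.type()")
        if match["type"] not in CONVERTERS:
            raise ValueError(f"unknown type {match['type']!r} in header {cell!r}")
        columns.append((match["name"], CONVERTERS[match["type"]]))
    return columns


def convert_row(columns, values):
    """Apply each column's converter to the matching raw string value."""
    return {name: conv(value) for (name, conv), value in zip(columns, values)}


columns = parse_typed_header("zip.string(),city.string(),population.int()")
doc = convert_row(columns, ["01001", "Agawam", "12345"])
print(doc)
```

Keeping the types next to the column names means the file stays self-describing, at the cost of a header that plain spreadsheet tools will not understand.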

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 1
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/5a6a20d6328e353a3fe8e95da5badd7bbe78c90b

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 2
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/6b38fa31b50dd7bd191532052528976ab369dbe5

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 1
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/a62d07027ffe07e80e790eb70d5e7c1d82b7b855

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 2
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/2d61d280c9fb1625840cdd5e8671b5e7ad3d3830

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'zacharyjsnow', u'name': u'Zachary Snow', u'email': u'zach.snow@mongodb.com'}

Message: TOOLS-67 3
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/4440b15da1bd96ccf7496d8291ccdf0596e72cff

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 4
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/6b8232e8ec1fcb2001fecc818d29d843de7a0728

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'zacharyjsnow', u'name': u'Zachary Snow', u'email': u'zach.snow@mongodb.com'}

Message: TOOLS-67 5
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/ef12d108aef38eb0e4e7eb54fae778d8f40cd039

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 6 (fix tests)
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/a14189405dfe6ff7f82b6fded5a2771e07b4dee6

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'zacharyjsnow', u'name': u'Zachary Snow', u'email': u'zach.snow@mongodb.com'}

Message: TOOLS-67 7
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/263d45bf24ff488710456582b6e2c380cc516179

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 8
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/d7b12b7a4415339f10fa95d3d73af1e75dc689b4

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/9905988ad1577e28704c92decf552839823e677a

Comment by Zach Snow [ 07/Jun/16 ]

We are currently working on implementing this feature. For those watching this issue, how would you like to specify the format of the dates in a CSV file? Is there some particular standard that would be most useful? We are currently considering using the format for layout used in Golang (https://golang.org/pkg/time/#Parse).

To be clear, we are not trying to enforce one particular date format, but rather decide on a type of layout which users can use to specify the format of their dates.
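For readers unfamiliar with Go layouts: instead of directives such as %Y, a layout is written as the reference time "Mon Jan 2 15:04:05 MST 2006" formatted the way the data looks. The Python sketch below translates a small subset of those tokens to strptime directives to show the idea. The `GO_TO_STRPTIME` table and `parse_with_go_layout` helper are made up for this illustration, cover numeric fields only, and are not how mongoimport handles dates.

```python
from datetime import datetime

# Partial mapping from Go reference-time tokens to strptime directives.
# "2006" is replaced first so its digits are gone before the two-digit tokens run.
GO_TO_STRPTIME = [
    ("2006", "%Y"),  # four-digit year
    ("01", "%m"),    # zero-padded month
    ("02", "%d"),    # zero-padded day
    ("15", "%H"),    # 24-hour clock hour
    ("04", "%M"),    # minute
    ("05", "%S"),    # second
]


def parse_with_go_layout(layout, value):
    """Parse value using a Go-style layout limited to the tokens above."""
    fmt = layout
    for token, directive in GO_TO_STRPTIME:
        fmt = fmt.replace(token, directive)
    return datetime.strptime(value, fmt)


parsed = parse_with_go_layout("2006-01-02 15:04:05", "2016-06-07 13:45:00")
print(parsed.isoformat())
```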

Comment by Githook User [ 07/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 added binary encodings
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/1aa7c8c66cf19850430b6090293beffa7c959ccc

Comment by Githook User [ 08/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 added date formats
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/f150190c5c7240f23d901d89bf9df934cbc001c8

Comment by Githook User [ 08/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 added date formats
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/cffb716a093a999963229208a8c470a5b3c894d1

Comment by Githook User [ 08/Jun/16 ]

Author:

{u'username': u'lukedmor', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 oracle datetime format is case-insensitive
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/40e46656a7872b4fa40d3c1bc76aea2a0812895a

Comment by Githook User [ 09/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 changed parseGrace fail to stop
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/bd8d6744508a66af8daaa4c098ca2097cab1b2bc

Comment by Githook User [ 09/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: Merge branch 'TOOLS-67' of ssh://github.com/mongodb/mongo-tools into TOOLS-67
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/f3197070e36acf6fb49cedf11c4c2faf155c0522

Comment by Githook User [ 09/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 parseGrace stop fix
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/6d73ae52bf9fc923ac85b90f2d14b69bbdd96f82

Comment by Githook User [ 09/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 fixed header type parser
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/0d36031324fd933f48f2554bba2c0e57887df062

Comment by Githook User [ 09/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 fix header split
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/8559f022dd698477815889c8283fff99af34977a

Comment by Githook User [ 09/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 added jstests
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/e6c279cb0cc1e8ad3fb2b21a063217378099262a

Comment by Githook User [ 09/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 added more documentation
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/6864495eaaca3f03abfe1e55b5e63351193d4c2e

Comment by Githook User [ 10/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 lint
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/1de8ce6cea02bcbde57c55f10c6a3b6bc0d96645

Comment by Githook User [ 10/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/94c58fa4d7f2e9ab090820738ec8392058aaa651

Comment by Githook User [ 10/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/12c5b1a7015e41723606bf69da58e72a59427216

Comment by Githook User [ 13/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/106f0e6c6c7a8d400f40d46481f7e59fba09264c

Comment by Githook User [ 15/Jun/16 ]

Author:

{u'username': u'zachjs', u'name': u'Zachary Snow', u'email': u'zach.snow@mongodb.com'}

Message: TOOLS-67 Allow escaping of characters that don't need to be in date formats
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/fbdc09af24bdb130058cdd7e07ee3de47d254f12

Comment by Githook User [ 15/Jun/16 ]

Author:

{u'username': u'zachjs', u'name': u'Zachary Snow', u'email': u'zach.snow@mongodb.com'}

Message: TOOLS-67 tests for splitInlineHeader
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/30b034656cc41672317dcf95f1664f480c590f33

Comment by Githook User [ 15/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/976a0e3eef2f799df0369d401037e29e7b1673cf

Comment by Githook User [ 16/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67
Branch: TOOLS-67
https://github.com/mongodb/mongo-tools/commit/7e8ffc0964e1c612e72103c722e0ddf4cc04ce15

Comment by Githook User [ 22/Jun/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 mongoimport typed fields

Signed-off-by: Zachary Snow <zach.snow@mongodb.com>
Branch: master
https://github.com/mongodb/mongo-tools/commit/22346e99032c23c04b6c7ce4e0ff7206833d8e26

Comment by Zach Snow [ 22/Jun/16 ]

Documentation ticket is DOCS-8094

Comment by Githook User [ 04/Aug/16 ]

Author:

{u'username': u'lucasem', u'name': u'Lucas Morales', u'email': u'lucas@lucasem.com'}

Message: TOOLS-67 mongoimport typed fields

Signed-off-by: Zachary Snow <zach.snow@mongodb.com>
Branch: v3.3
https://github.com/mongodb/mongo-tools/commit/22346e99032c23c04b6c7ce4e0ff7206833d8e26

Generated at Fri Oct 19 15:40:13 UTC 2018 using Jira 7.12.1#712002-sha1:609a50578ba6bc73dbf8b05dddd7c04a04b6807c.