MongoDB ETL Tools / TOOLS-681

mongoimport integer type handling changed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 3.0.1
    • Fix Version/s: 3.0.4, 3.1.4
    • Component/s: mongoimport
    • Labels:
      None
    • Backport Requested:
      v3.0
    • Documentation Changes:
      Needed
    • Sprint:
      Kernel Tools Iteration 4

      Description

      The behavior of mongoimport has changed from pre-3.0 with respect to data type handling. This breaks backwards compatibility with systems that used the previous version of mongoimport and relied on its type handling to correctly determine the type of integers imported.

      In previous versions of mongoimport, numbers were imported based on their actual value: doubles, 32-bit integers, and 64-bit integers were each imported with the appropriate BSON type.
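
      A minimal Python sketch of what value-based typing means here. It is an illustration only, not mongoimport's actual implementation. The function name `infer_bson_number_type` is hypothetical, and it assumes that a literal with a fraction or exponent counts as a double, which is how Python's `json` module parses it.

```python
import json

# Inclusive ranges for signed 32-bit and 64-bit integers.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def infer_bson_number_type(value):
    """Return a label for the narrowest numeric BSON type that can hold value.

    Hypothetical helper for illustration; mongoimport's real logic may differ.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("booleans are not numeric values")
    if isinstance(value, float):
        return "double"
    if isinstance(value, int):
        if INT32_MIN <= value <= INT32_MAX:
            return "int32"
        if INT64_MIN <= value <= INT64_MAX:
            return "int64"
        raise OverflowError("integer does not fit in 64 bits")
    raise TypeError("not a number: %r" % (value,))


# json.loads yields int for integer literals and float for the rest.
small = infer_bson_number_type(json.loads("42"))          # "int32"
large = infer_bson_number_type(json.loads("2147483648"))  # "int64"
real = infer_bson_number_type(json.loads("1.5"))          # "double"
```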

      With the new version, all numbers are imported as doubles unless they are wrapped in a strict mode JSON type indicator.

      This is a regression in functionality as the previous behavior was a more accurate representation of the data being imported.

      While there are ways to indicate data types and work around this issue, this creates significant problems for existing systems that might not be able to be altered. In these cases it may require a separate pre-processing step that adds significant overhead to the process - specifically, JSON data must be decoded, altered, and re-emitted before being decoded and re-encoded again by the mongoimport process. For high-volume data imports this adds a significant cost in terms of either performance or required hardware.
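
      The pre-processing step described above could look roughly like the following Python sketch. It assumes one JSON document per input line and wraps every integer in the strict mode `{"$numberLong": "..."}` form, so all integers would arrive as 64-bit values rather than a mix of 32-bit and 64-bit. The helper names `wrap_integers` and `preprocess_line` are hypothetical.

```python
import json


def wrap_integers(value):
    """Recursively wrap integers in a strict mode $numberLong indicator."""
    # bool is a subclass of int in Python; leave true/false untouched.
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        return {"$numberLong": str(value)}
    if isinstance(value, list):
        return [wrap_integers(item) for item in value]
    if isinstance(value, dict):
        return {key: wrap_integers(item) for key, item in value.items()}
    # Floats, strings, and null pass through unchanged.
    return value


def preprocess_line(line):
    """Decode one JSON document, wrap its integers, and re-emit it."""
    return json.dumps(wrap_integers(json.loads(line)))


converted = preprocess_line('{"count": 3, "ratio": 0.5, "ids": [1, true]}')
```

      Every document is decoded and encoded once here and then again by mongoimport, which is the extra cost referred to above.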

      For systems that can be altered to output strict mode JSON type indicators where needed, the end result is that data output from these systems is more tightly coupled to MongoDB. It may be that this isn't seen as an issue, but it absolutely is. The more database-agnostic the actual data can be, the easier it is to use the right tool for the job in any given situation. Encouraging tight coupling between the data consumed and MongoDB where it isn't absolutely necessary is damaging.

      In any situation where JSON or the data itself sufficiently indicates the data type, MongoDB should not require additional type indication unless that type needs to be overridden.

              People

              • Assignee:
                Shraya Ramani (shraya.ramani)
              • Reporter:
                Derek Wilson (underrun)
              • Votes:
                1
              • Watchers:
                18
