MongoDB Database Tools / TOOLS-3411

Mongodump does not maintain field-order for collection option keys

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical - P2
    • Fix Version/s: 100.9.1
    • Affects Version/s: 4.2.0, 100.0.0, 100.1.0, 100.2.0, 100.3.0, 100.4.0, 100.5.0, 100.6.0, 100.7.0, 100.8.0, 100.9.0
    • Component/s: None
    • TAR 2023-10-30
    • 2
    • Tools and Replicator
    • 0.75

      Issue summary

      It is possible for affected versions of mongodump to randomly change the order of collection option keys. View pipelines and schema validation are the only collection options sensitive to field ordering. If your cluster does not use views or schema validation, it is not impacted by this issue. Depending on the specific contents of the view or validator options, these alterations could change the result set returned by a view or change which documents are accepted by a validator. Not all view pipelines or schema validations are affected by key reordering.

      As an example of key reordering, imagine this view pipeline:

      [ { $sort: { price: 1, weight: 1 } }, { $limit: 100 } ]

      When dumped with an affected version of mongodump, documents in this view pipeline could have their keys reordered. For example, it could turn into this:

       [ { $sort: { weight: 1, price: 1 } }, { $limit: 100 } ]

      Here, the price and weight keys have been reordered. This could change the output of the view.
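
      To make the effect concrete, here is a minimal Python sketch that emulates an ascending multi-key $sort followed by $limit. It does not talk to a MongoDB server; the sample documents and the helper name sort_and_limit are hypothetical, and only ascending sort keys are handled.

```python
# Hypothetical sample documents; the field names match the example pipeline.
items = [
    {"name": "a", "price": 1, "weight": 9},
    {"name": "b", "price": 2, "weight": 1},
    {"name": "c", "price": 1, "weight": 5},
]


def sort_and_limit(docs, sort_spec, limit):
    """Emulate an ascending multi-key $sort followed by $limit."""
    keys = list(sort_spec)  # key order in the spec decides sort precedence
    ordered = sorted(docs, key=lambda d: tuple(d[k] for k in keys))
    return [d["name"] for d in ordered[:limit]]


original = sort_and_limit(items, {"price": 1, "weight": 1}, 2)
reordered = sort_and_limit(items, {"weight": 1, "price": 1}, 2)

print(original)   # ['c', 'a']
print(reordered)  # ['b', 'c']
```

      The same documents and the same limit produce different results once the sort keys are swapped, which is how a reordered view could return a different result set.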

      Mongodump can only change the order of fields in collection options and their subdocuments. Only the pipeline and validator options are sensitive to field ordering. The order of items in arrays cannot be changed. The order of aggregation stages within a view pipeline cannot be changed. Indexes are not impacted. Data dumped by mongodump is not impacted. 

      Mongorestore was not impacted by this issue. If a dump was unaffected by this issue, mongorestore restores it correctly.

      This issue was introduced in the MongoDB Database Tools version 4.2.0 (released August 9, 2019) and was fixed in version 100.9.1 (released November 8, 2023). We recommend upgrading to version 100.9.1 or later.

      The fix for this issue was deployed to MongoDB Atlas on November 28, 2023.

      Impact

      If you took a backup using an affected version of mongodump, that backup may have a view pipeline or validator which could have incorrect field ordering.

      If you restored an affected backup using mongorestore, then the cluster you restored it to would have affected view pipeline or validator options. This issue could also occur in the Atlas Shared Tier even if you did not use mongodump and mongorestore yourself, as some features use mongodump and mongorestore internally. So your cluster could be affected if you have a susceptible view or validator and one of these is true:

      • You used an affected version of mongodump to take a backup then restored it using mongorestore.
      • Your cluster was an M0 cluster between 2020-01-13 and 2023-11-28.
      • Your cluster was upgraded from the Shared Tier to Dedicated Tier (M10+) or Serverless between 2020-01-13 and 2023-11-28.
      • Your cluster was restored from a Shared Tier backup taken between 2020-01-13 and 2023-11-28.

      Detecting views or validators that could be affected

      We cannot detect whether a reordering occurred in a backup taken with mongodump; we can only detect whether a view pipeline or schema validator could be susceptible to reordering.

      You can use the script linked below to detect if your cluster has view pipelines or schema validators that could be susceptible to this issue. You can also use the script to detect if a backup generated by mongodump has view pipelines or schema validators that could be susceptible.

      https://github.com/mongodb-labs/tools-3411-script 

      If you find that a view pipeline or schema validator could be susceptible to the issue, you will need to audit it to check whether it is correct.

      Auditing views or validators that could be affected

      If the detection script has flagged collections that need to be audited, you should check if the view pipeline or schema validator is still correct. You should reason through the pipeline or validator to ensure it is doing what your applications expect. Pay particular attention to multi-field subdocuments since that is where the field reordering could occur. 

      When auditing your views or validators, there are a few things you should bear in mind: 

      1. Matching against entire embedded/nested documents requires an exact match, including field order. See the documentation on matching against embedded documents for more information.
      2. Adding multi-field subdocuments in a pipeline (e.g. with $addFields) might cause subsequent matches in the pipeline or in your application to be incorrect for the reason listed above.
      3. Sorting by multiple fields depends on the field order of the sort specification. See the documentation on ascending/descending sort for more information.
      4. Many different stages or operators can match, sort, or add fields. Make sure to check all parts of the pipeline or validator for potential issues.
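
      Point 1 can be illustrated with a small Python sketch. Plain Python dict equality ignores key order, so the hypothetical helper order_sensitive_equal below compares key order explicitly. It is a simplified stand-in for an exact embedded-document match, not MongoDB's actual comparison logic, and the sample field names are assumptions.

```python
def order_sensitive_equal(a, b):
    """Return True only if a and b have the same fields, the same values,
    and the same field order, checked recursively."""
    if isinstance(a, dict) and isinstance(b, dict):
        return list(a) == list(b) and all(
            order_sensitive_equal(a[k], b[k]) for k in a
        )
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            order_sensitive_equal(x, y) for x, y in zip(a, b)
        )
    return a == b


# Hypothetical stored document and two versions of an exact-match filter value.
stored = {"size": {"h": 14, "w": 21}}
intended_filter = {"h": 14, "w": 21}
reordered_filter = {"w": 21, "h": 14}

print(order_sensitive_equal(stored["size"], intended_filter))   # True
print(order_sensitive_equal(stored["size"], reordered_filter))  # False
```

      A filter whose embedded document was reordered would stop matching documents it matched before, even though both versions contain the same fields and values.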

      This is not an exhaustive list of things you should check. The only foolproof way to know if a view pipeline or schema validator is correct is to fully reason through the pipeline or validator to make sure it is working the way your application expects.

      Fixing affected views or validators

      Regardless of how you were affected, you should upgrade to Database Tools version 100.9.1 or later.

      Depending on how you were affected, there are different tasks you can perform to remediate the issue.

      How to fix an incorrect view pipeline on a cluster

      If you have determined that a view has an incorrect definition, use collMod to update the view definition to the correct pipeline.
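
      As a sketch, the collMod command document for a view could be built like this in Python. The view name cheapestItems, the source collection items, and the helper build_view_collmod are hypothetical; the pipeline is the example from the issue summary. The code only builds the command and does not connect to a cluster.

```python
def build_view_collmod(view_name, view_on, pipeline):
    """Return a collMod command document that redefines a view.

    Python dicts keep insertion order, so the keys are emitted in the order
    written here, with the command name first.
    """
    return {"collMod": view_name, "viewOn": view_on, "pipeline": pipeline}


# Hypothetical names; the pipeline lists the sort keys in the intended order.
command = build_view_collmod(
    "cheapestItems",
    "items",
    [{"$sort": {"price": 1, "weight": 1}}, {"$limit": 100}],
)
print(command)
```

      With a driver such as PyMongo (not imported here), a document like this would typically be passed to the database's command method. Review the pipeline's key order carefully before running it, since that order is exactly what this issue could have changed.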

      How to fix an incorrect validator on a cluster

      If you have determined that a collection has incorrect schema validation, use collMod to update the validator to the correct query.

      After you run collMod to apply the correct validator, if the validationLevel is "strict" and the validationAction is "error", updates to now-invalid documents will be rejected. Because of this, you may also want to run a query to find documents in the collection that do not match the correct validation query. If there are invalid documents, you may want to update them so they are valid. See the documentation on validationLevel for more information.
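
      The two steps above could be sketched in Python as follows. The collection name orders, the validator contents, and both helper names are hypothetical. Wrapping the validator in $nor is one common way to select documents that do not satisfy it. The code only builds the documents and does not connect to a cluster.

```python
def build_validator_collmod(collection, validator):
    """Return a collMod command document that replaces a collection's validator."""
    return {"collMod": collection, "validator": validator}


def invalid_documents_filter(validator):
    """Return a query filter matching documents that do NOT satisfy the validator."""
    return {"$nor": [validator]}


# Hypothetical validator with an order-sensitive embedded-document match.
correct_validator = {"shipping": {"country": "US", "method": "ground"}}

command = build_validator_collmod("orders", correct_validator)
invalid_filter = invalid_documents_filter(correct_validator)

print(command)
print(invalid_filter)
```

      The filter could then be used in a find on the collection to list the documents that need attention after the validator has been corrected.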

      How to fix an incorrect view pipeline or validator in a dump

      If you have determined that your dump contains an incorrect view pipeline or validator, there are several options for how to fix it:

      Option 1: New dump

      Create a new dump with the latest version of mongodump.

      Option 2: Replace metadata files

      Mongodump can create backups in two different formats: directory or archive.

      In the directory format, mongodump creates a root dump directory. By default this directory is called "dump", but the name can be set with --out. Mongodump creates one subdirectory per database. Inside each database directory, mongodump creates a .metadata.json file for each collection. If the collection is called foo, the file will be foo.metadata.json. This file contains an Extended JSON (v2) document that describes the options and indexes of the collection. Regular collections also have a BSON file containing all the documents dumped for the collection. A view has only a .metadata.json file and no BSON file. This is the format of directory dumps:

      dump
      ├── db1
      │    ├── coll1.bson
      │    ├── coll1.metadata.json
      │    └── view1.metadata.json 
      ├── db2
      │    ├── coll1.bson
      │    ├── coll1.metadata.json
      │    └── view1.metadata.json
      └── oplog.bson

      If the dump was created with the --gzip option, then the bson files and the .metadata.json files will be gzipped. Their extensions will be .bson.gz and .metadata.json.gz.

      To fix an affected view definition in a directory dump, create a new dump with the latest version of mongodump. You can use the --db and --collection options to dump only the affected view. Take the .metadata.json or .metadata.json.gz file from the new dump and replace the corresponding files in the old dump.
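
      The file replacement could be scripted along these lines. This is a minimal sketch: the helper replace_metadata and its parameters are hypothetical, and it assumes both dumps use the directory layout shown above and the same compression setting.

```python
import shutil
from pathlib import Path


def replace_metadata(new_dump, old_dump, db, collection):
    """Copy <collection>.metadata.json (or .metadata.json.gz) for one
    collection or view from new_dump into old_dump, overwriting the old file.

    Returns the path that was overwritten.
    """
    for suffix in (".metadata.json", ".metadata.json.gz"):
        src = Path(new_dump) / db / f"{collection}{suffix}"
        dst = Path(old_dump) / db / f"{collection}{suffix}"
        # Only overwrite when the same file exists in both dumps.
        if src.exists() and dst.exists():
            shutil.copyfile(src, dst)
            return dst
    raise FileNotFoundError(
        f"no matching metadata file for {db}.{collection} in both dumps"
    )
```

      For example, replace_metadata("new_dump", "dump", "db1", "view1") would overwrite dump/db1/view1.metadata.json with the file from new_dump. Keep a copy of the old dump until you have verified the result.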

      Option 3: Edit metadata files

      You can manually edit .metadata.json files to correct the view pipeline. Suppose the contents of the .metadata.json file are:

      {"options":{"viewOn":"sales","pipeline":[{"$fill":{"sortBy":{"month":{"$numberInt":"1"},"year":{"$numberInt":"1"}},"output":{"sales":{"method":"locf"}}}}]},"indexes":[],"collectionName":"salesView","type":"view"}

      You can edit this file in any text editor to change the order of the month and year fields inside $fill.sortBy.

      In general, the pipeline is defined in options.pipeline and the schema validator is defined in options.validator. You can use a text editor to reorder fields as needed.

      If the .metadata.json files are gzipped, you can unzip, edit, and rezip these files. For example, on Unix systems, you can use gunzip foo.metadata.json.gz to get foo.metadata.json, then rezip it using gzip foo.metadata.json.
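
      If you prefer a script to a text editor, the same edit could be sketched in Python. The helpers reorder_fields and fix_metadata_file are hypothetical. The sketch relies on Python's json module keeping object keys in file order when loading and in dict order when writing, and it handles both plain and gzipped metadata files.

```python
import gzip
import json
from pathlib import Path


def reorder_fields(doc, path, new_order):
    """Rebuild the subdocument at `path` with its keys in `new_order`.

    `path` is a list of dict keys and list indexes leading to the subdocument.
    """
    parent = doc
    for key in path[:-1]:
        parent = parent[key]
    target = parent[path[-1]]
    if sorted(target) != sorted(new_order):
        raise ValueError("new_order must list exactly the existing keys")
    parent[path[-1]] = {k: target[k] for k in new_order}


def fix_metadata_file(filename, path, new_order):
    """Reorder one subdocument inside a .metadata.json or .metadata.json.gz file."""
    file = Path(filename)
    opener = gzip.open if file.suffix == ".gz" else open
    with opener(file, "rt", encoding="utf-8") as f:
        doc = json.load(f)  # loaded dicts keep the key order found in the file
    reorder_fields(doc, path, new_order)
    with opener(file, "wt", encoding="utf-8") as f:
        json.dump(doc, f, separators=(",", ":"))
```

      For the example file above, a call such as fix_metadata_file("salesView.metadata.json", ["options", "pipeline", 0, "$fill", "sortBy"], ["year", "month"]) would swap the two sortBy fields and leave everything else unchanged. Work on a copy of the dump in case the edit needs to be undone.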

      If you ever need to skip validation during a restore for some reason, you can use the --bypassDocumentValidation option on mongorestore.

      How to fix an incorrect view pipeline or validator in a dump created with --archive

      Option 1: New dump

      Create a new dump with the latest version of mongodump.

      Option 2: Ask support for help editing the archive

      Archive dumps are binary files in a custom MongoDB format. It is possible to edit archives. However, editing archives is prone to error and you could accidentally corrupt your archive. We recommend reaching out to MongoDB Support if you need to edit your archive file.

            Assignee:
            johnny.dubois@mongodb.com Johnny DuBois
            Reporter:
            anirudh.dutt@gmail.com Anirudh Dutt
            Jian Guan, Tim Fogarty