Core Server / SERVER-63853

$merge breaks field order, which is critical for bioinformatics

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Aggregation Framework
    • Labels:
      None
    • Environment:
      MongoDB 5.0.6
      PyMongo 4.0.1
    • ALL

      A variety of file formats, such as those used in bioinformatics, require strict adherence to the order of fields.

      Files in these formats are often very large and contain nested structures, so it is convenient to store them as collections. To keep the data compliant with the format specifications, however, the original field order must be preserved. Unfortunately, aggregations that save their results to another DB lose the original order.

      Source document example:

      {
          "_id": {
              "$oid": "620fe1e87fd143aebe55bad4"
          },
          "#CHROM": 1,
          "POS": 88619,
          "ID": "rs573217706",
          "REF": "G",
          "ALT": ["A", "T"],
          "QUAL": ".",
          "FILTER": ".",
          "INFO": [{
                  "RS": 573217706,
                  "RSPOS": 88619,
                  "dbSNPBuildID": 142,
                  "SSR": 0,
                  "SAO": 0,
                  "VP": "0x050100000005040026000100",
                  "WGT": 1,
                  "VC": "SNV",
                  "CAF": [{
                      "$numberDecimal": "0.9988"
                  }, ".", {
                      "$numberDecimal": "0.001198"
                  }],
                  "COMMON": 1,
                  "TOPMED": [{
                      "$numberDecimal": "0.99959384556574923"
                  }, {
                      "$numberDecimal": "0.00000796381243628"
                  }, {
                      "$numberDecimal": "0.00039819062181447"
                  }]
              },
              ["SLO", "ASP", "VLD", "KGPhase3"]
          ]
      }
      

      Part of the aggregation pipeline:

      {'$merge': {'into': {'db': 'test_out', 'coll': 'common_all.vcf'}}}
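
For anyone trying to reproduce this, here is a minimal sketch of how the stage above might be run from PyMongo. The helper names `build_merge_pipeline` and `run_merge`, and the `uri`, `src_db` and `src_coll` parameters, are hypothetical and not part of the original report; only the `$merge` stage itself is taken from it.

```python
def build_merge_pipeline(out_db="test_out", out_coll="common_all.vcf"):
    """Return a one-stage pipeline that writes its input documents via $merge."""
    return [{"$merge": {"into": {"db": out_db, "coll": out_coll}}}]


def run_merge(uri, src_db, src_coll):
    """Run the pipeline against a live server (assumes PyMongo is installed)."""
    # Imported lazily so build_merge_pipeline() can be used without PyMongo.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        # The pipeline ends in $merge, so the documents are written to the
        # target collection rather than returned to the caller.
        client[src_db][src_coll].aggregate(build_merge_pipeline())
    finally:
        client.close()
```

Calling `run_merge("mongodb://localhost:27017", "test_in", "common_all.vcf")` would be one way to use it; the URI and source database name there are placeholders.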
      

      Result:

       

      {
          "_id": {
              "$oid": "620fe1e87fd143aebe55bad4"
          },
          "#CHROM": 1,
          "ALT": ["A", "T"],
          "FILTER": ".",
          "ID": "rs573217706",
          "INFO": [{
                  "RS": 573217706,
                  "RSPOS": 88619,
                  "dbSNPBuildID": 142,
                  "SSR": 0,
                  "SAO": 0,
                  "VP": "0x050100000005040026000100",
                  "WGT": 1,
                  "VC": "SNV",
                  "CAF": [{
                      "$numberDecimal": "0.9988"
                  }, ".", {
                      "$numberDecimal": "0.001198"
                  }],
                  "COMMON": 1,
                  "TOPMED": [{
                      "$numberDecimal": "0.99959384556574923"
                  }, {
                      "$numberDecimal": "0.00000796381243628"
                  }, {
                      "$numberDecimal": "0.00039819062181447"
                  }]
              },
              ["SLO", "ASP", "VLD", "KGPhase3"]
          ],
          "POS": 88619,
          "QUAL": ".",
          "REF": "G"
      }
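
In the result above, the top-level fields after `_id` appear in alphabetical order, while the keys nested inside `INFO` keep their original order. As a possible client-side stopgap (not a fix for the server behaviour), the sketch below detects that pattern and restores a known column order after reading. The helper names and the `SOURCE_ORDER` list are hypothetical; the order is copied from the source document example, and the sample values are abbreviated.

```python
# Top-level field order copied from the source document in this report.
SOURCE_ORDER = ["_id", "#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def looks_alphabetized(doc):
    """True if all top-level keys other than _id are in sorted order."""
    keys = [key for key in doc if key != "_id"]
    return keys == sorted(keys)


def restore_field_order(doc, order=SOURCE_ORDER):
    """Return a new dict with keys arranged as in `order`.

    Keys missing from `order` are appended in their existing order.
    """
    ordered = {key: doc[key] for key in order if key in doc}
    for key, value in doc.items():
        if key not in ordered:
            ordered[key] = value
    return ordered


# Abbreviated copy of the merged document shown in the result (INFO shortened).
merged = {
    "_id": "620fe1e87fd143aebe55bad4",
    "#CHROM": 1,
    "ALT": ["A", "T"],
    "FILTER": ".",
    "ID": "rs573217706",
    "INFO": [{"RS": 573217706, "RSPOS": 88619}, ["SLO", "ASP", "VLD", "KGPhase3"]],
    "POS": 88619,
    "QUAL": ".",
    "REF": "G",
}

restored = restore_field_order(merged)
print(looks_alphabetized(merged), list(restored))
```

This relies on Python dictionaries preserving insertion order (Python 3.7+), and it only helps when the intended order is known in advance.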

            Assignee:
            eric.sedor@mongodb.com Eric Sedor
            Reporter:
            platon.work@gmail.com Platon workaccount
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: