[SERVER-63853] $merge breaks fields order. It is critical for bioinformatics Created: 18/Feb/22  Updated: 04/Mar/22  Resolved: 04/Mar/22

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Platon workaccount Assignee: Eric Sedor
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MongoDB 5.0.6
PyMongo 4.0.1


Operating System: ALL
Participants:

 Description   

A variety of file formats, such as those used in bioinformatics, require strict adherence to a fixed sequence of fields.

Files in such formats are often very large and contain nested structures, so it is convenient to store them as collections. To keep the data conformant with those format specifications, however, the original order of the fields must be preserved. Unfortunately, aggregations that save their results to another database lose the original field order.

Source document example:

{
    "_id": {
        "$oid": "620fe1e87fd143aebe55bad4"
    },
    "#CHROM": 1,
    "POS": 88619,
    "ID": "rs573217706",
    "REF": "G",
    "ALT": ["A", "T"],
    "QUAL": ".",
    "FILTER": ".",
    "INFO": [{
            "RS": 573217706,
            "RSPOS": 88619,
            "dbSNPBuildID": 142,
            "SSR": 0,
            "SAO": 0,
            "VP": "0x050100000005040026000100",
            "WGT": 1,
            "VC": "SNV",
            "CAF": [{
                "$numberDecimal": "0.9988"
            }, ".", {
                "$numberDecimal": "0.001198"
            }],
            "COMMON": 1,
            "TOPMED": [{
                "$numberDecimal": "0.99959384556574923"
            }, {
                "$numberDecimal": "0.00000796381243628"
            }, {
                "$numberDecimal": "0.00039819062181447"
            }]
        },
        ["SLO", "ASP", "VLD", "KGPhase3"]
    ]
}

Part of the aggregation pipeline:

{'$merge': {'into': {'db': 'test_out', 'coll': 'common_all.vcf'}}}
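
For context, a stage like this would normally be the last element of the pipeline list passed to PyMongo's Collection.aggregate. The following is only a minimal sketch: the helper names build_pipeline and run_merge are hypothetical, the src argument is assumed to be a pymongo Collection (or any object with an aggregate method), and the earlier pipeline stages, which the report does not show, are left out.

```python
def build_pipeline(out_db, out_coll):
    """Build a pipeline containing only the $merge stage from the report.

    Hypothetical helper: the real pipeline had earlier stages that the
    report does not show, so none are invented here.
    """
    return [{"$merge": {"into": {"db": out_db, "coll": out_coll}}}]


def run_merge(src, out_db, out_coll):
    """Run the pipeline on src, assumed to be a pymongo Collection.

    $merge has to be the final stage of a pipeline; the results are
    written to the target collection rather than returned to the caller.
    """
    return list(src.aggregate(build_pipeline(out_db, out_coll)))
```

With a live connection this could be called as run_merge(client["test_in"]["common_all.vcf"], "test_out", "common_all.vcf"), where the source database name test_in is an assumption.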

Result:

 

{
    "_id": {
        "$oid": "620fe1e87fd143aebe55bad4"
    },
    "#CHROM": 1,
    "ALT": ["A", "T"],
    "FILTER": ".",
    "ID": "rs573217706",
    "INFO": [{
            "RS": 573217706,
            "RSPOS": 88619,
            "dbSNPBuildID": 142,
            "SSR": 0,
            "SAO": 0,
            "VP": "0x050100000005040026000100",
            "WGT": 1,
            "VC": "SNV",
            "CAF": [{
                "$numberDecimal": "0.9988"
            }, ".", {
                "$numberDecimal": "0.001198"
            }],
            "COMMON": 1,
            "TOPMED": [{
                "$numberDecimal": "0.99959384556574923"
            }, {
                "$numberDecimal": "0.00000796381243628"
            }, {
                "$numberDecimal": "0.00039819062181447"
            }]
        },
        ["SLO", "ASP", "VLD", "KGPhase3"]
    ],
    "POS": 88619,
    "QUAL": ".",
    "REF": "G"
}
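
In the result above, the top-level fields after _id appear in alphabetical order, while the fields nested inside INFO keep their original order. Until the server preserves the order, one possible client-side workaround is to restore the top-level order after reading each document. The sketch below assumes the column order of the source document in this report; the names VCF_COLUMNS and restore_field_order are hypothetical, and it relies on Python dicts preserving insertion order (Python 3.7+).

```python
# Top-level column order taken from the source document in this report.
VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def restore_field_order(doc, columns=VCF_COLUMNS):
    """Return a copy of doc with _id first, then the known columns in order.

    Fields that are not listed in columns are appended afterwards in the
    order in which they were stored. Nested values are left untouched.
    """
    ordered = {}
    if "_id" in doc:
        ordered["_id"] = doc["_id"]
    for name in columns:
        if name in doc:
            ordered[name] = doc[name]
    for name, value in doc.items():
        if name not in ordered:
            ordered[name] = value
    return ordered
```

This only reorders documents on the client side after they are read; the documents stored in the target collection keep the order that $merge produced.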



 Comments   
Comment by Eric Sedor [ 04/Mar/22 ]

Understood platon.work@gmail.com, sorry; I went ahead and made this submission there on your behalf.

Comment by Platon workaccount [ 24/Feb/22 ]

"Can I please ask that you submit this same request at feedback.mongodb.com?"

That will be difficult to do, because the IP addresses of this portal have been blocked in Russia for more than a year.

Comment by Eric Sedor [ 24/Feb/22 ]

Hi platon.work@gmail.com and thank you for your patience,

Can I please ask that you submit this same request at feedback.mongodb.com? We're starting to direct new feature requests and improvements to that channel and preferring this JIRA project for bug reports specifically.

You may also want to search and post on the MongoDB Developer Community Forums, as it's possible there are others who have guidance on how to satisfy your use-case.

Thank you,
Eric

Generated at Thu Feb 08 05:58:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.