[SERVER-70106] Buildfest feedback: $merge is slow vs insert Created: 29/Sep/22  Updated: 27/Oct/23  Resolved: 07/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.0.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Maxim Katcharov Assignee: Alya Berciu
Resolution: Works as Designed Votes: 0
Labels: buildfest-2022
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File experiment.js    
Operating System: ALL
Steps To Reproduce:

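// Assumes the mongodb crate's synchronous 2.x API, with `db` a
// mongodb::sync::Database and `all` a Collection<Document> in that database.
use mongodb::bson::{doc, Document};

// Approach 1: build 10k documents client-side and insert them in one call.
let docs: Vec<Document> = (0..10000)
    .map(|_| doc! { "title": "T", "year": 2020, "plot": "plot description" })
    .collect();
all.insert_many(docs, None).unwrap();

// Drop the collection before trying the next approach.
all.drop(None).unwrap();

// Approach 2: generate the documents server-side with $documents + $densify
// (one document per step of `dens`), give each a random _id, then write
// them with $merge (or with $out, commented out below).
let pipeline = vec![
    doc! { "$documents": [ { "dens": 0 } ] },
    doc! { "$densify": {
            "field": "dens",
            "range": { "step": 1, "bounds": [0, 10000] }
        }},
    doc! { "$addFields": {
            "title": "T",
            "year": 2020,
            //"rand": {"$rand": {} },
            "_id": {"$rand": {} },
        }},
    // doc! { "$out": "all" },
    doc! { "$merge": "all" },
];
db.aggregate(pipeline, None).unwrap();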
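The repro does not show how the timings in the description were measured. As a sketch only, assuming plain wall-clock timing with `std::time::Instant`, a small helper like the hypothetical `timed` below could wrap each step (it is not part of the original experiment):

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not from the original repro): runs `f` once and
/// returns its result together with the elapsed wall-clock time.
fn timed<R, F: FnOnce() -> R>(f: F) -> (R, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

/// Formats a duration as whole milliseconds, the unit used in the report.
fn as_millis_label(d: Duration) -> String {
    format!("{}ms", d.as_millis())
}
```

For example, `let (_, elapsed) = timed(|| all.insert_many(docs, None).unwrap());`, and the same around the `db.aggregate(pipeline, None)` call. A single cold run may also include driver connection setup, so averaging several runs gives a steadier comparison.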

Sprint: QO 2022-10-17
Participants:

 Description   

I was populating an empty (dropped) collection with semi-random data. Creating 10k records in Rust and inserting them with insert_many took 748ms. Using $densify and $out to create similar records took 565ms. Using $merge instead of $out took 91,983ms (about 92 seconds). I would expect $merge to at least outperform insert_many.
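To put the reported numbers in proportion, the arithmetic below derives the average per-document cost and the slowdown factors from the three timings. It uses only the figures reported above and assumes each run wrote 10,000 documents; it is not a new measurement.

```rust
/// Number of documents written in each run (assumed to be 10,000 for all three).
const DOCS: f64 = 10_000.0;
/// Reported wall-clock timings, in milliseconds.
const INSERT_MANY_MS: f64 = 748.0;
const OUT_MS: f64 = 565.0;
const MERGE_MS: f64 = 91_983.0;

/// Average time spent per document, in milliseconds.
fn per_doc_ms(total_ms: f64) -> f64 {
    total_ms / DOCS
}

/// How many times slower the `slow_ms` run is than the `fast_ms` run.
fn slowdown(slow_ms: f64, fast_ms: f64) -> f64 {
    slow_ms / fast_ms
}
```

With these figures, $merge averages roughly 9.2 ms per document (versus about 0.07 ms for insert_many), which makes it about 123 times slower than insert_many and about 163 times slower than $out.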

I recall that setting the _id field in the pipeline prior to $merge improved performance, but I can no longer reproduce this.


Generated at Thu Feb 08 06:15:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.