  1. Core Server
  2. SERVER-70106

Buildfest feedback: $merge is slow vs insert

Details

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • None
    • 6.0.1
    • None
    • ALL
    • Steps to Reproduce:

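      // Approach 1: build 10,000 documents client-side and insert them in one call.
      // `all` is assumed to be a handle to the target collection named "all".
      let mut vec = Vec::new();
      for _ in 0..10000 {
          let new_doc = doc! {
              "title": "T", "year": 2020, "plot": "plot description",
          };
          vec.push(new_doc);
      }
      all.insert_many(vec, None).unwrap();

      // Drop the collection here so that approach 2 also starts from an empty collection.

      // Approach 2: generate the documents server-side and write them with $merge.
      let pipeline = vec![
          // Start from a single seed document.
          doc! { "$documents": [ { "dens": 0 } ] },
          // Fill in the "dens" values between the bounds, one document per step.
          doc! { "$densify": {
                  "field": "dens",
                  "range": { "step": 1, "bounds": [0, 10000]}
              }},
          doc! { "$addFields": doc! {
                  "title": "T",
                  "year": 2020,
                  //"rand": {"$rand": {} },
                  "_id": {"$rand": {} },
              }},
          // Swap in the commented-out $out stage to compare it against $merge.
          // doc! { "$out": "all" },
          doc! { "$merge": "all" },
      ];
      db.aggregate(pipeline, None).unwrap();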

    • QO 2022-10-17

    Description

      I was populating an empty (dropped) collection with semi-random data. Creating 10k records in Rust and inserting them with insert_many took 748 ms. Using $densify and $out to create similar records took 565 ms. Using $merge instead of $out took 91,983 ms, roughly 120 times slower than insert_many. It seems that $merge should, at the very least, be faster than insert_many.
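
      The ticket does not show how these timings were taken. As a minimal sketch, assuming wall-clock timing around each call, the measurement and the resulting slowdown factors could look like the following. The helper names `timed` and `slowdown` are hypothetical, and the closure below is a stand-in workload rather than a real database call.

```rust
use std::time::{Duration, Instant};

/// Runs `op` once and returns its result together with the elapsed wall-clock time.
/// (Hypothetical helper, not part of the MongoDB driver.)
fn timed<T>(op: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = op();
    (out, start.elapsed())
}

/// Returns how many times slower `slow_ms` is than `base_ms`.
fn slowdown(slow_ms: f64, base_ms: f64) -> f64 {
    slow_ms / base_ms
}

fn main() {
    // Stand-in workload. In a real measurement this closure would wrap the
    // insert_many call or the aggregate call from the reproduction steps.
    let (sum, elapsed) = timed(|| (0..10_000u64).sum::<u64>());
    println!("stand-in workload: sum={sum}, took {} ms", elapsed.as_millis());

    // Timings reported in the description, in milliseconds.
    let insert_many_ms = 748.0;
    let out_ms = 565.0;
    let merge_ms = 91_983.0;

    println!("$merge vs insert_many: {:.0}x", slowdown(merge_ms, insert_many_ms));
    println!("$merge vs $out: {:.0}x", slowdown(merge_ms, out_ms));
}
```

      A single run per approach, as reported here, is enough to show a gap of two orders of magnitude. Smaller differences, such as the one between insert_many and $out, would need repeated runs to be meaningful.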

      I recall that setting the _id field in the pipeline prior to $merge improved performance, but I can no longer reproduce this.

          People

            alya.berciu@mongodb.com Alya Berciu
            maxim.katcharov@mongodb.com Maxim Katcharov