Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-88340

Why is sqlite3 ~6x faster than mongod on summing 32 million `2` integers?

    • Type: Icon: Task Task
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Query Execution

      Mongo takes >10 seconds reliably. Sqlite takes about 2. Seems a lot of the mongo execution time is in __curfile_next.

      Notably, adding a `group by null` in sqlite3 makes the query much slower, so perhaps this simply due to mongo not having a fast path for aggregates over the entire table.

      Sqlite stores data only in the b tree leaves. Are the leaves linked? This could explain some of the gap.

      Is sqlite using zone maps? How substantial is this performance gap when the values are random integers?

      Consider writing a small script to insert the same exact set of random ints to both systems. Might need to use C++ so that populating the tables doesn't take ages. May also need to build sqlite from scratch with -g to collect perf samples.

      create table t(a int);
      insert into t values (2);
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      insert into t select * from t;
      select count(*) from t;
      select sum(a) from t; -- adding `group by null` makes this much slower

      And in mongo

      db.c.drop()
      db.createCollection("c")
      const num_millions = 32;
      for (let i = 0; i < 1000*4*num_millions; i++) { db.c.insertMany([{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2},{a:2}]);}
      db.c.count();
      db.c.aggregate([{$group: {_id: null, total: {$sum: "$a"}}}]); 

      Notably, `select sum(a) from t group by null;` is a lot slower than `select sum(a) from t` in sqlite3.

            Assignee:
            evan.bergeron@mongodb.com Evan Bergeron
            Reporter:
            evan.bergeron@mongodb.com Evan Bergeron
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated: