Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-20336

O(N^2) perf regression in listCollections and similar code paths [BLOCKING Mongo 3.0 Adoption]

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker - P1
    • Resolution: Duplicate
    • Affects Version/s: 3.0.5
    • Fix Version/s: None
    • Component/s: Storage
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      Run the following script using a mongo3 shell against a mongo3 mongod. I've reproduced against 3.0.4, 3.0.5, and 3.0.6.

      conn = new Mongo();
      db = conn.getDB("dummy");
      var N = 16000;
       
      print('Creating collections');
      for(var i = 0; i < N; i++) {
        db.createCollection('collection' + i);
      }
       
      print('Listing collections.');
      var start = Date.now();
      db.getCollectionNames();
      print('Time to list collections: ', Date.now() - start);
      

      You should see a time of ~15-30 seconds for the getCollectionNames() call, and it gets much worse as N increases, since it's O(N^2). If you run the same script on mongo 2.6, it will complete in < a second, even for large values of N.

      You'll also see a hang if you restart your mongod server, or do a mongodump, and probably many other operations.

      While db.getCollectionNames() is in progress, any writes will be blocked.

      Show
      Run the following script using a mongo3 shell against a mongo3 mongod. I've reproduced against 3.0.4, 3.0.5, and 3.0.6. conn = new Mongo(); db = conn.getDB("dummy"); var N = 16000;   print('Creating collections'); for(var i = 0; i < N; i++) { db.createCollection('collection' + i); }   print('Listing collections.'); var start = Date.now(); db.getCollectionNames(); print('Time to list collections: ', Date.now() - start); You should see a time of ~15-30 seconds for the getCollectionNames() call, and it gets much worse as N increases, since it's O(N^2). If you run the same script on mongo 2.6, it will complete in < a second, even for large values of N. You'll also see a hang if you restart your mongod server, or do a mongodump, and probably many other operations. While db.getCollectionNames() is in progress, any writes will be blocked.

      Description

      Mongo 3 has an O(N^2) perf issue where N is the number of collections in a database. This is a regression from 2.x. For our dataset this causes a ~15 minutes hang, making mongo 3 completely unusable.

      The hang can be hit in many ways, including:
      1. When calling db.getCollectionNames().
      2. When starting mongod.
      3. When doing a mongodump.
      4. When a secondary mongod server in a replica set transitions to be primary.

      The O(N^2) nature can be clearly seen by this chart showing measured time to perform a db.getCollectionNames() for a given number of collections. There's also an attached graph showing a quadratic best fit.

      Number Collections (in 1000s) list collections time (seconds)
      1 0.164
      2 0.464
      4 1.6
      8 6.1
      16 24
      32 95
      64 439

      Context:

      • We have a multi-tenant system, where each tenant is served by a new collection.
      • As a result, we have on the order of 100,000 collections in a database.
      • We started upgrading to mongo 3 in our production environment, but ran into this issue (fortunately before we upgraded the primary) and had to do an emergency rollback to 2.6.
      • We're now stuck on 2.6 for the moment, but extremely eager to get the benefits of mongo 3 to address pressing issues in production.

      Can you please acknowledge this bug and provide an estimate for when it can be fixed and released?

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                10 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: