mongorestore creates indexes in a non-deterministic order

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Tools and Replicator
    • 2

      Problem Statement/Rationale

      Running mongorestore creates the indexes on a collection in a different order each time, so coll.getIndexes() returns the list of indexes in a different order each time.

      While I understand that the order returned by getIndexes() and similar internal functions does not seem to be guaranteed, in practice a stable order is important for making some recent tests that use mongorestore deterministic. So I think that, apart from any fixes on the product side and in the tests, mongorestore should also be made deterministic, just to avoid any potential customer grief going forward.

      Steps to Reproduce

      1. Run:

      time buildscripts/resmoke.py run --suites=query_golden_join_optimization_plan_stability jstests/query_golden/join_opt/plan_stability_tpch_fuzzed.js  --pauseAfterPopulate --runAllFeatureFlagTests --storageEngineCacheSizeGB=1
      

      until it stops at "pausing indefinitely". This test calls mongorestore internally. Then examine the indexes with the client:

      use plan_stability_tpch_fuzzed;
      db.lineitem.getIndexes();
      [
        { v: 2, key: { _id: 1 }, name: '_id_' },
        {
          v: 2,
          key: { l_partkey: 1, l_suppkey: 1 },
          name: 'l_partkey_1_l_suppkey_1'
        },
        { v: 2, key: { l_shipdate: 1 }, name: 'l_shipdate_1' },
        { v: 2, key: { l_commitdate: 1 }, name: 'l_commitdate_1' },
        { v: 2, key: { l_receiptdate: 1 }, name: 'l_receiptdate_1' },
        { v: 2, key: { l_orderkey: 1 }, name: 'l_orderkey_1' },
        { v: 2, key: { l_partkey: 1 }, name: 'l_partkey_1' },
        { v: 2, key: { l_suppkey: 1 }, name: 'l_suppkey_1' },
        {
          v: 2,
          key: { l_orderkey: 1, l_linenumber: 1 },
          name: 'l_orderkey_1_l_linenumber_1',
          unique: true
        }
      ]
      

      The order of the indexes is different each time, even when the maxNumActiveUserIndexBuilds server parameter is set to 1.

      Expected Results

      The order of indexes should be identical each time mongorestore is run.

      Actual Results

      The order of indexes is non-deterministic.

      Additional Notes

      The list of indexes is one of the inputs the new Join Optimizer uses to determine the optimal join order. To avoid customer grief, this join order should be deterministic, so that the same plan is chosen every time. With non-deterministic inputs, the join order is also non-deterministic, which opens the door to unexpected performance regressions.
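
      To illustrate how index order can leak into plan choice, here is a minimal sketch of a hypothetical optimizer step that keeps the first lowest-cost candidate it sees. The `candidate` type, the cost values, and `pickIndex` are invented for illustration; they are not the Join Optimizer's actual code or cost model. When two indexes tie on cost, the winner depends only on which one is listed first.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for an index the optimizer could use.
type candidate struct {
	name string
	cost float64 // hypothetical estimated cost
}

// pickIndex returns the name of the first candidate with the lowest cost.
// Ties are broken by position in the input slice, so the result depends on
// the order in which the indexes are listed.
func pickIndex(cands []candidate) string {
	if len(cands) == 0 {
		return ""
	}
	best := cands[0]
	for _, c := range cands[1:] {
		if c.cost < best.cost {
			best = c
		}
	}
	return best.name
}

func main() {
	// Same two indexes with equal (hypothetical) cost, listed in different orders.
	a := []candidate{{"l_partkey_1", 10}, {"l_partkey_1_l_suppkey_1", 10}}
	b := []candidate{{"l_partkey_1_l_suppkey_1", 10}, {"l_partkey_1", 10}}
	fmt.Println(pickIndex(a)) // l_partkey_1
	fmt.Println(pickIndex(b)) // l_partkey_1_l_suppkey_1
}
```

      A real optimizer would typically add an explicit tie-breaker; the point here is only that any order-dependent tie-break turns a non-deterministic index list into a non-deterministic plan.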

      Regardless of any additional fixes to make the product more robust (SERVER-121388) so that it does not rely on any particular index order, I think fixing mongorestore as well would be a quick change that lets it play its part. This is especially important now that we have tests that use mongorestore, and those tests should also be deterministic in order to avoid BFs.

      An LLM suggested pre-sorting the index list by name, as in the diff below. This seems to be a reasonable solution, though I have not tried it.

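      An alternative solution would be to avoid storing the index list in a map and then iterating over that map. Go does not specify map iteration order and the runtime deliberately randomizes it, so any code that ranges over the map can see a different order on every run.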
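
      A minimal sketch of that alternative, assuming a simplified, hypothetical `IndexDocument` that carries only a name: the catalog keeps a slice recording the order in which indexes were added, plus a map for lookup by name, and listing walks the slice rather than the map. The type and method names are invented for illustration and do not reflect the actual `IndexCatalog` in common/idx; which order to record (for example, the order indexes appear in the dump's metadata) is also an assumption.

```go
package main

import "fmt"

// IndexDocument is a simplified, hypothetical stand-in for the real type.
type IndexDocument struct {
	Name string
}

// orderedIndexes keeps indexes in insertion order while still allowing
// lookup by name through the map.
type orderedIndexes struct {
	order  []string
	byName map[string]*IndexDocument
}

func newOrderedIndexes() *orderedIndexes {
	return &orderedIndexes{byName: make(map[string]*IndexDocument)}
}

// Add inserts or replaces an index; a replaced index keeps its original position.
func (o *orderedIndexes) Add(idx *IndexDocument) {
	if _, exists := o.byName[idx.Name]; !exists {
		o.order = append(o.order, idx.Name)
	}
	o.byName[idx.Name] = idx
}

// List returns the indexes in insertion order; it never ranges over the map.
func (o *orderedIndexes) List() []*IndexDocument {
	out := make([]*IndexDocument, 0, len(o.order))
	for _, name := range o.order {
		out = append(out, o.byName[name])
	}
	return out
}

func main() {
	o := newOrderedIndexes()
	for _, n := range []string{"l_shipdate_1", "l_commitdate_1", "l_receiptdate_1"} {
		o.Add(&IndexDocument{Name: n})
	}
	// Prints the names in the order they were added, on every run.
	for _, idx := range o.List() {
		fmt.Println(idx.Name)
	}
}
```

      Unlike sorting by name, this preserves whatever source order is recorded, at the cost of keeping the slice and map in sync.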

      diff --git a/common/idx/index_catalog.go b/common/idx/index_catalog.go
      index 47fa7755..8d72806d 100644
      --- a/common/idx/index_catalog.go
      +++ b/common/idx/index_catalog.go
      @@ -2,6 +2,7 @@ package idx
       
       import (
              "fmt"
      +       "slices"
              "strings"
              "sync"
       
      @@ -170,8 +171,14 @@ func (i *IndexCatalog) GetIndexes(database, collection string) []*IndexDocument
              if !found {
                      return nil
              }
      +       indexNames := make([]string, 0, len(collIndexCatalog.indexes))
      +       for name := range collIndexCatalog.indexes {
      +               indexNames = append(indexNames, name)
      +       }
      +       slices.Sort(indexNames)
              var syncedIndexes []*IndexDocument
      -       for _, index := range collIndexCatalog.indexes {
      +       for _, name := range indexNames {
      +               index := collIndexCatalog.indexes[name]
                      if !collIndexCatalog.simpleCollation && !hasCollationOnIndex(index) {
                              index.Options["collation"] = bson.D{{"locale", "simple"}}
                      }
      

            Assignee:
            Unassigned
            Reporter:
            Philip Stoev
            Votes:
            0
            Watchers:
            2