Core Server / SERVER-51687

Suffix generation for idents in the durable catalog can conflict

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Storage
    • Storage Execution
    • ALL

      Additional logging used for the example:

      diff --git a/src/mongo/db/storage/durable_catalog_impl.cpp b/src/mongo/db/storage/durable_catalog_impl.cpp
      index e49dcb106d..edb95cfea6 100644
      --- a/src/mongo/db/storage/durable_catalog_impl.cpp
      +++ b/src/mongo/db/storage/durable_catalog_impl.cpp
      @@ -403,7 +403,9 @@ DurableCatalogImpl::DurableCatalogImpl(RecordStore* rs,
             _directoryPerDb(directoryPerDb),
             _directoryForIndexes(directoryForIndexes),
             _rand(_newRand()),
      -      _engine(engine) {}
      +      _engine(engine) {
      +          logd("+++ DurableCatalogImpl::DurableCatalogImpl _rand is " + _rand);
      +      }
       
       DurableCatalogImpl::~DurableCatalogImpl() {
           _rs = nullptr;
      @@ -415,7 +417,9 @@ std::string DurableCatalogImpl::_newRand() {
       
       bool DurableCatalogImpl::_hasEntryCollidingWithRand() const {
           // Only called from init() so don't need to lock.
      +    logd("+++ DurableCatalogImpl::_hasEntryCollidingWithRand");
           for (auto it = _catalogIdToEntryMap.begin(); it != _catalogIdToEntryMap.end(); ++it) {
      +        logd("+++ Checking if _rand conflicts with " + it->second.ident);
               if (StringData(it->second.ident).endsWith(_rand))
                   return true;
           }
      

      We use DurableCatalogImpl::_rand as the suffix when generating new idents in the server. '_rand' is generated at startup and remains constant for as long as the server is up. During initialization of the durable catalog, we check whether any existing entry collides with the '_rand' we've generated; if the '_rand' is already in use by an existing ident, we generate a new one. However, the check only looks at the contents of '_catalogIdToEntryMap', which contains only the idents starting with "collection-" and not the idents starting with "index-". It isn't guaranteed that all the indexes belonging to a collection share the same ident suffix as the collection, as demonstrated below.
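The startup behaviour described above can be modelled roughly as follows. This is a minimal standalone sketch, not the server code: `endsWith`, `hasEntryCollidingWithRand` (as a free function over a plain list of idents) and `pickRand` are hypothetical names chosen for illustration, and the random source is passed in as a callable so the retry loop can be exercised deterministically.

```cpp
#include <functional>
#include <string>
#include <vector>

// Returns true when `ident` ends with `suffix`.
bool endsWith(const std::string& ident, const std::string& suffix) {
    return ident.size() >= suffix.size() &&
        ident.compare(ident.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical stand-in for the collision check: it can only see the idents it is
// handed. In the scenario described above, that list holds "collection-" idents only.
bool hasEntryCollidingWithRand(const std::vector<std::string>& catalogIdents,
                               const std::string& rand) {
    for (const auto& ident : catalogIdents) {
        if (endsWith(ident, rand))
            return true;
    }
    return false;
}

// Keeps drawing candidate suffixes from `newRand` until one does not collide with
// any of the idents that the check is able to see.
std::string pickRand(const std::vector<std::string>& catalogIdents,
                     const std::function<std::string()>& newRand) {
    std::string rand = newRand();
    while (hasEntryCollidingWithRand(catalogIdents, rand))
        rand = newRand();
    return rand;
}
```

Any ident that is not in the list passed to the check cannot cause a retry, which is the gap this ticket describes.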
       
      Start up mongod with an empty /data/db directory.

      ...
      2020-10-16T09:45:51.632-04:00 I  STORAGE  [initandlisten] Opening WiredTiger {"config":"create,cache_size=31663M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],debug_mode=(table_logging=true,),"}
      ...
      2020-10-16T09:45:52.362-04:00 I  -        [initandlisten] +++ DurableCatalogImpl::DurableCatalogImpl _rand is -4816864174458775216
      2020-10-16T09:45:52.363-04:00 I  -        [initandlisten] +++ DurableCatalogImpl::_hasEntryCollidingWithRand
      ...
      2020-10-16T09:45:52.387-04:00 I  STORAGE  [initandlisten] createCollection {"namespace":"admin.system.version","uuidDisposition":"provided","uuid":{"uuid":{"$uuid":"af53ac04-602d-4739-98f2-aebb186f85a4"}},"options":{"uuid":{"$uuid":"af53ac04-602d-4739-98f2-aebb186f85a4"}}}
      2020-10-16T09:45:52.428-04:00 I  INDEX    [initandlisten] Index build: done building {"buildUUID":null,"namespace":"admin.system.version","index":"_id_","commitTimestamp":null}
      2020-10-16T09:45:52.428-04:00 I  REPL     [initandlisten] Setting featureCompatibilityVersion {"newVersion":"4.9"}
      ...
      

      The '_rand' used is -4816864174458775216 and the contents of /data/db are the following:

      -rw------- 1 gregory  20K Oct 16 09:46 collection-0--4816864174458775216.wt
      -rw------- 1 gregory  20K Oct 16 09:46 collection-2--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:45 collection-4--4816864174458775216.wt
      drwx------ 2 gregory 4.0K Oct 16 09:47 diagnostic.data/
      -rw------- 1 gregory  20K Oct 16 09:46 index-1--4816864174458775216.wt
      -rw------- 1 gregory  20K Oct 16 09:46 index-3--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:45 index-5--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:45 index-6--4816864174458775216.wt
      drwx------ 2 gregory 4.0K Oct 16 09:45 journal/
      -rw------- 1 gregory  20K Oct 16 09:46 _mdb_catalog.wt
      -rw------- 1 gregory    6 Oct 16 09:45 mongod.lock
      -rw------- 1 gregory 4.0K Oct 16 09:45 sizeStorer.wt
      -rw------- 1 gregory  114 Oct 16 09:45 storage.bson
      -rw------- 1 gregory   47 Oct 16 09:45 WiredTiger
      -rw------- 1 gregory 4.0K Oct 16 09:45 WiredTigerHS.wt
      -rw------- 1 gregory   21 Oct 16 09:45 WiredTiger.lock
      -rw------- 1 gregory 1.3K Oct 16 09:46 WiredTiger.turtle
      -rw------- 1 gregory  32K Oct 16 09:46 WiredTiger.wt
      

      After creating the collection test.a, the contents of /data/db are the following:

      -rw------- 1 gregory  20K Oct 16 09:46 collection-0--4816864174458775216.wt
      -rw------- 1 gregory  20K Oct 16 09:46 collection-2--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:45 collection-4--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:47 collection-7--4816864174458775216.wt
      drwx------ 2 gregory 4.0K Oct 16 09:47 diagnostic.data/
      -rw------- 1 gregory  20K Oct 16 09:46 index-1--4816864174458775216.wt
      -rw------- 1 gregory  20K Oct 16 09:46 index-3--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:45 index-5--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:45 index-6--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:47 index-8--4816864174458775216.wt
      drwx------ 2 gregory 4.0K Oct 16 09:45 journal/
      -rw------- 1 gregory  20K Oct 16 09:46 _mdb_catalog.wt
      -rw------- 1 gregory    6 Oct 16 09:45 mongod.lock
      -rw------- 1 gregory 4.0K Oct 16 09:45 sizeStorer.wt
      -rw------- 1 gregory  114 Oct 16 09:45 storage.bson
      -rw------- 1 gregory   47 Oct 16 09:45 WiredTiger
      -rw------- 1 gregory 4.0K Oct 16 09:45 WiredTigerHS.wt
      -rw------- 1 gregory   21 Oct 16 09:45 WiredTiger.lock
      -rw------- 1 gregory 1.3K Oct 16 09:46 WiredTiger.turtle
      -rw------- 1 gregory  32K Oct 16 09:46 WiredTiger.wt
      

      From this we can deduce that the idents belonging to collection test.a are:

      • collection-7--4816864174458775216.wt
      • index-8--4816864174458775216.wt (_id index)

      Now we restart the server to generate a new '_rand'.

      ...
      2020-10-16T09:48:24.053-04:00 I  STORAGE  [initandlisten] Opening WiredTiger {"config":"create,cache_size=31663M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],debug_mode=(table_logging=true,),"}
      ...
      2020-10-16T09:48:25.023-04:00 I  STORAGE  [initandlisten] WiredTiger opened {"durationMillis":970}
      ...
      2020-10-16T09:48:25.027-04:00 I  -        [initandlisten] +++ DurableCatalogImpl::DurableCatalogImpl _rand is 3852150022159100276
      2020-10-16T09:48:25.029-04:00 I  -        [initandlisten] +++ DurableCatalogImpl::_hasEntryCollidingWithRand
      2020-10-16T09:48:25.029-04:00 I  -        [initandlisten] +++ Checking if _rand conflicts with collection-0--4816864174458775216
      2020-10-16T09:48:25.029-04:00 I  -        [initandlisten] +++ Checking if _rand conflicts with collection-2--4816864174458775216
      2020-10-16T09:48:25.030-04:00 I  -        [initandlisten] +++ Checking if _rand conflicts with collection-4--4816864174458775216
      2020-10-16T09:48:25.030-04:00 I  -        [initandlisten] +++ Checking if _rand conflicts with collection-7--4816864174458775216
      ...
      2020-10-16T09:48:25.078-04:00 I  NETWORK  [listener] Waiting for connections {"port":27017,"ssl":"off"}
      

      As we can see, the DurableCatalogImpl::_hasEntryCollidingWithRand() function only checks for conflicts against the idents starting with "collection-". The new '_rand' is now 3852150022159100276.

      After creating an additional index on collection test.a, /data/db has the following contents:

      -rw------- 1 gregory  20K Oct 16 09:48 collection-0--4816864174458775216.wt
      -rw------- 1 gregory  36K Oct 16 09:49 collection-2--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:48 collection-4--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:49 collection-7--4816864174458775216.wt
      drwx------ 2 gregory 4.0K Oct 16 09:49 diagnostic.data/
      -rw------- 1 gregory 4.0K Oct 16 09:49 index-0-3852150022159100276.wt
      -rw------- 1 gregory  20K Oct 16 09:48 index-1--4816864174458775216.wt
      -rw------- 1 gregory  36K Oct 16 09:49 index-3--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:48 index-5--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:49 index-6--4816864174458775216.wt
      -rw------- 1 gregory 4.0K Oct 16 09:48 index-8--4816864174458775216.wt
      drwx------ 2 gregory 4.0K Oct 16 09:48 journal/
      -rw------- 1 gregory  36K Oct 16 09:49 _mdb_catalog.wt
      -rw------- 1 gregory    6 Oct 16 09:48 mongod.lock
      -rw------- 1 gregory  36K Oct 16 09:48 sizeStorer.wt
      -rw------- 1 gregory  114 Oct 16 09:45 storage.bson
      -rw------- 1 gregory   47 Oct 16 09:45 WiredTiger
      -rw------- 1 gregory 4.0K Oct 16 09:48 WiredTigerHS.wt
      -rw------- 1 gregory   21 Oct 16 09:45 WiredTiger.lock
      -rw------- 1 gregory 1.3K Oct 16 09:49 WiredTiger.turtle
      -rw------- 1 gregory  68K Oct 16 09:49 WiredTiger.wt
      

      So now, collection test.a owns the following idents:

      • collection-7--4816864174458775216.wt
      • index-8--4816864174458775216.wt (_id index)
      • index-0-3852150022159100276.wt (newly created index)
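The file names in the listings suggest that an ident is laid out as a kind prefix, a counter, and the '_rand' suffix, joined by hyphens. The sketch below only reproduces that observed layout; `makeIdent` is a hypothetical helper, not a server function, and the assumption that the counter restarts after a server restart is inferred solely from the name index-0-3852150022159100276.

```cpp
#include <cstdint>
#include <string>

// Builds an ident of the shape seen in the directory listings:
//   <kind>-<counter>-<rand>
// A negative '_rand' carries its own minus sign, which is why names such as
// collection-7--4816864174458775216 contain a double hyphen.
std::string makeIdent(const std::string& kind, std::uint64_t counter, const std::string& rand) {
    return kind + "-" + std::to_string(counter) + "-" + rand;
}
```

For example, `makeIdent("index", 0, "3852150022159100276")` yields the name of the newly created index, while the collection and its _id index keep the suffix from the earlier startup.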

      Upon restarting the server one last time, we see that DurableCatalogImpl::_hasEntryCollidingWithRand() only checks against the idents starting with "collection-":

      ...
      2020-10-16T09:49:52.625-04:00 I  STORAGE  [initandlisten] WiredTiger opened {"durationMillis":1110}
      ...
      2020-10-16T09:49:52.630-04:00 I  -        [initandlisten] +++ DurableCatalogImpl::DurableCatalogImpl _rand is -8673726600649296817
      2020-10-16T09:49:52.632-04:00 I  -        [initandlisten] +++ DurableCatalogImpl::_hasEntryCollidingWithRand
      2020-10-16T09:49:52.632-04:00 I  -        [initandlisten] +++ Checking if _rand conflicts with collection-0--4816864174458775216
      2020-10-16T09:49:52.633-04:00 I  -        [initandlisten] +++ Checking if _rand conflicts with collection-2--4816864174458775216
      2020-10-16T09:49:52.633-04:00 I  -        [initandlisten] +++ Checking if _rand conflicts with collection-4--4816864174458775216
      2020-10-16T09:49:52.633-04:00 I  -        [initandlisten] +++ Checking if _rand conflicts with collection-7--4816864174458775216
      ...
      2020-10-16T09:49:52.694-04:00 I  NETWORK  [listener] Waiting for connections {"port":27017,"ssl":"off"}
      

      Because we only check for conflicts against idents starting with "collection-", it's possible for a later startup to generate and re-use the '_rand' 3852150022159100276, even though an index ident already carries that suffix. Statistically speaking, the probability of this happening is very low.
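To make the gap concrete, the sketch below replays the final state of /data/db from this reproduction. It is an illustration only: the ident lists are copied from the listings above, the function names are hypothetical, and it assumes the index idents can be supplied to the check alongside the collection idents, without claiming how the server would obtain them.

```cpp
#include <string>
#include <vector>

// Idents present after the additional index was created on test.a.
const std::vector<std::string> kCollectionIdents{
    "collection-0--4816864174458775216", "collection-2--4816864174458775216",
    "collection-4--4816864174458775216", "collection-7--4816864174458775216"};
const std::vector<std::string> kIndexIdents{
    "index-0-3852150022159100276",  "index-1--4816864174458775216",
    "index-3--4816864174458775216", "index-5--4816864174458775216",
    "index-6--4816864174458775216", "index-8--4816864174458775216"};

// Returns true when any ident in `idents` ends with `rand`.
bool anyIdentEndsWith(const std::vector<std::string>& idents, const std::string& rand) {
    for (const auto& ident : idents) {
        if (ident.size() >= rand.size() &&
            ident.compare(ident.size() - rand.size(), rand.size(), rand) == 0)
            return true;
    }
    return false;
}

// Mirrors the behaviour reported in this ticket: only "collection-" idents are examined.
bool collidesWithCollectionsOnly(const std::string& rand) {
    return anyIdentEndsWith(kCollectionIdents, rand);
}

// Hypothetical wider check that also examines the "index-" idents.
bool collidesWithAnyIdent(const std::string& rand) {
    return anyIdentEndsWith(kCollectionIdents, rand) || anyIdentEndsWith(kIndexIdents, rand);
}
```

With these lists, `collidesWithCollectionsOnly("3852150022159100276")` reports no collision, whereas `collidesWithAnyIdent("3852150022159100276")` does, because index-0-3852150022159100276 already uses that suffix.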

      Assignee: Unassigned
      Reporter: Gregory Wlodarek (gregory.wlodarek@mongodb.com)
      Votes: 0
      Watchers: 8
