  1. Core Server
  2. SERVER-37730

Make the index catalog timestamp aware

    • Type: Task
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Storage
    • Storage Execution
    • Storage NYC 2019-01-14

      Add 'birth' and 'death' timestamps to in-memory index catalog entries so that we can keep the in-memory index catalog entry objects and know when it is appropriate to use them. Handle standalones, too: they should default to the zero timestamp for both 'birth' and 'death'. Already-built indexes loaded into memory can also default to zero for the 'birth' timestamp, since point-in-time (PIT) reads are not possible at a time before the index existed.
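As a rough illustration of the idea above, here is a minimal sketch of an entry carrying both timestamps and a check for whether it may be used at a given read timestamp. The class name `IndexCatalogEntry`, the method `is_usable_at`, the use of plain integers for timestamps, and the boundary rule (birth inclusive, death exclusive) are all assumptions made for the example, not the server's actual types or semantics.

```python
from dataclasses import dataclass

NULL_TS = 0  # zero timestamp: no birth restriction / death not set


@dataclass
class IndexCatalogEntry:
    """Hypothetical in-memory entry; not the server's real class."""

    name: str
    birth: int = NULL_TS  # standalones and already-built indexes default to zero
    death: int = NULL_TS  # zero means the index has not been dropped

    def is_usable_at(self, read_ts: int) -> bool:
        # Assumed rule: usable from birth (inclusive) up to death (exclusive).
        born = self.birth <= read_ts
        not_dead = self.death == NULL_TS or read_ts < self.death
        return born and not_dead
```

With these assumptions, an entry created with the defaults is usable at every read timestamp, while an entry with `birth=10, death=20` is usable only for read timestamps 10 through 19.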
      -----------
      Delay dropping indexes until the index's death timestamp is checkpointed and older than the oldest supported point-in-time read. Metadata changes in the storage engine act across all timestamps; they are not point-in-time.
      ------------
      listIndexes should not return indexes with set 'death' timestamps.
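A sketch of that filtering rule, assuming entries are represented as plain dictionaries with hypothetical `name` and `death` keys and that a zero `death` means "not set"; the function name `list_indexes` is illustrative only and is not the server's listIndexes implementation.

```python
from typing import Dict, List

NULL_TS = 0  # assumed sentinel for an unset death timestamp


def list_indexes(entries: List[Dict]) -> List[str]:
    # Hide any index whose death timestamp has been set, even though
    # its in-memory entry is still being kept around for PIT reads.
    return [e["name"] for e in entries if e["death"] == NULL_TS]
```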
      ------------
      Two phase drop of indexes
      1st Phase:

      • keep in-memory, set death timestamp
      • remove entry from __mdb_catalog
      • keep index table in WT
      • schedule table delete when death timestamp is older than both oldest_timestamp and the last checkpointed timestamp (see the RTT explanation below for why).

      2nd Phase (when death timestamp is old enough):

      • Remove in-memory entry
      • Delete table.
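The two phases can be sketched as a small state model. Everything here is hypothetical: the `Catalog` class, its field names, integer timestamps, and the strict "older than" comparison are assumptions for illustration, with `durable` standing in for __mdb_catalog and `tables` for the index tables in WT.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Catalog:
    """Toy model of the three places an index lives (hypothetical)."""

    in_memory: Dict[str, int] = field(default_factory=dict)  # name -> death ts (0 = live)
    durable: Set[str] = field(default_factory=set)           # stands in for __mdb_catalog
    tables: Set[str] = field(default_factory=set)            # stands in for WT index tables
    pending: Dict[str, int] = field(default_factory=dict)    # scheduled table deletes

    def add_index(self, name: str) -> None:
        self.in_memory[name] = 0
        self.durable.add(name)
        self.tables.add(name)

    def drop_phase_one(self, name: str, death_ts: int) -> None:
        self.in_memory[name] = death_ts  # keep in memory, set death timestamp
        self.durable.discard(name)       # remove entry from the durable catalog
        self.pending[name] = death_ts    # table stays; its delete is only scheduled

    def drop_phase_two(self, oldest_ts: int, checkpoint_ts: int) -> List[str]:
        # Assumed rule: death must be strictly older than both timestamps.
        ready = [n for n, d in self.pending.items()
                 if d < oldest_ts and d < checkpoint_ts]
        for name in ready:
            del self.pending[name]
            del self.in_memory[name]     # remove in-memory entry
            self.tables.discard(name)    # delete the table
        return ready
```

For example, after `drop_phase_one("a_1", 50)`, a call to `drop_phase_two(60, 40)` leaves the table in place because the checkpoint has not yet passed the death timestamp; `drop_phase_two(60, 55)` then removes it.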

      PIT accesses will see PIT version of __mdb_catalog that still has the index entry if the in-memory index entry allows index access at that PIT. Then the table still exists at that PIT and everything works.

      Rollback via refetch works:

      • unschedule index table delete
      • rewrite __mdb_catalog entry
      • unset in-memory death timestamp
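The three undo steps above could look roughly like this. The function name `undo_phase_one` and the dictionary/set stand-ins for the in-memory catalog, __mdb_catalog, and the delete schedule are assumptions for the sketch, as is using zero to mean an unset death timestamp.

```python
from typing import Dict, Set


def undo_phase_one(name: str,
                   in_memory: Dict[str, int],
                   durable: Set[str],
                   pending: Dict[str, int]) -> None:
    """Reverse the first drop phase for one index (hypothetical helper)."""
    pending.pop(name, None)  # unschedule the index table delete
    durable.add(name)        # rewrite the durable catalog entry
    in_memory[name] = 0      # unset the in-memory death timestamp
```

This only works because the first phase never touched the index table itself, so there is nothing to rebuild.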

      RTT works:

      • The in-memory catalog has been closed, so in-memory entries are of no concern
      • the __mdb_catalog entry will be recovered if appropriate on recovery to stable
      • if the __mdb_catalog entry was recovered, oplog application will reapply dropIndexes if it is not rolled back
      • it is easier to drop the index table only after dropIndexes is in the checkpoint and can never be undone on rollback. Otherwise, the index table could be discarded before dropIndexes is checkpointed; recovery to the checkpoint would then recreate the index catalog entry in __mdb_catalog, and our hook to rebuild the index table would run. It would be nicer if the index table weren't dropped in the first place.

      Question: does something bad happen if we checkpoint a timestamp that is newer than oldest_timestamp? I imagine WT handles that situation, maybe by checkpointing further back in time, earlier than oldest_timestamp. I ask in case oldest_timestamp is ever set that far back in time, since the calculation doesn't currently refer to the checkpointed timestamp at all.

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            dianna.hohensee@mongodb.com Dianna Hohensee (Inactive)
            Votes:
            0
            Watchers:
            7

              Created:
              Updated:
              Resolved: