  1. WiredTiger
  2. WT-9503

Ensure tiered objects are correctly synced to disk after switch

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      Make sure WiredTiger fsyncs every element of a tiered table that might contain updates belonging to a checkpoint.

      Checkpoints require that all data that is part of the checkpoint is on persistent storage. This ensures that WiredTiger can restart after failures from a known consistent state (i.e., the checkpoint).

      To accomplish this, the checkpoint code calls fsync as the final step in checkpointing a BTree.

      In tiered storage, the changes that belong to a checkpoint might be in more than one file. We need to make sure that __wt_checkpoint_sync() correctly flushes all of them.
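
To illustrate the requirement, here is a minimal sketch of syncing every file that may hold checkpoint data rather than a single file. It uses only POSIX fsync(); the helper name sync_checkpoint_files and the idea of passing an array of file descriptors are assumptions made for illustration and are not WiredTiger's actual code or the signature of __wt_checkpoint_sync().

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * sync_checkpoint_files --
 *     Hypothetical helper (not WiredTiger code): fsync every file descriptor
 *     that may hold blocks belonging to the checkpoint. Returns 0 on success,
 *     or the errno value of the first fsync that fails.
 */
int
sync_checkpoint_files(const int *fds, size_t nfds)
{
    size_t i;

    for (i = 0; i < nfds; ++i) {
        /* Retry if a signal interrupts the call; any other error is fatal. */
        while (fsync(fds[i]) != 0) {
            if (errno != EINTR)
                return (errno);
        }
    }
    return (0);
}
```

In this sketch the caller would pass the descriptor of the active local file plus the descriptors of any earlier local objects that received writes since the previous checkpoint.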

      Here's the specific scenario we need to worry about:

      1. A tiered table, foo, has foo-001.wtobj as its local file.
      2. A checkpoint happens. Data is written to foo-001.wtobj. WT calls fsync(). All changes for the checkpoint are in foo-001.wtobj.
      3. Eviction happens. More blocks are written to foo-001.wtobj.
      4. flush_tier is called and WT switches files in the table foo, creating foo-002.wtobj as the new local file and scheduling foo-001.wtobj to be written to object storage in the future.
      5. Another checkpoint happens. The pages evicted to foo-001.wtobj are part of this checkpoint. Therefore, we must sync that file as well as the active local file to ensure that the checkpoint is complete and recoverable.

      Currently, WT flushes only the active local file during a checkpoint. So in step 5 above, we flush foo-002.wtobj but not foo-001.wtobj, which can lose data if we crash after the checkpoint completes.
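
The five steps can be replayed against a toy in-memory model to show why syncing only the active file is not enough. Everything here is hypothetical and for illustration only: the struct, the toy_* functions, and the block counts are invented, and none of it is WiredTiger code. Object index 0 stands in for foo-001.wtobj and index 1 for foo-002.wtobj.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJECTS 8

/* Toy model of a tiered table: per-object count of written but unsynced blocks. */
struct toy_table {
    unsigned unsynced[MAX_OBJECTS];
    size_t active; /* index of the active local object */
};

/* Write blocks to the active local object (checkpoint or eviction). */
static void
toy_write(struct toy_table *t, unsigned nblocks)
{
    t->unsynced[t->active] += nblocks;
}

/* Model the file switch done by flush_tier: later writes go to a new object. */
static void
toy_switch(struct toy_table *t)
{
    if (t->active + 1 < MAX_OBJECTS)
        ++t->active;
}

/* Write checkpoint blocks, then sync either all objects or only the active one. */
static void
toy_checkpoint(struct toy_table *t, unsigned nblocks, bool sync_all)
{
    size_t i;

    toy_write(t, nblocks);
    for (i = 0; i <= t->active; ++i)
        if (sync_all || i == t->active)
            t->unsynced[i] = 0;
}

/* The checkpoint is durable only if no object still has unsynced blocks. */
static bool
toy_durable(const struct toy_table *t)
{
    size_t i;

    for (i = 0; i <= t->active; ++i)
        if (t->unsynced[i] != 0)
            return (false);
    return (true);
}

/* Replay steps 1-5 and report whether the final checkpoint is durable. */
bool
run_scenario(bool sync_all)
{
    struct toy_table t = {{0}, 0};     /* step 1: one local object */

    toy_checkpoint(&t, 4, sync_all);   /* step 2: checkpoint and sync */
    toy_write(&t, 2);                  /* step 3: eviction writes more blocks */
    toy_switch(&t);                    /* step 4: flush_tier switches files */
    toy_checkpoint(&t, 3, sync_all);   /* step 5: second checkpoint */
    return (toy_durable(&t));
}
```

In this model, run_scenario(false) leaves the evicted blocks in the first object unsynced, while run_scenario(true) syncs both objects, which is the behavior this ticket asks for.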

            Assignee:
            sue.loverso@mongodb.com Susan LoVerso
            Reporter:
            keith.smith@mongodb.com Keith Smith
            Votes:
            0
            Watchers:
            3
