  1. Core Server
  2. SERVER-54902

Time-series collection creation commits the view definition and associated oplog entry at different times

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.0.0-rc0
    • Affects Version/s: None
    • Component/s: Storage
    • None
    • Fully Compatible
    • ALL
    • Sprint: Execution Team 2021-04-05, Execution Team 2021-04-19
    • 22

      Time-series collection creation performs two operations under the hood:

      1. Create the buckets collection.
      2. Create the view definition on the buckets collection.

      Both steps happen under a single WriteUnitOfWork (WUOW). Within that WUOW, the view definition creation step has a problem. The order of operations for this WUOW is listed below to make the problem concrete.

      1. start_transaction
      2. setTimestamp(1)   <--- https://github.com/mongodb/mongo/blob/760ecda9059f3043498c2cd5106c8307344f756e/src/mongo/db/catalog/database_impl.cpp#L681 --->
      3. createCollection (the buckets collection)
      4. generate oplog entry via onCreateCollection at Timestamp(1)
      5. Insert the view definition into the 'system.views' collection      <--- untimestamped --->
      6. generate oplog entry via onInserts at Timestamp(2)
      7. commit_transaction
      

      Because the insert at (5) is not timestamped, it uses the last commit timestamp set on the transaction, which is Timestamp(1) in this case. When the oplog entry is generated at (6), the oplog reserves the next available slot for it, which is Timestamp(2).

      This means we committed the view definition at Timestamp(1) and its associated oplog entry at Timestamp(2).
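The mismatch can be reproduced with a toy model of the steps above. This is a minimal sketch, not MongoDB's storage API: `ToyTransaction`, `reserve_slot`, `write`, and `log_op` are hypothetical names, and the model assumes that each oplog slot is reserved before it is used and that an untimestamped write inherits the last timestamp set on the transaction.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyTransaction:
    """Toy stand-in for one WUOW. Hypothetical, for illustration only."""
    next_slot: int = 1
    commit_ts: Optional[int] = None  # last timestamp set on the transaction
    writes: List[Tuple[str, Optional[int]]] = field(default_factory=list)
    oplog: List[Tuple[str, int]] = field(default_factory=list)

    def reserve_slot(self) -> int:
        # Hand out the next available oplog timestamp.
        ts = self.next_slot
        self.next_slot += 1
        return ts

    def set_timestamp(self, ts: int) -> None:
        self.commit_ts = ts

    def write(self, what: str) -> None:
        # A write that does not set its own timestamp inherits the last one set.
        self.writes.append((what, self.commit_ts))

    def log_op(self, what: str, ts: int) -> None:
        self.oplog.append((what, ts))


txn = ToyTransaction()
ts1 = txn.reserve_slot()
txn.set_timestamp(ts1)                            # step 2
txn.write("create buckets collection")            # step 3
txn.log_op("create buckets collection", ts1)      # step 4
txn.write("insert view definition")               # step 5, untimestamped
txn.log_op("insert view definition", txn.reserve_slot())  # step 6

data_ts = dict(txn.writes)["insert view definition"]
oplog_ts = dict(txn.oplog)["insert view definition"]
print(f"view definition: data at {data_ts}, oplog entry at {oplog_ts}")
```

In this model the view definition's data lands at timestamp 1 while its oplog entry lands at timestamp 2, which is the divergence described in this ticket.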

       

       

      A concurrency workload detected this issue when dbHash failed with a mismatch.
      The oplog entries for the concurrency workload were the following:

      1. T227 drop the view definition for the bucket (create_timeseries_collection_fsmcoll0_0)
      2. T228 drop bucket (create_timeseries_collection_fsmcoll0_0)
      3. T229 create bucket (create_timeseries_collection_fsmcoll0_0)
      4. T230-T234 operations for another thread in the workload run   <--- this gap is important to detect the dbHash mismatch --->
      5. T235 create the view definition for the bucket (create_timeseries_collection_fsmcoll0_0)
      

      dbHash opened a storage snapshot at T230. At that timestamp the view definition should not have existed yet, because its oplog entry is at T235, but it did exist on the primary node. The secondary had the correct state. Because of the issue described in this ticket, the view definition came into existence on the primary at T229.
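A small sketch of why a snapshot at T230 differs between the nodes. It assumes the secondary applies each write at its oplog entry's timestamp, and it uses the T-numbers from the list above as plain integers. `visible_at` and the dictionary keys are hypothetical names chosen for illustration.

```python
def visible_at(commit_timestamps, snapshot):
    """Return the names whose commit timestamp is at or before the snapshot."""
    return {name for name, ts in commit_timestamps.items() if ts <= snapshot}


# Primary: the untimestamped view insert inherited the buckets timestamp (T229).
primary = {"buckets fsmcoll0_0": 229, "view fsmcoll0_0": 229}
# Secondary: each write is applied at its oplog entry's timestamp.
secondary = {"buckets fsmcoll0_0": 229, "view fsmcoll0_0": 235}

for snapshot in (230, 235):
    same = visible_at(primary, snapshot) == visible_at(secondary, snapshot)
    print(f"snapshot T{snapshot}: nodes agree = {same}")
```

Any snapshot from T229 through T234 exposes the difference in this model, which is consistent with the note above that the gap at T230-T234 is what made the mismatch detectable.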

      Primary running dbHash:
      [ReplicaSetFixture:job0:primary] {"t":{"$date":"2021-03-02T20:47:09.435+00:00"},"s":"I",  "c":"COMMAND",  "id":0,       "ctx":"conn59","msg":"dbHash snapshot","attr":{"snapshot":"Timestamp(1614718029, 230)"}}
      [ReplicaSetFixture:job0:primary] {"t":{"$date":"2021-03-02T20:47:09.435+00:00"},"s":"I",  "c":"COMMAND",  "id":0,       "ctx":"conn59","msg":"dbHash saw","attr":{"doc":{"_id":"test0_fsmdb0.create_timeseries_collection_fsmcoll0_0","viewOn":"system.buckets.create_timeseries_collection_fsmcoll0_0","pipeline":[{"$_internalUnpackBucket":{"timeField":"time","exclude":[]}}],"timeseries":{"timeField":"time"}}}}
      [ReplicaSetFixture:job0:primary] {"t":{"$date":"2021-03-02T20:47:09.435+00:00"},"s":"I",  "c":"COMMAND",  "id":0,       "ctx":"conn59","msg":"dbHash saw","attr":{"doc":{"_id":"test0_fsmdb0.create_timeseries_collection_fsmcoll0_1","viewOn":"system.buckets.create_timeseries_collection_fsmcoll0_1","pipeline":[{"$_internalUnpackBucket":{"timeField":"time","exclude":[]}}],"timeseries":{"timeField":"time"}}}}
      
      Secondary running dbHash:
      [ReplicaSetFixture:job0:secondary] {"t":{"$date":"2021-03-02T20:47:09.436+00:00"},"s":"I",  "c":"COMMAND",  "id":0,       "ctx":"conn30","msg":"dbHash snapshot","attr":{"snapshot":"Timestamp(1614718029, 230)"}}
      [ReplicaSetFixture:job0:secondary] {"t":{"$date":"2021-03-02T20:47:09.436+00:00"},"s":"I",  "c":"COMMAND",  "id":0,       "ctx":"conn30","msg":"dbHash saw","attr":{"doc":{"_id":"test0_fsmdb0.create_timeseries_collection_fsmcoll0_1","viewOn":"system.buckets.create_timeseries_collection_fsmcoll0_1","pipeline":[{"$_internalUnpackBucket":{"timeField":"time","exclude":[]}}],"timeseries":{"timeField":"time"}}}}
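
When triaging output like this, the documents each node reported can be compared mechanically. The following is a rough sketch that assumes each log line is a fixture prefix followed by one JSON object, as in the excerpts above. `docs_seen` is a hypothetical helper, and the sample lines are abbreviated from the excerpts to the `msg` and `_id` fields only.

```python
import json


def docs_seen(log_lines):
    """Collect the _id of every document reported by a 'dbHash saw' line."""
    seen = set()
    for line in log_lines:
        start = line.find("{")  # skip the "[ReplicaSetFixture:...]" prefix
        if start == -1:
            continue
        entry = json.loads(line[start:])
        if entry.get("msg") == "dbHash saw":
            seen.add(entry["attr"]["doc"]["_id"])
    return seen


# Abbreviated from the log excerpts above (most fields omitted).
primary_lines = [
    '[ReplicaSetFixture:job0:primary] {"msg":"dbHash snapshot","attr":{"snapshot":"Timestamp(1614718029, 230)"}}',
    '[ReplicaSetFixture:job0:primary] {"msg":"dbHash saw","attr":{"doc":{"_id":"test0_fsmdb0.create_timeseries_collection_fsmcoll0_0"}}}',
    '[ReplicaSetFixture:job0:primary] {"msg":"dbHash saw","attr":{"doc":{"_id":"test0_fsmdb0.create_timeseries_collection_fsmcoll0_1"}}}',
]
secondary_lines = [
    '[ReplicaSetFixture:job0:secondary] {"msg":"dbHash snapshot","attr":{"snapshot":"Timestamp(1614718029, 230)"}}',
    '[ReplicaSetFixture:job0:secondary] {"msg":"dbHash saw","attr":{"doc":{"_id":"test0_fsmdb0.create_timeseries_collection_fsmcoll0_1"}}}',
]

only_on_primary = docs_seen(primary_lines) - docs_seen(secondary_lines)
print(sorted(only_on_primary))
```

Here the only document seen on the primary but not on the secondary is the view definition for create_timeseries_collection_fsmcoll0_0.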
      

            Assignee:
            henrik.edin@mongodb.com Henrik Edin
            Reporter:
            gregory.wlodarek@mongodb.com Gregory Wlodarek
