Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-34702

Whole-DB or whole-cluster change streams may not provide a total ordering if resumed after a drop AND resume token does not include the shard key

    • Query Execution
    • ALL

      This can happen in two separate scenarios.

      Scenario 1

      1. A whole-db or whole-cluster change stream is opened.
      2. An insert notification is returned, which does not yet include a shard key because the collection has not been sharded yet.
      3. The collection from the insert is sharded.
      4. A chunk migrates to another shard.
      5. Two inserts happen with the same cluster time and the same _id, one on each shard.
      6. The collection is dropped.
      7. The change stream is resumed, using the resume token from the first insert, at step 2.

      Scenario 2

      1. A whole-db or whole-cluster change stream is opened, but at least one shard which has a chunk for collection with UUID 'x' does not yet return any results for 'x'.
      2. No updates or deletes happen on that shard, but an insert on 'x' happens on that shard and is not yet returned from the stream.
      3. Collection with UUID 'x' is dropped, causing us to lose information about the shard key.
      4. The insert notification is returned, and will not contain the shard key.
      5. Two inserts happen with the same cluster time and the same _id (this can only happen on two different shards).
      6. The change stream is resumed, using the resume token from the first insert, at step 2.

      In any scenario when the stream is resumed no shard will know the shard key because the collection has been dropped, and the resume token will not include it. Because of this, each shard will produce notifications for the two matching inserts with identical resume tokens. Both will have the same timestamp, same UUID, and same _id.

      The impact of this lack of ordering is that if the client then tried to resume using one of those tokens, they would potentially get the same notification twice or skip one of the inserts.

      Once opened, a change stream will learn the shard key from any oplog entry which contains it. So if there were an update or a delete on the same collection, the change stream would learn the shard key for that collection and avoid this bug. Insert notifications do not include enough information to figure out the shard key. This means that in order to possibly be affected by this issue, the following must be true:

      To be impacted by scenario 1

      • You must be resuming with a resume token generated before the change stream detected the collection has been sharded. This must be before the first chunk migration.
      • There must have been at least one chunk migration.
      • There must have been at least two inserts with the same cluster time, and the same _id.
      • There must not have been any updates or deletes that happened after the chunk migration but before the insert.
      • The client must try to resume using the resume token from one of the inserts with a duplicate resume token.
      • The collection has been dropped at the time of both resumes.

      To be impacted by scenario 2

      • An insert notification must be the first change seen in the stream for a collection 'x' on at least one shard.
      • The collection must have been dropped when the change stream observes that insert.
      • There must be a change on another shard with the same cluster time as that insert.
      • The client must try to resume using the resume token from one of the changes with a duplicate cluster time.

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            charlie.swanson@mongodb.com Charlie Swanson
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: