SERVER-61262 (project: Core Server)

A 5.0/5.1 binary might receive a tenant migration state document in 5.2 FCV format, leading to a crash.

    • Type: Task
    • Resolution: Gone away
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: None
    • Serverless

      Here is the scenario I have in mind
      (Assume the recipient (R) replica set is currently running binary 5.2 and FCV 5.2.)
      1) The R primary receives a recipientSyncData command with protocol 'Merge'.
      2) The R POS (PrimaryOnlyService) instance starts and persists the initial state doc with the 'Merge' protocol.
      3) The R primary receives a setFeatureCompatibilityVersion command to downgrade to 5.0.
      4) The R primary enters the FCV downgrading state.
      5) The FCV code signals all active tenant migrations to abort, but it waits neither for them to actually abort nor for their state docs to be marked as garbage-collectable.
      6) The R primary successfully downgrades to FCV 5.0.
      7) The R POS instance now receives the abort signal and aborts the current tenant migration before persisting the 'RecipientPrimaryStartingFCV' info in the state doc (and before the donor (D) vs. R FCV compatibility check).
      8) The R primary steps down.
      9) At this point, there is a recipient tenant migration state doc on disk that is not marked as garbage-collectable, so the migration is considered active and can resume on the new primary.

      Since the replica set has already downgraded to FCV 5.0, we are free to replace the recipient binaries from 5.2 with 5.0. If the new primary that steps up runs the 5.0 binary, a POS instance will be started on that 5.0 binary for a state doc written in the 5.2 format.
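      The sequence above can be modeled with a small Python sketch. Everything in it is hypothetical: the `RecipientStateDoc` class, its snake_case fields, the `SUPPORTED_PROTOCOLS` table (including the "MultitenantMigrations" name), and `on_step_up` are illustrative stand-ins, not the server's actual types, schema, or resume logic. It only shows why an abort that lands before the doc is marked garbage-collectable leaves an "active" doc that a 5.0 binary cannot interpret.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified model of the recipient state doc. Field names are
# illustrative only and do not match the real on-disk schema.
@dataclass
class RecipientStateDoc:
    protocol: str
    recipient_primary_starting_fcv: Optional[str] = None
    expire_at: Optional[float] = None  # set once the doc is marked garbage-collectable

    def is_active(self) -> bool:
        # A doc not marked for garbage collection is treated as an
        # in-progress migration and is resumed when a new primary steps up.
        return self.expire_at is None


# Assumption: which protocol values each binary version can parse.
SUPPORTED_PROTOCOLS = {
    "5.0": {"MultitenantMigrations"},
    "5.2": {"MultitenantMigrations", "Merge"},
}


class UnsupportedStateDoc(Exception):
    """Stands in for the crash described in this ticket."""


def on_step_up(doc: RecipientStateDoc, binary: str) -> str:
    """Decide what a newly elected primary does with a persisted state doc."""
    if not doc.is_active():
        return "skip: already garbage-collectable"
    if doc.protocol not in SUPPORTED_PROTOCOLS[binary]:
        raise UnsupportedStateDoc(
            f"{binary} binary cannot resume protocol {doc.protocol!r}"
        )
    return "resume"


# Steps 1-2: the 5.2 primary persists the initial doc with protocol 'Merge'.
doc = RecipientStateDoc(protocol="Merge")

# Steps 3-7: the FCV downgrade completes; the abort lands before the starting
# FCV is recorded and before the doc is marked garbage-collectable.
assert doc.recipient_primary_starting_fcv is None and doc.is_active()

# Steps 8-9 plus the binary swap: a 5.0 binary steps up and finds the doc.
try:
    on_step_up(doc, binary="5.0")
except UnsupportedStateDoc as exc:
    print("crash:", exc)
```

      In this model, either waiting for aborted migrations to be marked garbage-collectable before completing the downgrade, or having the older binary tolerate unknown protocols, would avoid the exception; which of these (if any) matches the real server behavior is not established by this ticket.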

            Assignee:
            backlog-server-serverless [DO NOT USE] Backlog - Server Serverless (Inactive)
            Reporter:
            suganthi.mani@mongodb.com Suganthi Mani
            Votes:
            0
            Watchers:
            2
