Downgrade to FCV 8.0 can leave uncleaned orphan documents


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 8.3.0-rc0, 8.2.0
    • Component/s: None
    • Cluster Scalability
    • ALL
    • 🟩 Routing and Topology

      When the FCV is downgraded to 8.0, ongoing chunk migrations are aborted. As in SERVER-92484, if a chunk migration is aborted after it was committed but before the corresponding range deletion task was deleted, that range deletion task remains in "pending:true" status. Its orphans are therefore not cleaned up until the migration is lazily recovered, either on the next query on that namespace (if any) or on the next step-up.
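
      The failure mode above can be illustrated with a small Python toy model. This is not MongoDB code, and every name in it (Shard, migrate, finish_commit, run_range_deleter, recover_migrations) is hypothetical. The model rests on several assumptions. Both shards persist a pending range deletion task before cloning. On a normal commit, the recipient's task is removed and the donor's task is marked ready. The range deleter skips pending tasks. The model also ignores details such as the orphan cleanup delay and how recovery learns the commit decision; it simply assumes the migration committed.

```python
# Toy model of the failure mode; every name here is hypothetical, not MongoDB code.
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    docs: set = field(default_factory=set)
    # migration id -> {"range": set of doc ids, "pending": bool}
    range_deletions: dict = field(default_factory=dict)
    # migration id -> recipient shard name (only the donor keeps this)
    coordinators: dict = field(default_factory=dict)


def migrate(donor, recipient, mid, chunk, interrupt_after_commit=False):
    # Before cloning, the donor records the migration and both shards
    # persist a range deletion task with pending=True.
    donor.coordinators[mid] = recipient.name
    donor.range_deletions[mid] = {"range": set(chunk), "pending": True}
    recipient.range_deletions[mid] = {"range": set(chunk), "pending": True}
    recipient.docs |= set(chunk)  # clone the chunk to the recipient
    # (The commit on the config server would happen here.)
    if interrupt_after_commit:
        return  # e.g. aborted by the FCV downgrade: cleanup below never runs
    finish_commit(donor, recipient, mid)


def finish_commit(donor, recipient, mid):
    # Committed outcome: the recipient keeps the docs and drops its task;
    # the donor's task becomes ready so its now-orphaned copies get deleted.
    recipient.range_deletions.pop(mid, None)
    donor.range_deletions[mid]["pending"] = False
    del donor.coordinators[mid]


def run_range_deleter(shard):
    # Only tasks that are no longer pending are processed.
    for mid, task in list(shard.range_deletions.items()):
        if not task["pending"]:
            shard.docs -= task["range"]
            del shard.range_deletions[mid]


def recover_migrations(donor, shards_by_name):
    # Lazy recovery (next query on the namespace, or next step-up). The toy
    # assumes the migration committed instead of looking up the decision.
    for mid, recipient_name in list(donor.coordinators.items()):
        finish_commit(donor, shards_by_name[recipient_name], mid)


donor = Shard("shard0", docs={1, 2, 3, 4})
recipient = Shard("shard1")
shards = {s.name: s for s in (donor, recipient)}

migrate(donor, recipient, "m1", {3, 4}, interrupt_after_commit=True)
run_range_deleter(donor)
print(sorted(donor.docs))  # [1, 2, 3, 4]: orphans 3 and 4 are stuck
recover_migrations(donor, shards)
run_range_deleter(donor)
print(sorted(donor.docs))  # [1, 2]
```

      In this model, the donor's orphans survive every range deleter pass until something triggers recovery. Recovery completes the commit path and clears the pending tasks on both shards.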

      SERVER-103749 implemented a workaround in jstests for CheckOrphansDeleted getting stuck, by forcing migration recovery, but it does not always work. The migration coordinator document is on the donor shard, while a pending range deletion task also exists on the recipient shard, which may be checked first.
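
      The ordering problem can be shown with a second Python toy model; all names are hypothetical. The model assumes two things. First, the workaround forces recovery on the shard currently being checked. Second, the check waits on each shard in turn before moving on. Because recovery only acts on migrations whose coordinator document lives on that shard, checking the recipient first finds a pending task that forcing recovery there cannot clear.

```python
# Toy model of the check ordering; every name here is hypothetical, not MongoDB code.
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    pending_tasks: set = field(default_factory=set)  # migration ids still pending:true
    coordinators: dict = field(default_factory=dict)  # migration id -> recipient name


def force_recovery(shard, shards_by_name):
    # Recovery only acts on migrations this shard coordinates (i.e. as donor).
    for mid, recipient_name in list(shard.coordinators.items()):
        shard.pending_tasks.discard(mid)  # donor task no longer pending
        shards_by_name[recipient_name].pending_tasks.discard(mid)
        del shard.coordinators[mid]


def check_orphans_deleted(order, shards_by_name):
    """Visit shards in `order`, forcing recovery on each one and requiring no
    pending tasks before moving on. Returns the shard that got stuck, or None."""
    for name in order:
        shard = shards_by_name[name]
        force_recovery(shard, shards_by_name)
        if shard.pending_tasks:
            return name  # assumed: the real hook would wait here until it times out
    return None


def make_cluster():
    donor = Shard("donor", pending_tasks={"m1"}, coordinators={"m1": "recipient"})
    recipient = Shard("recipient", pending_tasks={"m1"})
    return {s.name: s for s in (donor, recipient)}


print(check_orphans_deleted(["donor", "recipient"], make_cluster()))  # None
print(check_orphans_deleted(["recipient", "donor"], make_cluster()))  # recipient
```

      When the donor is checked first, its recovery clears both shards' tasks and the check passes. When the recipient is checked first, the check stops at "recipient", matching the stuck behavior described above.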

      We may want to review the FCV downgrade code or the workaround to avoid this issue.

      A reproducer is attached.

            Assignee:
            Jack Mulrow
            Reporter:
            Joan Bruguera Micó
            Votes:
            0
            Watchers:
            3
