Change streams may be subject to spurious "CappedPositionLost" errors when resuming

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • Affects Version/s: None
    • ALL
    • Query 2020-08-24, Query 2020-09-07
    • 3

      Our testing infrastructure uncovered a rare case where a spurious "CappedPositionLost" error can occur, detailed in SERVER-49690. A change stream may hit the same error while it is resuming. As far as I know this has never been observed in a change stream, but I see no reason it couldn't happen. I recommend first checking whether we can reproduce it. If we can, I think we should do one of the following:
      1) Disable yielding during the oplog check that runs when a change stream resumes
      2) Add a similar retry loop within the change stream
      3) Ensure drivers will retry this error
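
      To make options 2 and 3 more concrete, here is a minimal sketch of a bounded retry loop around a resume attempt. It is only an illustration, not the server's or any driver's actual code: ServerError, resume_with_retry, open_stream and the code_name attribute are all hypothetical names, and matching on the error name string is an assumption about how the error would be surfaced to the caller.

```python
import time


class ServerError(Exception):
    """Hypothetical stand-in for whatever exception carries the server's error name."""

    def __init__(self, code_name, message=""):
        super().__init__(f"{code_name}: {message}")
        self.code_name = code_name


def resume_with_retry(open_stream, resume_token, max_attempts=3, backoff_seconds=0.0):
    """Call open_stream(resume_token), retrying only on CappedPositionLost.

    open_stream is a hypothetical callable that resumes the change stream.
    Any other error, or CappedPositionLost on the final attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return open_stream(resume_token)
        except ServerError as err:
            # Treat only CappedPositionLost as possibly spurious (assumption).
            if err.code_name != "CappedPositionLost" or attempt == max_attempts:
                raise
            # Linear backoff before the next attempt; 0.0 disables the wait.
            time.sleep(backoff_seconds * attempt)
```

      The attempt limit is deliberate: if the error is not spurious and keeps recurring, it still reaches the caller instead of looping forever.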

      While working on SERVER-49690 I looked into option #1, but the patch quickly grew out of hand. I'll attach my WIP, but it certainly won't compile and doesn't plumb the yield policy far enough to fix the issue.

              Assignee:
              Bernard Gorman
              Reporter:
              Charlie Swanson
              Votes:
              0
              Watchers:
              4

                Created:
                Updated:
                Resolved: