Core Server / SERVER-12293

Initial sync of a capped collection can often fail if the collection is highly transient

    Details

    • Operating System:
      ALL

      Description

      If a capped collection is hot (receiving a high rate of inserts), an initial sync of a new replica set member can often fail because the writes overrun the clone cursor: documents are deleted by roll-over and the cursor is invalidated before the clone completes.
      One possible solution: when a syncing secondary detects this, it stops cloning the collection and lets oplog sync take care of it.
      Note: if this happens, oplog sync may also never converge.
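The proposed fallback can be sketched in Python as follows. This is a hypothetical illustration, not MongoDB code: `sync_capped`, `CursorOverrun`, the use of unique increasing integer ids as documents, and the choice to discard the partial copy after an overrun are all assumptions. `collections.deque(maxlen=N)` stands in for a capped collection because appending to a full deque drops the oldest item.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


class CursorOverrun(Exception):
    """Hypothetical stand-in for the clone cursor being invalidated (cf. error 13127)."""


def sync_capped(
    clone: Callable[[], Iterable[List[int]]],
    oplog_inserts: List[int],
    max_docs: int,
) -> Tuple[List[int], bool]:
    """Sketch of 'stop cloning and let oplog sync take care of it'.

    clone: returns an iterable of batches of doc ids; may raise CursorOverrun.
    oplog_inserts: doc ids inserted on the source since the sync started.
    Returns (final contents of the copy, whether the fallback was used).
    """
    # deque(maxlen=...) evicts the oldest entry on overflow, like a capped collection.
    target: deque = deque(maxlen=max_docs)
    try:
        for batch in clone():
            target.extend(batch)
        fell_back = False
    except CursorOverrun:
        # Assumption: abandon the partial copy instead of failing the whole initial sync.
        target.clear()
        fell_back = True
    # Replay the oplog; skip ids the clone already copied (ids assumed unique and increasing).
    for doc_id in oplog_inserts:
        if doc_id not in target:
            target.append(doc_id)
    return list(target), fell_back
```

For example, with `max_docs=5`, a clone that overruns after one batch followed by replay of ids 10 through 14 yields `[10, 11, 12, 13, 14]`, while replaying only ids 10 and 11 yields `[10, 11]`. In the second case fewer than `max_docs` inserts arrive after the sync starts, so oplog replay alone leaves the older documents missing; the fallback only reproduces the source once at least `max_docs` inserts have been replayed.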

      – OLD BELOW –

      Any write to a full capped collection deletes the oldest record(s).
      The delete appears to invalidate all cursors on the collection:

      https://github.com/mongodb/mongo/blob/master/src/mongo/db/clientcursor.cpp#L251
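To illustrate why the failure can be immediate, here is a toy Python model of the behavior suspected above, in which the roll-over delete invalidates every open cursor. It is not the server's implementation: `CappedCollection`, `Cursor`, `clone`, and the integer record ids are hypothetical, and `CursorNotFound` merely imitates error 13127.

```python
from collections import deque


class CursorNotFound(Exception):
    """Toy stand-in for error 13127 ("cursor didn't exist on server")."""


class Cursor:
    def __init__(self, coll):
        self.coll = coll
        self.valid = True
        self.last_id = -1  # id of the last record returned so far

    def get_more(self, batch_size):
        if not self.valid:
            raise CursorNotFound("getMore: cursor didn't exist on server")
        # Next records after the last one returned, in insertion order.
        batch = [rid for rid in self.coll.records if rid > self.last_id][:batch_size]
        if batch:
            self.last_id = batch[-1]
        return batch


class CappedCollection:
    """Holds at most max_docs record ids; the oldest is deleted on overflow."""

    def __init__(self, max_docs):
        self.max_docs = max_docs
        self.records = deque()
        self.next_id = 0
        self.open_cursors = []

    def insert(self):
        if len(self.records) == self.max_docs:
            self.records.popleft()  # roll-over deletes the oldest record
            # Assumption being modeled: the delete invalidates *all* open cursors.
            for cur in self.open_cursors:
                cur.valid = False
            self.open_cursors.clear()
        self.records.append(self.next_id)
        self.next_id += 1

    def find(self):
        cur = Cursor(self)
        self.open_cursors.append(cur)
        return cur


def clone(coll, batch_size, inserts_per_batch):
    """Copy the collection batch by batch while a 'hot' writer keeps inserting."""
    cur = coll.find()
    copied = []
    while True:
        batch = cur.get_more(batch_size)
        if not batch:
            return copied
        copied.extend(batch)
        for _ in range(inserts_per_batch):
            coll.insert()
```

With a full 10-record collection and no concurrent writes, `clone(coll, 4, 0)` returns all ten ids. With one insert per batch, `clone(coll, 4, 1)` raises `CursorNotFound` on the second `get_more`, because the first concurrent insert triggers a roll-over. Under this assumption, cloning a full, actively written capped collection fails as soon as the first batch is followed by a write.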

      Attempting an initial sync, on master/latest, of a capped collection that is being inserted into:

      2014-01-08T09:00:07.481-0800 [rsSync] 		cloning collection test.cap to test.cap on asyasmacbook.local:40001 with filter {}
      2014-01-08T09:00:07.949-0800 [rsSync] replSet initial sync exception: 13127 getMore: cursor didn't exist on server, possible restart or timeout? 0 attempts remaining

      Note that the failure is almost instant, unlike in 2.4, where such a failure would happen "eventually" if the writes were "fast enough".

      It appears that if the failure does not happen immediately, the clone succeeds. Possibly a timing interaction issue?

              People

              Assignee:
              Backlog - Replication Team (backlog-server-repl)
              Reporter:
              Asya Kamsky (asya)
              Votes:
              6
              Watchers:
              17