Core Server: SERVER-12293

Initial sync of a capped collection can often fail if the collection is highly transient

    Component/s: Replication
    Operating System: ALL

      If a capped collection is hot, an initial sync of a new replica set member can often fail because the cursor gets overrun while syncing.
      One possible solution: when this is detected on a syncing secondary, stop cloning the collection and let oplog sync take care of it.
      Note: if this happens, oplog sync might never converge either.
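A minimal Python sketch of the proposed fallback, assuming a hypothetical cloner that reads batches from a cursor and a later oplog-apply phase that replays inserts. This is not the server's initial-sync code: CursorNotFound, clone_or_give_up, replay_inserts and _hot_cursor are illustrative names. On cursor loss the clone is abandoned instead of failing the sync, and replaying enough inserts into a capped copy pushes the partially cloned documents out.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple


class CursorNotFound(Exception):
    """Hypothetical stand-in for error 13127 ('cursor didn't exist on server')."""


def clone_or_give_up(batches: Iterator[List[dict]]) -> Tuple[List[dict], bool]:
    """Copy documents batch by batch from a (simulated) cursor.

    Returns (docs_copied, completed). If the cursor is lost mid-clone,
    cloning stops and completed is False, instead of failing initial sync.
    """
    copied: List[dict] = []
    try:
        for batch in batches:
            copied.extend(batch)
    except CursorNotFound:
        return copied, False
    return copied, True


def replay_inserts(start: Iterable[dict], oplog_inserts: Iterable[dict], cap: int) -> List[dict]:
    """Apply insert entries to a capped copy; the oldest docs fall off once cap is hit."""
    coll = deque(start, maxlen=cap)
    for doc in oplog_inserts:
        coll.append(doc)
    return list(coll)


def _hot_cursor() -> Iterator[List[dict]]:
    """Simulated cursor: one batch arrives, then capped deletes kill the cursor."""
    yield [{"_id": 1}, {"_id": 2}]
    raise CursorNotFound("cursor overrun")


docs, complete = clone_or_give_up(_hot_cursor())
final = replay_inserts(docs, ({"_id": i} for i in range(3, 8)), cap=3)
print(complete, [d["_id"] for d in final])  # prints: False [5, 6, 7]
```

The design choice is to treat cursor loss on a capped collection as non-fatal. It only helps if oplog application keeps up; on a collection written faster than it can be replayed, the secondary may still never converge.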

      – OLD BELOW –

      Any write to a full capped collection deletes the oldest record(s).
      The delete appears to invalidate all cursors on the collection; see:

      https://github.com/mongodb/mongo/blob/master/src/mongo/db/clientcursor.cpp#L251
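To make the mechanism concrete, here is a toy Python model, not MongoDB code. It assumes, as the report suggests, that every insert which evicts a record from a full capped collection invalidates all open cursors on it. CappedCollection, Cursor, get_more and CursorNotFound are hypothetical stand-ins for the server's cursor bookkeeping and error 13127.

```python
from collections import deque
from typing import List


class CursorNotFound(Exception):
    """Hypothetical stand-in for error 13127 ('cursor didn't exist on server')."""


class CappedCollection:
    """Toy capped collection: fixed max doc count, oldest docs evicted on insert.

    Assumption (modelling the behaviour described in this report): any insert
    that evicts an old record invalidates every open cursor on the collection.
    """

    def __init__(self, max_docs: int) -> None:
        self.docs: deque = deque()
        self.max_docs = max_docs
        self.open_cursors: List["Cursor"] = []

    def insert(self, doc: dict) -> None:
        if len(self.docs) == self.max_docs:
            self.docs.popleft()          # delete the oldest record
            for cur in self.open_cursors:
                cur.valid = False        # the delete kills every open cursor
            self.open_cursors.clear()
        self.docs.append(doc)

    def find(self, batch_size: int) -> "Cursor":
        cur = Cursor(self, batch_size)
        self.open_cursors.append(cur)
        return cur


class Cursor:
    def __init__(self, coll: CappedCollection, batch_size: int) -> None:
        self.coll = coll
        self.batch_size = batch_size
        self.pos = 0
        self.valid = True

    def get_more(self) -> List[dict]:
        if not self.valid:
            raise CursorNotFound("getMore: cursor didn't exist on server")
        snapshot = list(self.coll.docs)
        batch = snapshot[self.pos:self.pos + self.batch_size]
        self.pos += len(batch)
        return batch


coll = CappedCollection(max_docs=3)
for i in range(3):
    coll.insert({"_id": i})
cur = coll.find(batch_size=2)
first = cur.get_more()                   # first batch: _id 0 and 1
coll.insert({"_id": 3})                  # collection is full: evicts _id 0, kills cur
try:
    cur.get_more()
    overrun = False
except CursorNotFound:
    overrun = True
print([d["_id"] for d in first], overrun)  # prints: [0, 1] True
```

Under this model a single insert between two getMore calls is enough to abort a clone, which is consistent with the near-instant failure in the log below.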

      Attempting to initial sync, with master/latest, a capped collection that is being inserted into:

      2014-01-08T09:00:07.481-0800 [rsSync] 		cloning collection test.cap to test.cap on asyasmacbook.local:40001 with filter {}
      2014-01-08T09:00:07.949-0800 [rsSync] replSet initial sync exception: 13127 getMore: cursor didn't exist on server, possible restart or timeout? 0 attempts remaining
      

      Note: the failure is almost instant, unlike in 2.4, where such a failure would happen "eventually" if the writes were "fast enough".

      It appears that if the failure does not happen immediately, the clone succeeds. Possibly a timing interaction issue?

            Assignee: backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter: asya.kamsky@mongodb.com Asya Kamsky
            Votes: 6
            Watchers: 20
