Core Server / SERVER-27009

Replication initial sync creates cursors with no timeout

    Details

    • Operating System: ALL

      Description

      Both the cloner and oplog fetcher in replication initial sync use a cursor with no timeout:

      2016-11-03T19:58:56.081+0000 I COMMAND  [conn47601] command buildlogs.logs command: find { find: "logs", noCursorTimeout: true, batchSize: 13981010 } planSummary: COLLSCAN cursorid:45904553724 keysExamined:0 docsExamined:822 numYields:14 nreturned:821 reslen:16750452 locks:{ Global: { acquireCount: { r: 30 } }, Database: { acquireCount: { r: 15 } }, Collection: { acquireCount: { r: 15 } } } protocol:op_command 447ms
      

      Both components shut down gracefully and clean up the cursors they open. However, if the network fails or the secondary node crashes, these cursors are leaked and never get cleaned up.
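
      The leak can be illustrated with a toy model of idle-cursor reaping. This is only a sketch, not the server's implementation: the `Cursor` class, the `reap_idle_cursors` function, and the 600-second idle limit are all hypothetical names and values chosen for illustration. It shows that a cursor flagged as no-timeout survives every reaper pass unless the client kills it explicitly.

```python
from dataclasses import dataclass

# Assumed idle limit for ordinary cursors, for illustration only.
IDLE_TIMEOUT_SECS = 600


@dataclass
class Cursor:
    cursor_id: int
    last_used: float          # time of the last request on this cursor, in seconds
    no_timeout: bool = False  # mirrors noCursorTimeout: true in the find command


def reap_idle_cursors(cursors, now):
    """Return the cursors that survive one pass of a hypothetical idle reaper."""
    survivors = []
    for c in cursors:
        idle = now - c.last_used
        # A no-timeout cursor is never reaped, however long it has been idle.
        if c.no_timeout or idle < IDLE_TIMEOUT_SECS:
            survivors.append(c)
    return survivors


cursors = [
    Cursor(cursor_id=1, last_used=0.0),                             # ordinary cursor
    Cursor(cursor_id=45904553724, last_used=0.0, no_timeout=True),  # initial sync cursor
]

# The secondary crashes at t=0 and never gets a chance to kill its cursor.
remaining = reap_idle_cursors(cursors, now=499819.0)
remaining_ids = [c.cursor_id for c in remaining]
print(remaining_ids)  # [45904553724]
```

      In this model the ordinary cursor is reclaimed once it has been idle past the limit, while the no-timeout cursor is still open 499819 seconds later.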

      This is especially problematic with replica set shards, because having a cursor open on a sharded collection will eventually block migrations to that shard:

      2016-11-09T16:09:06.572+0000 I SHARDING [RangeDeleter] waiting for open cursors before removing range [{ build_id: "337bc5b6432ea606a010e4c95a5e5f9a", test_id: ObjectId('57f3eb919041302d8b03ffdf'), seq: 1 }, { build_id: "337c88bdf0f88e7c95d9ba482d042e71", test_id: ObjectId('57d1b969be07c42b9805e57f'), seq: 2 }) in buildlogs.logs, elapsed secs: 499819, cursor ids: [45904553724]
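
      To spot this condition in a log file, the RangeDeleter message can be parsed for the blocking cursor ids. The following is a minimal sketch that assumes the message format matches the single sample line above; other server versions may word it differently. The function name `parse_range_deleter_wait` is hypothetical, and the range bounds in the sample string are abbreviated for readability.

```python
import re

# Pattern derived from the one sample log line in this ticket (an assumption,
# not a documented format).
PATTERN = re.compile(
    r"waiting for open cursors before removing range .* in (?P<ns>\S+), "
    r"elapsed secs: (?P<elapsed>\d+), cursor ids: \[(?P<ids>[^\]]*)\]"
)


def parse_range_deleter_wait(line):
    """Extract namespace, elapsed seconds, and cursor ids, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    ids = [int(part) for part in match.group("ids").split(",") if part.strip()]
    return {
        "namespace": match.group("ns"),
        "elapsed_secs": int(match.group("elapsed")),
        "cursor_ids": ids,
    }


# Sample line from this ticket, with the range bounds abbreviated.
sample = (
    "2016-11-09T16:09:06.572+0000 I SHARDING [RangeDeleter] waiting for open "
    "cursors before removing range [{ build_id: 1 }, { build_id: 2 }) in "
    "buildlogs.logs, elapsed secs: 499819, cursor ids: [45904553724]"
)

result = parse_range_deleter_wait(sample)
print(result)
```

      For the line above, the elapsed time of 499819 seconds is roughly 5.8 days, and the reported cursor id matches the `cursorid` in the initial sync `find` log entry.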
      
