  1. Core Server
  2. SERVER-27280

repl failpoints that block until they're turned off should check for shutdown at intervals

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: 3.4.0
    • Component/s: Replication
    • ALL

      The following places block on a failpoint using this pattern, without checking for shutdown inside the while loop:

      if (failpoint) {
          while (failpoint) {
              sleepsecs(1);
          }
      }

      SyncTail::getMissingDoc()
      https://github.com/mongodb/mongo/blob/d9d8ac1e8ae85a07369274eaf12c0ce555e08b2e/src/mongo/db/repl/sync_tail.cpp#L948-L954

      rs_rollback.cpp::_syncRollback()
      https://github.com/mongodb/mongo/blob/d9d8ac1e8ae85a07369274eaf12c0ce555e08b2e/src/mongo/db/repl/rs_rollback.cpp#L912-L919

      CollectionCloner::_insertDocumentsCallback()
      https://github.com/mongodb/mongo/blob/d9d8ac1e8ae85a07369274eaf12c0ce555e08b2e/src/mongo/db/repl/collection_cloner.cpp#L496-L506

      rs_initialsync.cpp::_initialSync()
      https://github.com/mongodb/mongo/blob/d9d8ac1e8ae85a07369274eaf12c0ce555e08b2e/src/mongo/db/repl/rs_initialsync.cpp#L310-L316

      Cloner()
      https://github.com/mongodb/mongo/blob/d9d8ac1e8ae85a07369274eaf12c0ce555e08b2e/src/mongo/db/cloner.cpp#L285-L295

      If a test turns on one of these failpoints but forgets to turn it off, any node that hits the failpoint will spin in the loop indefinitely and hang in shutdown, causing failures like the ones addressed by SERVER-26928.

      Each of these loops should be changed to check whether the node is shutting down (possibly through the inShutdown() function) and to stop waiting if it is.
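
      A minimal sketch of what such a loop could look like, written as standalone C++ rather than actual server code. The globals failPointEnabled and shutdownRequested, the helper waitWhileFailPointEnabled(), and the inShutdown() defined here are hypothetical stand-ins for illustration; the real failpoint macros and the server's own inShutdown() are not reproduced.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-ins for the server's failpoint state and shutdown flag.
std::atomic<bool> failPointEnabled{false};
std::atomic<bool> shutdownRequested{false};

// Stand-in for the server's inShutdown(); an assumption for this sketch only.
bool inShutdown() {
    return shutdownRequested.load();
}

// Blocks while the failpoint is enabled, re-checking for shutdown on every
// iteration. Returns true if the failpoint was turned off, and false if the
// wait was abandoned because shutdown was requested.
bool waitWhileFailPointEnabled(std::chrono::milliseconds interval) {
    while (failPointEnabled.load()) {
        if (inShutdown()) {
            return false;
        }
        std::this_thread::sleep_for(interval);
    }
    return true;
}
```

      With a check like this, a failpoint that a test leaves enabled delays shutdown by at most roughly one sleep interval instead of blocking it forever. Whether the caller should return early, throw, or continue after a false result would depend on each call site.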

            Assignee: backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter: esha.maharishi@mongodb.com Esha Maharishi (Inactive)
            Votes: 0
            Watchers: 5