Core Server: SERVER-18190

Secondary reads may block replication

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.0.2
    • Fix Version/s: 3.0.4, 3.1.3
    • Component/s: Concurrency, Querying
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Quint Iteration 3

      Description

      Issue Status as of Jun 09, 2015

      ISSUE SUMMARY
      Reading from secondary nodes in a replica set may block the application of replication write operations, because longer read operations may not yield appropriately.

      USER IMPACT
      High volume read operations on secondary nodes may cause the nodes to experience increased replication lag, which may make read operations return old data.

      In extreme cases the affected node may become "stale". Stale nodes need to be resynchronized. If enough nodes in a replica set become stale availability may be impacted.
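To make the lag described above concrete, here is a minimal sketch that computes per-secondary replication lag from a document shaped like the output of the `replSetGetStatus` command (a `members` array whose entries carry `name`, `stateStr`, and `optimeDate`). The function name `replication_lag` and the sample host names and timestamps are made up for illustration; a real status document would come from the server.

```python
from datetime import datetime, timedelta


def replication_lag(status):
    """Return {member name: lag} for every SECONDARY in a replSetGetStatus-style dict.

    Lag is the primary's optimeDate minus the secondary's optimeDate.
    """
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hand-written sample, not captured from a real replica set.
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2015, 4, 23, 12, 0, 30)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2015, 4, 23, 12, 0, 30)},
        {"name": "db3:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2015, 4, 23, 12, 0, 5)},
    ],
}

lags = replication_lag(sample_status)
```

In this sample `db3:27017` is 25 seconds behind the primary while `db2:27017` is caught up. Note that on an affected node the status command itself may be blocked during a long read, so the lag may only become visible once the read finishes.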

      WORKAROUNDS
      The preferred workaround is to suspend all read operations on secondary nodes.
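One way an application might apply this workaround is to force its connection string to read from the primary. The sketch below rewrites a MongoDB URI so that `readPreference` is `primary`; `readPreference` and `readPreferenceTags` are standard connection string options, while the helper name `force_primary_reads` and the example hosts are hypothetical. It only edits the string and does not connect to anything.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_primary_reads(uri):
    """Return the URI with any read preference options replaced by readPreference=primary."""
    parts = urlsplit(uri)
    # Tag sets only make sense for non-primary modes, so drop them as well.
    dropped = {"readpreference", "readpreferencetags"}
    options = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in dropped]
    options.append(("readPreference", "primary"))
    return urlunsplit(parts._replace(query=urlencode(options)))


# Hypothetical connection string that currently prefers secondaries.
before = "mongodb://db1:27017,db2:27017/app?replicaSet=rs0&readPreference=secondaryPreferred"
after = force_primary_reads(before)
```

Read preferences set in code (per client, database, collection, or query) override the URI, so those would need to be changed separately.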

      Alternatively, the oplog size can be increased on secondary nodes. This is only a suitable workaround if the nodes undergo periods of no reads so replication can catch up.
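As a rough way to reason about how much a larger oplog helps, the sketch below compares a secondary's last applied operation time with the time range covered by the oplog it syncs from. It assumes a node goes stale once its last applied operation is older than the oldest entry still in that oplog. The function name `oplog_headroom` and all timestamps are illustrative only.

```python
from datetime import datetime, timedelta


def oplog_headroom(oplog_first, oplog_last, secondary_optime):
    """Return (window, lag, headroom) as timedeltas.

    window   = time span covered by the sync source's oplog
    lag      = how far the secondary is behind the newest oplog entry
    headroom = window - lag; zero or negative means the node is stale
    """
    window = oplog_last - oplog_first
    lag = oplog_last - secondary_optime
    return window, lag, window - lag


# Made-up timestamps: a 6 hour oplog window and a secondary 90 minutes behind.
window, lag, headroom = oplog_headroom(
    oplog_first=datetime(2015, 4, 23, 6, 0, 0),
    oplog_last=datetime(2015, 4, 23, 12, 0, 0),
    secondary_optime=datetime(2015, 4, 23, 10, 30, 0),
)
```

Here the headroom is 4 hours 30 minutes of oplog time. A larger oplog widens the window and therefore the headroom, but as noted above the secondary still needs read-free periods to reduce its lag.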

      AFFECTED VERSIONS
      MongoDB 3.0.0 through 3.0.3.

      FIX VERSION
      The fix is included in the 3.0.4 production release.

      Original description

      • Three table scans, each taking 5-10 seconds and returning no results, were run on the secondary against a collection of about 12M documents; they are marked A-B, C-D, and E-F in the attached graph (secondary_reads.png). At the same time, documents were inserted into the same collection on the primary, driving replication traffic.
      • During the table scans the replication rate falls to 0 and replication lag builds.
      • The graphs show straight lines between the beginning and end of each stall, indicating that the serverStatus command that the data collection depends on was blocked as well.
      • The primary is not similarly affected by the same table scan.
      • The problem reproduces on both WiredTiger and MMAPv1.
      Attachments

      • gdbmon.html, 1.20 MB, Bruce Lucas
      • secondary_reads.png, 106 kB

          Activity

          ramon.fernandez Ramon Fernandez added a comment -

          Matjaž Čuk, we're currently working on 3.0.3. Once we have a timeframe for 3.0.4 we'll update the JIRA versions page.

          xgen-internal-githook Githook User added a comment -

          Author: Geert Bosch (GeertBosch) <geert@mongodb.com>

          Message: SERVER-18190: Make ParallelBatchWriterMode use a LockManager managed lock

          (cherry picked from commit 465ba933e8d6f5ad9173c4c806686b915bfffe1c)

          Conflicts:
          src/mongo/db/concurrency/lock_state.cpp
          src/mongo/db/stats/fill_locker_info.cpp
          src/mongo/db/stats/fill_locker_info.h
          Branch: v3.0
          https://github.com/mongodb/mongo/commit/1a4f1719af7b4959564df7c22d72ec03f3938a91

          m.cuk Matjaž Čuk added a comment -

          So the JIRA versions page says :
          3.0.4 09/Jun/15 Stable

          Today is 11/Jun/15 and under downloads there is still only 3.0.3.

          bruce.lucas Bruce Lucas added a comment -

          A release candidate 3.0.4-rc0 is available for testing (only) in the "development releases" section of the download site. It is not ready for production use yet, but if this release candidate passes our tests it will become the production 3.0.4 release.

          ramon.fernandez Ramon Fernandez added a comment -

          Matjaž Čuk, apologies for the inaccuracies, I'll update JIRA. 3.0.4 was delayed about a week, but the 3.0.4-rc0 release candidate contains a fix for this issue and is available for download. If you were affected by this bug it would be very helpful if you could try 3.0.4-rc0 out and confirm that your problem is indeed fixed.

          Thanks,
          Ramón.


            People

            • Votes: 2
            • Watchers: 30
