Core Server / SERVER-8476

slaveDelay with Ghostsync


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 2.3.2
    • Fix Version/s: None
    • Component/s: Replication
    • Environment:
      CentOS 6.2 x86_64
    • Operating System:
      ALL

      Description

      Problem

      We hit a serious problem: clients can read STALE DATA.

      The problem comes from the interaction of slaveDelay and ghost sync.

      Situation

      Replica set members

      "members" : [
      {  "_id" : 0,
         "host" : "192.168.159.133:27017",
         "priority" : 2
      },{"_id" : 1,
         "host" : "192.168.159.134:27017"
      },{"_id" : 2,
         "host" : "192.168.159.135:27017",
         "priority" : 0,
         "slaveDelay" : 300
      }]
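For context, a delayed member must have priority 0 (as member 2 does above) so it can never be elected primary. A minimal sketch of that validity rule, with a checker function whose name is ours, not MongoDB's:

```javascript
// Sketch of the rule that a delayed member must not be electable:
// slaveDelay > 0 requires priority 0. Function name is hypothetical.
function isValidDelayedMember(member) {
  var delay = member.slaveDelay || 0;
  // priority defaults to 1 when unset
  var priority = (member.priority === undefined) ? 1 : member.priority;
  // A delayed member with a nonzero priority could become primary and
  // serve its deliberately stale data, so that combination is invalid.
  return !(delay > 0 && priority !== 0);
}

// The three members from the config above:
var members = [
  { _id: 0, host: "192.168.159.133:27017", priority: 2 },
  { _id: 1, host: "192.168.159.134:27017" },
  { _id: 2, host: "192.168.159.135:27017", priority: 0, slaveDelay: 300 }
];
```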
      

      Problem 1: syncFrom

      rs.syncFrom('192.168.159.135:27017')
      {
         "syncFromRequested" : "192.168.159.135:27017",
         "warning" : "requested member is more than 10 seconds behind us",
         "prevSyncTarget" : "192.168.159.133:27017",
         "ok" : 1
      }
      

      We see this warning when the requested target is behind us.
      But we get no warning when the replica set is idle (no recent writes), even though the target is a delayed member:

      rs.syncFrom('192.168.159.135:27017')
      {
         "syncFromRequested" : "192.168.159.135:27017",
         "prevSyncTarget" : "192.168.159.133:27017",
         "ok" : 1
      }
      

      This can lead to human error.
      But it is bearable, because we can detect and avoid it ourselves.
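The warning above appears to be a simple lag check: the requested target's latest optime is compared with ours, and the warning is attached only when the target is more than 10 seconds behind. A hedged sketch of that rule (the function and constant names are ours, not the server's):

```javascript
// Sketch of the replSetSyncFrom lag check: warn when the requested sync
// target's latest optime is more than 10 seconds behind our own.
var MAX_SYNC_SOURCE_LAG_SECS = 10;

function syncFromWarning(ourOptimeSecs, targetOptimeSecs) {
  if (ourOptimeSecs - targetOptimeSecs > MAX_SYNC_SOURCE_LAG_SECS) {
    return "requested member is more than 10 seconds behind us";
  }
  return null; // no warning: target's optime is close enough to ours
}
```

When the set is idle, every member carries the same last optime, so the difference is 0 and no warning is produced, which matches the second syncFrom output above even though the target is a delayed member.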

      Problem 2: automatic ghost sync caused by network trouble

      On 192.168.159.133, simulate the network failure:

      iptables -A INPUT -p tcp --dport 27017 -s 192.168.159.134 -j DROP
      

      Then 192.168.159.134 is still available!
      192.168.159.134 changes its sync target from the primary (192.168.159.133) to the delayed secondary (192.168.159.135) and stays alive in spite of now being delayed!

      But we (the mongo client) cannot tell that 192.168.159.134 is now delayed.
      We think the node should die (become unreachable from clients) instead of falling behind unexpectedly.
      Then we (the client) could read fresh data from the primary.
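Until the server behaves differently, a client can detect this situation itself by comparing member optimes from rs.status(). A minimal sketch, where the helper name is ours and optimeSecs is a simplified stand-in for the optime timestamp field:

```javascript
// Compute each secondary's replication lag, in seconds, relative to the
// primary, from rs.status()-style member documents (simplified fields).
function lagBehindPrimary(members) {
  var primary = members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  })[0];
  return members
    .filter(function (m) { return m.stateStr === "SECONDARY"; })
    .map(function (m) {
      return { host: m.host, lagSecs: primary.optimeSecs - m.optimeSecs };
    });
}

// Example: .134 has silently started syncing from the delayed member
// .135 and is now 300 seconds behind the primary.
var status = [
  { host: "192.168.159.133:27017", stateStr: "PRIMARY",   optimeSecs: 2000 },
  { host: "192.168.159.134:27017", stateStr: "SECONDARY", optimeSecs: 1700 },
  { host: "192.168.159.135:27017", stateStr: "SECONDARY", optimeSecs: 1700 }
];
```

A monitoring script built on this could alert (or redirect reads to the primary) whenever a non-delayed secondary's lag approaches the configured slaveDelay.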
