Core Server: SERVER-8476

slaveDelay with Ghostsync


Details

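    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 2.3.2
    • Fix Version/s: None
    • Component/s: Replication
    • Environment: CentOS6.2 x86_64
    • Assigned Teams: Replication
    • Operating System: ALL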

    Description

      Problem

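      We ran into a serious problem: reads returned STALE DATA.

      It is caused by the combination of slaveDelay and ghost sync (chained replication, in which a secondary pulls the oplog from another secondary instead of from the primary).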

      Situation

      Replica set members

      "members" : [
      {  "_id" : 0,
         "host" : "192.168.159.133:27017",
         "priority" : 2
      },{"_id" : 1,
         "host" : "192.168.159.134:27017"
      },{"_id" : 2,
         "host" : "192.168.159.135:27017",
         "priority" : 0,
         "slaveDelay" : 300
      }]
      

      Problem 1: syncFrom

      rs.syncFrom('192.168.159.135:27017')
      {
         "syncFromRequested" : "192.168.159.135:27017",
         "warning" : "requested member is more than 10 seconds behind us",
         "prevSyncTarget" : "192.168.159.133:27017",
         "ok" : 1
      }
      

      This warning appears when we pick the wrong sync target by mistake.
      However, when the replica set is idle (no recent writes), the same request returns no warning:

      rs.syncFrom('192.168.159.135:27017')
      {
         "syncFromRequested" : "192.168.159.135:27017",
         "prevSyncTarget" : "192.168.159.133:27017",
         "ok" : 1
      }
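      The warning only fires when the requested member's last applied write is more than 10 seconds older than the requesting node's. The sketch below models that comparison to show why an idle set hides the mistake: once no writes have arrived for longer than slaveDelay, the delayed member has applied the same last write as everyone else, so there is no lag to detect. This is a simplified model inferred from the warning text, not the server's implementation; `syncFromWarning` is a hypothetical helper and optimes are reduced to plain seconds.

```javascript
// Hypothetical model of the "more than 10 seconds behind us" check.
// Optimes are simplified to the time (in seconds) of the last applied write.
var MAX_LAG_SECS = 10;

function syncFromWarning(selfOptimeSecs, targetOptimeSecs) {
  // The target is flagged only when its last applied write is older than
  // ours by more than MAX_LAG_SECS; equal optimes never produce a warning.
  if (selfOptimeSecs - targetOptimeSecs > MAX_LAG_SECS) {
    return "requested member is more than 10 seconds behind us";
  }
  return null;
}

// Busy set: the delayed member has not applied the latest writes yet,
// so its last applied write (t=800) is older than ours (t=1000).
var busy = syncFromWarning(1000, 800);

// Idle set: no writes for longer than slaveDelay, so the delayed member
// has caught up to the same last write and looks as fresh as we are.
var idle = syncFromWarning(1000, 1000);

console.log(busy); // prints: requested member is more than 10 seconds behind us
console.log(idle); // prints: null
```

      Because the check compares optimes rather than configuration, a member with a large slaveDelay passes it whenever writes are quiet.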
      

      This can lead to human error, but it is tolerable because operators can avoid it.
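      One way an operator could avoid this mistake is to check the configuration instead of relying on the lag warning. The sketch below is a hypothetical helper, `isDelayedMember`, that looks up a host in a config object shaped like `rs.conf()` output and reports whether it has a non-zero slaveDelay. It is written in ES5 style on the assumption that it may be pasted into an older mongo shell; it is not part of the shell's API.

```javascript
// Hypothetical guard: treat any member configured with slaveDelay as an
// unsafe sync target, regardless of how far behind it currently appears.
function isDelayedMember(config, host) {
  for (var i = 0; i < config.members.length; i++) {
    var member = config.members[i];
    if (member.host === host) {
      return (member.slaveDelay || 0) > 0;
    }
  }
  throw new Error("host not in replica set config: " + host);
}

// Member list from this report, in the shape returned by rs.conf().
var config = {
  members: [
    { _id: 0, host: "192.168.159.133:27017", priority: 2 },
    { _id: 1, host: "192.168.159.134:27017" },
    { _id: 2, host: "192.168.159.135:27017", priority: 0, slaveDelay: 300 }
  ]
};

console.log(isDelayedMember(config, "192.168.159.135:27017")); // prints: true
console.log(isDelayedMember(config, "192.168.159.133:27017")); // prints: false
```

      In the shell this could wrap the request, for example `if (!isDelayedMember(rs.conf(), target)) rs.syncFrom(target);`, so a delayed member is rejected even when the set is idle.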

      Problem 2: Automatic ghost sync caused by network trouble

      On 192.168.159.133, simulate the network trouble by dropping incoming traffic from 192.168.159.134:

      iptables -A INPUT -p tcp --dport 27017 -s 192.168.159.134 -j DROP
      

      192.168.159.134 then remains available!
      It switches its sync target from the primary (192.168.159.133) to the slaveDelay secondary (192.168.159.135) and stays alive even though its data is now delayed!

      However, we (the MongoDB client) have no way to tell that 192.168.159.134 is now delayed.
      We think the node should instead become unavailable (unreachable from clients) rather than silently serve delayed data.
      Clients could then read fresh data from the primary.
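      Until the server handles this, a client or monitoring script could detect the situation by comparing each secondary's last applied write with the primary's. The sketch below is a hypothetical helper, `laggingMembers`, that takes an object shaped like `rs.status()` output (using its `name`, `stateStr` and `optimeDate` fields) and returns the secondaries trailing the primary by more than a chosen threshold. The status object and its timestamps are illustrative, not captured from this incident, and the threshold is an assumption. Like any optime comparison, it only sees lag while writes are flowing.

```javascript
// Hypothetical client-side check: report secondaries whose last applied
// write trails the primary's by more than maxLagSecs.
function laggingMembers(status, maxLagSecs) {
  var primary = null;
  var i;
  for (i = 0; i < status.members.length; i++) {
    if (status.members[i].stateStr === "PRIMARY") {
      primary = status.members[i];
    }
  }
  if (primary === null) {
    throw new Error("no PRIMARY in status");
  }
  var lagging = [];
  for (i = 0; i < status.members.length; i++) {
    var m = status.members[i];
    if (m.stateStr !== "SECONDARY") {
      continue;
    }
    // Subtracting two Date objects yields milliseconds.
    var lagSecs = (primary.optimeDate - m.optimeDate) / 1000;
    if (lagSecs > maxLagSecs) {
      lagging.push({ name: m.name, lagSecs: lagSecs });
    }
  }
  return lagging;
}

// Illustrative status in which .134 chains through the delayed .135,
// so both trail the primary by the 300 s slaveDelay (made-up times).
var status = {
  members: [
    { name: "192.168.159.133:27017", stateStr: "PRIMARY", optimeDate: new Date("2013-01-01T00:10:00Z") },
    { name: "192.168.159.134:27017", stateStr: "SECONDARY", optimeDate: new Date("2013-01-01T00:05:00Z") },
    { name: "192.168.159.135:27017", stateStr: "SECONDARY", optimeDate: new Date("2013-01-01T00:05:00Z") }
  ]
};

console.log(JSON.stringify(laggingMembers(status, 10)));
```

      A client could then stop sending secondary reads to any member this returns and fall back to the primary, which is the behavior the report asks for.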

            People

              Assignee: Backlog - Replication Team (backlog-server-repl)
              Reporter: Hiroaki (crumbjp)