  1. Core Server
  2. SERVER-12109

getMore with tailable cursor, projection, and Query_OplogReplay may fail to return new data

Details

    • Bug
    • Resolution: Done
    • Major - P3
    • 2.5.5
    • 2.5.4
    • Replication
    • None
    • ALL
      1. Create a two-node replica set with a 2.4 secondary syncing from a 2.6 primary.
      2. Send a reconfig that contains only changes to member tags.
      3. Insert a document and run a getLastError() check with w:2.
      4. Observe that the response takes 10 minutes to arrive. If a timeout is specified, the call returns at the timeout, but a subsequent getLastError() will still not succeed until the 10 minutes have elapsed.

    Description

      The logic behind this is as follows:

      In 2.4, the oplogreader, which notifies the primary of a secondary's sync progress, sends a handshake only when it first connects (in 2.6, the primary is notified of this progress via the SyncSourceFeedback class). This handshake is how the secondary gets added to the primary's ghost cache, which the primary uses to track the secondary's sync progress for the sake of write concerns.

      On a reconfig where only tags are affected, 2.6 members clear the ghost cache as well as the member list, but do not close all connections, in order to avoid triggering an election. When a 2.6 node is syncing from a 2.6 node, this does not cause a problem: the 2.6 secondary sends an update, hears back that the primary does not know who the secondary is, and then sends a handshake.

      After the reconfig, the 2.6 primary does not know who the 2.4 secondary is, but the 2.4 secondary does not send a new handshake. So the secondary continues to send updates and the primary ignores them. After 10 minutes, the oplogreader timeout is triggered and a reconnect occurs.
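
      The sequence described above can be illustrated with a small toy model. This is only a sketch and not the server's code: the class names (Primary, Secondary), the method names, the run_scenario helper, and the integer "optime" values are all hypothetical. It also assumes that the reconnect after the oplogreader timeout sends a fresh handshake, which the description implies but does not state outright.

```python
class Primary:
    """Toy model of a 2.6 primary's ghost cache (all names are hypothetical)."""

    def __init__(self):
        # member id -> last oplog position reported by that member
        self.ghost_cache = {}

    def handshake(self, member_id):
        # A handshake registers the member in the ghost cache.
        self.ghost_cache[member_id] = 0

    def update_position(self, member_id, optime):
        # Updates from members missing from the ghost cache are ignored.
        if member_id not in self.ghost_cache:
            return False
        self.ghost_cache[member_id] = optime
        return True

    def reconfig_tags_only(self):
        # The ghost cache is cleared, but connections stay open.
        self.ghost_cache.clear()

    def satisfies_w(self, optime, w):
        # The primary counts itself plus every tracked member at or past optime.
        acked = 1 + sum(1 for pos in self.ghost_cache.values() if pos >= optime)
        return acked >= w


class Secondary:
    def __init__(self, member_id, primary, rehandshake_on_unknown):
        self.member_id = member_id
        self.primary = primary
        # True models 2.6 behaviour, False models 2.4 behaviour.
        self.rehandshake_on_unknown = rehandshake_on_unknown
        primary.handshake(member_id)  # handshake on first connect

    def report(self, optime):
        ok = self.primary.update_position(self.member_id, optime)
        if not ok and self.rehandshake_on_unknown:
            # 2.6: told it is unknown, so it handshakes and retries.
            self.primary.handshake(self.member_id)
            ok = self.primary.update_position(self.member_id, optime)
        return ok

    def reconnect(self):
        # Assumed: a reconnect (after the timeout) performs a new handshake.
        self.primary.handshake(self.member_id)


def run_scenario(rehandshake_on_unknown):
    primary = Primary()
    secondary = Secondary("s1", primary, rehandshake_on_unknown)
    secondary.report(1)
    before = primary.satisfies_w(1, w=2)

    primary.reconfig_tags_only()
    secondary.report(2)
    after_reconfig = primary.satisfies_w(2, w=2)

    secondary.reconnect()
    secondary.report(2)
    after_reconnect = primary.satisfies_w(2, w=2)
    return before, after_reconfig, after_reconnect
```

      In this model, run_scenario(False) (the 2.4 secondary) fails the w:2 check after the tags-only reconfig and recovers only after the reconnect, while run_scenario(True) (the 2.6 secondary) passes it throughout.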

          People

            matt.dannenberg Matt Dannenberg
            Votes: 0
            Watchers: 6
