Core Server / SERVER-12727

index building can make replica set member unreachable / unresponsive

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Gone away
    • Affects Version/s: 2.4.9
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:

      We have a 3-member replica set with no arbiters. We built an index on a large collection (~40GB, 17M docs) with background=True. This seemed to work okay on the primary, but 30 minutes later (when both secondaries were told to build the index) our replica set went down: the secondaries became entirely unresponsive and were unable to vote.
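
For illustration, the index build described in the step above can be sketched roughly as below. This is a minimal sketch that assumes the pymongo driver. The URI, database, collection, and field names are hypothetical placeholders and do not come from the affected deployment. The helper names `background_index_request` and `build_index` are also made up for this example.

```python
from typing import Any, Dict, List, Tuple


def background_index_request(field: str) -> Tuple[List[Tuple[str, int]], Dict[str, Any]]:
    """Return (keys, options) for a single-field ascending background index."""
    keys = [(field, 1)]  # 1 means ascending order
    options = {"background": True}  # the option named in the report
    return keys, options


def build_index(uri: str, db_name: str, coll_name: str, field: str) -> str:
    """Ask the server at `uri` to build the index; returns the index name.

    All arguments are placeholders supplied by the caller (hypothetical here).
    """
    from pymongo import MongoClient  # imported lazily; requires pymongo

    keys, options = background_index_request(field)
    client = MongoClient(uri)
    try:
        return client[db_name][coll_name].create_index(keys, **options)
    finally:
        client.close()
```

A call such as `build_index("mongodb://localhost:27017", "mydb", "mycoll", "user_id")` would be issued against the primary; all four arguments here are invented for the example.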


      Description

      There is already an issue relating to the behaviour of background indexes on secondaries, listed as FIXED for 2.5:
      https://jira.mongodb.org/browse/SERVER-2771
      It is not entirely clear, however, how this issue has been fixed. Do the indices now get built in the background on secondaries, as on the primary, and/or are they built sequentially rather than at the same time across all secondaries? It would be good to have clarification on this.

      Separately from that issue, I believe the behaviour of the secondaries whilst building foreground indices is not entirely acceptable. It is fine that the database is locked, but the member shouldn't become entirely unresponsive for the time it takes to build the index.

          Activity

          johng John Greenall added a comment -

          I don't want to try to recreate this on our live server, and setting up a toy replica set from snapshots of our live data is probably an hour or two's work that I'd rather not spend right now.

          We do plan to move to 2.6.0, however, as soon as a stable release is available, and I will probably have another replica set up for testing at that point. This would be a natural time for me to try to recreate the problem, particularly if there is a chance the issue has already been fixed by work done on related issues. Can we leave this issue open until then?

          matt.dannenberg Matt Dannenberg (Inactive) added a comment -

          Sure, that plan sounds good.

          matt.dannenberg Matt Dannenberg (Inactive) added a comment -

          Hey John,

          Have you upgraded to 2.6.0? If so, were you able to reproduce the problem?

          Thanks,
          Matt

          thomasr Thomas Rueckstiess added a comment -

          Hi John,

          We haven't heard back from you in some time. As we are unable to reproduce the issue without further information, I'll go ahead and resolve the ticket now. If this is still an issue after you have had the chance to upgrade your environment to 2.6 and you'd like to follow up, feel free to re-open the ticket and provide further details.

          Regards,
          Thomas

          johng John Greenall added a comment -

          @Thomas I hadn't forgotten about this log, but I have not yet set up a replica set on 2.6, since my 2.6.0 test server fell over on the first set of unit tests I ran (array updates). I am now waiting for the release of 2.6.1 and will re-raise this log if the issue persists.


            People

            • Votes: 2
            • Watchers: 7
