Core Server / SERVER-18430

SECONDARY cannot catch up on the oplog during upgrade process

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Done
    • Affects Version/s: 2.6.9
    • Fix Version/s: None
    • Component/s: Admin, Replication
    • Labels:
      None
    • Operating System:
      ALL

      Description

      Hi, I tried to upgrade my mixed 2.4.11/12 sharded cluster to 2.6.9, but the upgrade failed because a SECONDARY (hidden, priority=0) could not catch up on the oplog.

      CLUSTER STRUCTURE:
      3 shards, each mongod instance listening on port 27018
      3 config servers, each listening on port 27019
      many mongos instances deployed on app servers, listening on port 27020

      PREPAREDNESS:
      Following the guide http://docs.mongodb.org/manual/release-notes/2.6-upgrade/#preparedness, we ran the recommended checks and found some problems, described in https://jira.mongodb.org/browse/SERVER-17746 .
      We added the argument --setParameter failIndexKeyTooLong=false to avoid write failures, and ignored the other errors.
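For reference, the failIndexKeyTooLong parameter can be supplied either at startup or at runtime; a minimal sketch (the port and the bare invocations are placeholders, not our exact configuration):

```shell
# Startup flag, as used in this report (other options elided):
mongod --port 27018 --setParameter failIndexKeyTooLong=false

# Equivalent runtime change against an already-running 2.6 mongod:
mongo --port 27018 --eval 'db.adminCommand({setParameter: 1, failIndexKeyTooLong: false})'
```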

      UPGRADE PROCESS:
      Following the guide http://docs.mongodb.org/manual/release-notes/2.6-upgrade/#upgrade-a-sharded-cluster-to-2-6:
      1. Disabled write operations to metadata; done with no errors.
      2. Disabled the balancer; done with no errors.
      3. Upgraded the metadata; done with no errors.
      4. Upgraded all mongos instances to 2.6.9; done with no errors.
      5. Stopped the ARBITER of each shard (3 in total), because each ARBITER runs on the same box as a config server, and installing mongodb-org-server (2.6.9) removes the mongodb-10gen (2.4.x) package.
      6. Upgraded all 3 config servers in reverse order, as the guide says,

      leaving the first system in the mongos --configdb argument to upgrade last.

      Done with no errors.
      7. Picked a HIDDEN member of one shard and tried to upgrade it, but the upgrade failed because the SECONDARY could not catch up on the oplog.
      (inline screenshot; the original image link has expired)
      The other PRIMARY and SECONDARY members show:

      Tue May 12 02:27:51.581 [rsHealthPoll] replset info mgsh43.avoscloud.com:27018 thinks that we are down
      

      It looks like this member cannot contact the other members.
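Assuming connectivity were fine, one way to quantify "cannot catch up" is the gap between the primary's and the secondary's optimeDate as reported by rs.status(); a minimal sketch of that arithmetic (the timestamps below are hypothetical, not taken from our cluster):

```python
from datetime import datetime

def replication_lag_seconds(primary_optime: datetime, secondary_optime: datetime) -> float:
    """Lag is how far the secondary's last applied op trails the primary's."""
    return (primary_optime - secondary_optime).total_seconds()

# Hypothetical optimeDate values as they might appear in rs.status()
primary = datetime(2015, 5, 12, 2, 27, 51)
secondary = datetime(2015, 5, 12, 2, 20, 11)
print(replication_lag_seconds(primary, secondary))  # → 460.0
```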

      Log of this member:

      1. Lots of:
      2015-05-12T02:26:32.025+0800 [IndexRebuilder] opening db {db names}.
      2. Lots of:
      2015-05-12T02:34:53.153+0800 [rsHealthPoll] 0x1219651 0x11ba9d9 0x119d71e 0x11a8c6b 0xe4a6a4 0x11a8a1e 0x11a0882 0x125e419 0x7fbeff525182 0x7fbefe82a47d 
       /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x1219651]
       /usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x11ba9d9]
       /usr/bin/mongod(_ZN5mongo9wassertedEPKcS1_j+0x16e) [0x119d71e]
       /usr/bin/mongod(_ZN5mongo4task6Server4sendEN5boost8functionIFvvEEE+0x19b) [0x11a8c6b]
       /usr/bin/mongod(_ZN5mongo21ReplSetHealthPollTask6doWorkEv+0x544) [0xe4a6a4]
       /usr/bin/mongod(_ZN5mongo4task4Task3runEv+0x1e) [0x11a8a1e]
       /usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0xd2) [0x11a0882]
       /usr/bin/mongod() [0x125e419]
       /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7fbeff525182]
       /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fbefe82a47d]
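The frames in the backtrace above are mangled C++ symbols; they can be demangled with c++filt (part of GNU binutils) to see, for example, that the warning path goes through mongo::wasserted:

```shell
# Demangle one frame from the rsHealthPoll backtrace
echo '_ZN5mongo9wassertedEPKcS1_j' | c++filt
# → mongo::wasserted(char const*, char const*, unsigned int)
```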
      

      I've attached the full log and a filtered log of this member; mgcfg3.avoscloud.com (10.10.17.10) in the log is the ARBITER.
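A filtered log like the attached mongodb.log.2.filterd can be produced by keeping only lines from a given component tag; a minimal sketch (the helper name and sample lines are illustrative, not the actual filtering tool used):

```python
import re

def filter_log(lines, component="rsHealthPoll"):
    """Keep only log lines emitted by the given bracketed component tag."""
    pattern = re.compile(r"\[" + re.escape(component) + r"\]")
    return [line for line in lines if pattern.search(line)]

sample = [
    "2015-05-12T02:26:32.025+0800 [IndexRebuilder] opening db test",
    "2015-05-12T02:34:53.153+0800 [rsHealthPoll] 0x1219651 0x11ba9d9",
]
print(filter_log(sample))  # keeps only the rsHealthPoll line
```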

      mongodb-org-server version info:
      db version v2.6.9
      git version: df313bc75aa94d192330cb92756fc486ea604e64

      I don't know why this happened. Maybe it is a bug in 2.6.9?

        Attachments

        1. mongodb.log.2
          883 kB
        2. mongodb.log.2.filterd
          97 kB


            People

            Assignee:
            sam.kleinman Sam Kleinman (Inactive)
            Reporter:
            wujiangcheng Jiangcheng Wu
            Votes:
            0
            Watchers:
            3
