  1. Core Server
  2. SERVER-4305

Deadlock of secondary trying to sync the oplog if index versions are mixed on master



    • Type: Bug
    • Status: Closed
    • Priority: Blocker - P1
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0.1
    • Fix Version/s: None
    • Component/s: None
    • Environment:
      Linux 2.6.32-35-server, Ubuntu 10.04, MongoDB 2.0.1, replica set with 3 nodes, NUMA, 2x Xeon E5620, 24 GB RAM


      What we want to do:
      Upgrade all indexes in our replica set to the 2.0 index format (v:1).

      How we do this:
      We start with one secondary: shut it down, change its port, remove the replica set parameter, and start it with the repair command to rebuild all indexes.
      After the repair we restore the original configuration and reconnect it to the replica set. We wait until that secondary is up to date before proceeding with the next one.
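
      The "wait until the secondary is up to date" step can be checked from the output of the replSetGetStatus command. The following is only a minimal sketch, assuming the members array has already been fetched and that each entry carries the name, stateStr and optimeDate fields. The host names mn02 and mn03, the ports, the timestamps and the 10-second tolerance are made up for illustration; only mn01 appears in this report.

```python
from datetime import datetime, timedelta


def replication_lag(members):
    """Return {member name: lag behind the primary} for every SECONDARY.

    `members` is a list of dicts shaped like the `members` array of
    replSetGetStatus (only name, stateStr and optimeDate are used).
    """
    primaries = [m for m in members if m["stateStr"] == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one PRIMARY, found %d" % len(primaries))
    primary_optime = primaries[0]["optimeDate"]
    return {
        m["name"]: primary_optime - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def caught_up(members, name, tolerance=timedelta(seconds=10)):
    """True if the named secondary is within `tolerance` of the primary.

    The default tolerance is an arbitrary value chosen for this sketch.
    """
    return replication_lag(members)[name] <= tolerance


# Hypothetical sample data, not taken from the attached logs.
sample_members = [
    {"name": "mn01:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2011, 11, 16, 6, 2, 56)},
    {"name": "mn02:27017", "stateStr": "PRIMARY",
     "optimeDate": datetime(2011, 11, 16, 6, 32, 56)},
    {"name": "mn03:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2011, 11, 16, 6, 32, 55)},
]

if __name__ == "__main__":
    for name, lag in sorted(replication_lag(sample_members).items()):
        print(name, lag, "caught up" if caught_up(sample_members, name) else "behind")
```

      Polling this between repairs would show whether the lag is shrinking or, as in our case, growing.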

      What is the problem:
      The secondary is not able to catch up with the master. It has a single process running at 100% CPU usage while I/O is almost idle, so it is CPU bound.
      It slowly falls further and further behind, even though all hosts have the same hardware.

      What we suspect:
      One of our databases has some indexes in the old format and some in the new one. Once a secondary has been repaired, all of its indexes are in the latest format, and this seems to block the replay/resync of the oplog from the master, which still has mixed-version indexes.
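
      To confirm which indexes are in which format, the index documents (for example those stored in system.indexes) can be grouped by their v field. This is a rough sketch only: it assumes that an index document without a v field is in the old format (v:0), and the namespace and index names in the sample are invented for illustration.

```python
from collections import defaultdict


def index_versions(index_specs):
    """Group index names by index format version.

    Assumption: a spec without a "v" field is treated as version 0.
    """
    by_version = defaultdict(list)
    for spec in index_specs:
        version = spec.get("v", 0)
        by_version[version].append("%s.%s" % (spec["ns"], spec["name"]))
    return dict(by_version)


def has_mixed_versions(index_specs):
    """True if the specs contain more than one index format version."""
    return len(index_versions(index_specs)) > 1


# Hypothetical index documents, not taken from our database.
sample_indexes = [
    {"v": 1, "key": {"_id": 1}, "ns": "app.users", "name": "_id_"},
    {"key": {"email": 1}, "ns": "app.users", "name": "email_1"},
    {"v": 1, "key": {"created": -1}, "ns": "app.users", "name": "created_-1"},
]

if __name__ == "__main__":
    for version, names in sorted(index_versions(sample_indexes).items()):
        print("v:%d" % version, ", ".join(names))
```

      Running the same check against the master and the repaired secondary would show whether the two really differ in index format.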

      We downgraded the indexes again with an older mongod binary (1.8.4). When that had finished, we reconnected the secondary to the replica set; it replayed the oplog without a problem and is now in sync again.

      All hosts run mongod binary version 2.0.1.

      I've attached iostat and mongostat output. The host with the problem is mn01.
      mongod.log contains no error message, only a recurring message about the cursor:
      Wed Nov 16 06:32:56 [rsSync] repl: old cursor isDead, will initiate a new one



        1. iostat.txt
          55 kB
        2. mongostat.txt
          193 kB
        3. secondary.log.txt
          92 kB
        4. secondary-14-nov-repair.log.zip
          3.51 MB
        5. secondary-15-nov.log.zip
          62 kB
        6. strace.txt
          26 kB



            kristina Kristina Chodorow (Inactive)
            steffen Steffen