  1. Core Server
  2. SERVER-27006

Arbiter crashes with "Fatal assertion 28771 NoSuchKey" when upgrading a sharded cluster from 3.2.10 -> 3.2.11-rc0 and then downgrading back to 3.2.10

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      Set up a 3.2.10 WiredTiger CSRS sharded cluster with 2 mongoses, 3 config servers, and 2 shards. The shards each contain 3 members, one of which is an arbiter.

      Upgrade cluster to 3.2.11-rc0. Upgrade shards, then config servers, then mongoses.

      Downgrade cluster to 3.2.10. Downgrade mongoses, then shards. Upon downgrading the shards, one of the arbiters may crash.

    • Sprint:
      Repl 2016-11-21
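
The rollout order in Steps To Reproduce can be written out as data, which makes it easier to drive from a test harness. This is a minimal sketch: `build_plan` and the tuple layout are hypothetical names chosen for illustration and are not part of any MongoDB or automation tooling. It only encodes the order stated in this ticket.

```python
# Hypothetical sketch of the reproduction order described in this ticket.
# Nothing here talks to a real cluster; it only records the sequence.

BASE_VERSION = "3.2.10"
TARGET_VERSION = "3.2.11-rc0"


def build_plan():
    """Return the ordered (action, version, component group) steps."""
    return [
        ("install", BASE_VERSION, "whole cluster"),
        # Upgrade order from the ticket: shards, config servers, mongoses.
        ("upgrade", TARGET_VERSION, "shards"),
        ("upgrade", TARGET_VERSION, "config servers"),
        ("upgrade", TARGET_VERSION, "mongoses"),
        # Downgrade order from the ticket: mongoses, then shards.
        ("downgrade", BASE_VERSION, "mongoses"),
        # The arbiter crash was observed during this last step.
        ("downgrade", BASE_VERSION, "shards"),
    ]


if __name__ == "__main__":
    for number, (action, version, group) in enumerate(build_plan(), start=1):
        print(f"{number}. {action} {group} -> {version}")
```

The plan stops at the shard downgrade because that is the step at which the ticket reports the crash.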

      Description

      This came up in our automation tests.

      If we upgrade a sharded cluster from 3.2.10 -> 3.2.11-rc0 and then downgrade back to 3.2.10, one of the shard arbiters crashes with:

      2016-11-11T14:36:46.343-0500 I -        [initandlisten] Fatal assertion 28771 NoSuchKey: Missing expected field "ts"
      2016-11-11T14:36:46.349-0500 I CONTROL  [initandlisten] 
       0x10c6c293a 0x10c674101 0x10c662bcc 0x10c2ace5f 0x10c2de56c 0x10c2e21e5 0x10c2e4c75 0x10bdb7fb6 0x10bdb5a13 0x10bdba0fc 0x10bdb59f4
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"10BDB5000","o":"90D93A","s":"_ZN5mongo15printStackTraceERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEE"},{"b":"10BDB5000","o":"8BF101","s":"_ZN5mongo10logContextEPKc"},{"b":"10BDB5000","o":"8ADBCC","s":"_ZN5mongo23fassertFailedWithStatusEiRKNS_6StatusE"},{"b":"10BDB5000","o":"4F7E5F","s":"_ZN5mongo4repl11getMinValidEPNS_16OperationContextE"},{"b":"10BDB5000","o":"52956C","s":"_ZN5mongo4repl39ReplicationCoordinatorExternalStateImpl21cleanUpLastApplyBatchEPNS_16OperationContextE"},{"b":"10BDB5000","o":"52D1E5","s":"_ZN5mongo4repl26ReplicationCoordinatorImpl21_startLoadLocalConfigEPNS_16OperationContextE"},{"b":"10BDB5000","o":"52FC75","s":"_ZN5mongo4repl26ReplicationCoordinatorImpl16startReplicationEPNS_16OperationContextE"},{"b":"10BDB5000","o":"2FB6","s":"_ZN5mongoL14_initAndListenEi"},{"b":"10BDB5000","o":"A13","s":"_ZN5mongo13initAndListenEi"},{"b":"10BDB5000","o":"50FC","s":"main"},{"b":"10BDB5000","o":"9F4","s":"start"}],"processInfo":{ "mongodbVersion" : "3.2.10", "gitVersion" : "79d9b3ab5ce20f51c272b4411202710a082d0317", "compiledModules" : [], "uname" : { "sysname" : "Darwin", "release" : "16.1.0", "version" : "Darwin Kernel Version 16.1.0: Thu Oct 13 21:26:57 PDT 2016; root:xnu-3789.21.3~60/RELEASE_X86_64", "machine" : "x86_64" }, "somap" : [ { "path" : "/tmp/mms-automation/test/versions/mongodb-osx-x86_64-3.2.10/bin/mongod", "machType" : 2, "b" : "10BDB5000", "vmaddr" : "100000000", "buildId" : "787C57070B50369E8157A670D010EF48" }, { "path" : "/usr/lib/libSystem.B.dylib", "machType" : 6, "b" : "7FFFDC016000", "vmaddr" : "7FFF88D49000", "buildId" : "CFC3669CFB443A5188151E84A168F3C5" }, { "path" : "/usr/lib/libc++.1.dylib", "machType" : 6, "b" : "7FFFDC148000", "vmaddr" : "7FFF88E7B000", "buildId" : "BEE86868F831384C919E2B286ACFE87C" }, { "path" : "/usr/lib/system/libcache.dylib", "machType" : 6, "b" : "7FFFDD4D1000", "vmaddr" : "7FFF8A204000", "buildId" : "84E55656FDA93B299E4FBE31B2C0AA3C" }, { "path" : 
"/usr/lib/system/libcommonCrypto.dylib", "machType" : 6, "b" : "7FFFDD4D6000", "vmaddr" : "7FFF8A209000", "buildId" : "31040F105E573B9C8D5B33AD87D1BEE8" }, { "path" : "/usr/lib/system/libcompiler_rt.dylib", "machType" : 6, "b" : "7FFFDD4E1000", "vmaddr" : "7FFF8A214000", "buildId" : "486BDE5281B43446BD7223977CAE556F" }, { "path" : "/usr/lib/system/libcopyfile.dylib", "machType" : 6, "b" : "7FFFDD4E9000", "vmaddr" : "7FFF8A21C000", "buildId" : "0DA49B7756EC362D98FFFA78CFD986D6" }, { "path" : "/usr/lib/system/libcorecrypto.dylib", "machType" : 6, "b" : "7FFFDD4F2000", "vmaddr" : "7FFF8A225000", "buildId" : "2684CC01087E33E28219AAA3BBD9BFD7" }, { "path" : "/usr/lib/system/libdispatch.dylib", "machType" : 6, "b" : "7FFFDD575000", "vmaddr" : "7FFF8A2A8000", "buildId" : "877B505D826C324684F70F850636039E" }, { "path" : "/usr/lib/system/libdyld.dylib", "machType" : 6, "b" : "7FFFDD5A8000", "vmaddr" : "7FFF8A2DB000", "buildId" : "7BFA347662103BCB8CE89B952F87BD84" }, { "path" : "/usr/lib/system/libkeymgr.dylib", "machType" : 6, "b" : "7FFFDD5AE000", "vmaddr" : "7FFF8A2E1000", "buildId" : "09CD7CA646D23A9FB9F17C4CA5CA0D68" }, { "path" : "/usr/lib/system/liblaunch.dylib", "machType" : 6, "b" : "7FFFDD5BC000", "vmaddr" : "7FFF8A2EF000", "buildId" : "7AB2E2EA8B47342087CE5EE18A4EEE49" }, { "path" : "/usr/lib/system/libmacho.dylib", "machType" : 6, "b" : "7FFFDD5BD000", "vmaddr" : "7FFF8A2F0000", "buildId" : "1EAE5ADD490C3B1F9F97447BA8E0E90F" }, { "path" : "/usr/lib/system/libquarantine.dylib", "machType" : 6, "b" : "7FFFDD5C3000", "vmaddr" : "7FFF8A2F6000", "buildId" : "F3E47D7C8776327C9426DD7DEB30DBDD" }, { "path" : "/usr/lib/system/libremovefile.dylib", "machType" : 6, "b" : "7FFFDD5C6000", "vmaddr" : "7FFF8A2F9000", "buildId" : "C4FC07FFED86382EB06F33C34718080C" }, { "path" : "/usr/lib/system/libsystem_asl.dylib", "machType" : 6, "b" : "7FFFDD5C8000", "vmaddr" : "7FFF8A2FB000", "buildId" : "F09874908427367FB302A05A7D61FEBF" }, { "path" : 
"/usr/lib/system/libsystem_blocks.dylib", "machType" : 6, "b" : "7FFFDD5E1000", "vmaddr" : "7FFF8A314000", "buildId" : "B8C3701D5A913D35999D2DC8D5393525" }, { "path" : "/usr/lib/system/libsystem_c.dylib", "machType" : 6, "b" : "7FFFDD5E2000", "vmaddr" : "7FFF8A315000", "buildId" : "5F9531F5EDA33D25A8273E0FD6B392BA" }, { "path" : "/usr/lib/system/libsystem_configuration.dylib", "machType" : 6, "b" : "7FFFDD670000", "vmaddr" : "7FFF8A3A3000", "buildId" : "CDC55FCBC1FC350DA9195DBCFC835B63" }, { "path" : "/usr/lib/system/libsystem_coreservices.dylib", "machType" : 6, "b" : "7FFFDD674000", "vmaddr" : "7FFF8A3A7000", "buildId" : "5DE691C67EE63210895D9EA3ECBC09B4" }, { "path" : "/usr/lib/system/libsystem_coretls.dylib", "machType" : 6, "b" : "7FFFDD678000", "vmaddr" : "7FFF8A3AB000", "buildId" : "8F7E9B12400D3276A9C54546B0258554" }, { "path" : "/usr/lib/system/libsystem_dnssd.dylib", "machType" : 6, "b" : "7FFFDD691000", "vmaddr" : "7FFF8A3C4000", "buildId" : "28E52C39DF10340FA3ECC0119AF6361F" }, { "path" : "/usr/lib/system/libsystem_info.dylib", "machType" : 6, "b" : "7FFFDD698000", "vmaddr" : "7FFF8A3CB000", "buildId" : "C686B8345E7D382CAF6E44AB78EE83E2" }, { "path" : "/usr/lib/system/libsystem_kernel.dylib", "machType" : 6, "b" : "7FFFDD6C2000", "vmaddr" : "7FFF8A3F5000", "buildId" : "EC53F92A0DFA3027A220414A01F17B2E" }, { "path" : "/usr/lib/system/libsystem_m.dylib", "machType" : 6, "b" : "7FFFDD6E5000", "vmaddr" : "7FFF8A418000", "buildId" : "7F86C291B10531C1992390EBAB22B73F" }, { "path" : "/usr/lib/system/libsystem_malloc.dylib", "machType" : 6, "b" : "7FFFDD72D000", "vmaddr" : "7FFF8A460000", "buildId" : "F98400804C2C3F3B80877C738F12A1C7" }, { "path" : "/usr/lib/system/libsystem_network.dylib", "machType" : 6, "b" : "7FFFDD74C000", "vmaddr" : "7FFF8A47F000", "buildId" : "2BAFB24F999C3148BDD8F28E05F716F7" }, { "path" : "/usr/lib/system/libsystem_networkextension.dylib", "machType" : 6, "b" : "7FFFDD7A4000", "vmaddr" : "7FFF8A4D7000", "buildId" : 
"971DD3ADD17A32FF95DE0A5A979E68AE" }, { "path" : "/usr/lib/system/libsystem_notify.dylib", "machType" : 6, "b" : "7FFFDD7AE000", "vmaddr" : "7FFF8A4E1000", "buildId" : "EAD023A2AD3F31C89489274B9A42DA61" }, { "path" : "/usr/lib/system/libsystem_platform.dylib", "machType" : 6, "b" : "7FFFDD7B8000", "vmaddr" : "7FFF8A4EB000", "buildId" : "2F2D6A81C36C353DB27BA6643A32375E" }, { "path" : "/usr/lib/system/libsystem_pthread.dylib", "machType" : 6, "b" : "7FFFDD7C1000", "vmaddr" : "7FFF8A4F4000", "buildId" : "46375095473130349D87396DE95FC697" }, { "path" : "/usr/lib/system/libsystem_sandbox.dylib", "machType" : 6, "b" : "7FFFDD7CC000", "vmaddr" : "7FFF8A4FF000", "buildId" : "2D42A2BFA7AF352AA821D8F6E85A63AC" }, { "path" : "/usr/lib/system/libsystem_secinit.dylib", "machType" : 6, "b" : "7FFFDD7D0000", "vmaddr" : "7FFF8A503000", "buildId" : "A54B8FEFE7923C548E0BE80A376662F2" }, { "path" : "/usr/lib/system/libsystem_symptoms.dylib", "machType" : 6, "b" : "7FFFDD7D2000", "vmaddr" : "7FFF8A505000", "buildId" : "8FB7CA3779EF3651B5B9B5E1E0947067" }, { "path" : "/usr/lib/system/libsystem_trace.dylib", "machType" : 6, "b" : "7FFFDD7DA000", "vmaddr" : "7FFF8A50D000", "buildId" : "C029B910A65F35F6B194B933B454EAB4" }, { "path" : "/usr/lib/system/libunwind.dylib", "machType" : 6, "b" : "7FFFDD7FB000", "vmaddr" : "7FFF8A52E000", "buildId" : "9F7C2AD8A9A73DE4828DB0F0F166AAA0" }, { "path" : "/usr/lib/system/libxpc.dylib", "machType" : 6, "b" : "7FFFDD801000", "vmaddr" : "7FFF8A534000", "buildId" : "85EB25FD218F38EE9E69391CC8EBE6C5" }, { "path" : "/usr/lib/libobjc.A.dylib", "machType" : 6, "b" : "7FFFDCCBA000", "vmaddr" : "7FFF899ED000", "buildId" : "F9AFE665A3A23285949519803A565861" }, { "path" : "/usr/lib/libauto.dylib", "machType" : 6, "b" : "7FFFDC127000", "vmaddr" : "7FFF88E5A000", "buildId" : "5BBF6A00CC76389D84E7CA88EDADE683" }, { "path" : "/usr/lib/libc++abi.dylib", "machType" : 6, "b" : "7FFFDC19F000", "vmaddr" : "7FFF88ED2000", "buildId" : "1CEF8ABB7E6D3C2F8E0AE7884478DD23" } ] 
}}
       mongod(_ZN5mongo15printStackTraceERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEE+0x3A) [0x10c6c293a]
       mongod(_ZN5mongo10logContextEPKc+0x171) [0x10c674101]
       mongod(_ZN5mongo23fassertFailedWithStatusEiRKNS_6StatusE+0xEC) [0x10c662bcc]
       mongod(_ZN5mongo4repl11getMinValidEPNS_16OperationContextE+0x65F) [0x10c2ace5f]
       mongod(_ZN5mongo4repl39ReplicationCoordinatorExternalStateImpl21cleanUpLastApplyBatchEPNS_16OperationContextE+0x1C) [0x10c2de56c]
       mongod(_ZN5mongo4repl26ReplicationCoordinatorImpl21_startLoadLocalConfigEPNS_16OperationContextE+0x745) [0x10c2e21e5]
       mongod(_ZN5mongo4repl26ReplicationCoordinatorImpl16startReplicationEPNS_16OperationContextE+0x145) [0x10c2e4c75]
       mongod(_ZN5mongoL14_initAndListenEi+0x1F66) [0x10bdb7fb6]
       mongod(_ZN5mongo13initAndListenEi+0x13) [0x10bdb5a13]
       mongod(main+0x3FC) [0x10bdba0fc]
       mongod(start+0x34) [0x10bdb59f4]
      -----  END BACKTRACE  -----
      2016-11-11T14:36:46.349-0500 I -        [initandlisten] 
       
      ***aborting after fassert() failure
      

      Attached is a tarball of the logs. Here is a description of the files:

      run9001 - shard A data node 1 post upgrade
      run9001.2016-11-11T19-35-52 - shard A data node 1 pre upgrade
      run9002 - shard A data node 2 post upgrade
      run9002.2016-11-11T19-35-34 - shard A data node 2 pre upgrade
      run9003 - shard A arbiter post downgrade (CRASHED)
      run9003.2016-11-11T19-35-26 - shard A arbiter post upgrade
      run9003.2016-11-11T19-36-46 - shard A arbiter pre upgrade
      run9004 - shard B data node 1 post downgrade
      run9004.2016-11-11T19-35-47 - shard B data node 1 post upgrade
      run9004.2016-11-11T19-36-35 - shard B data node 1 pre upgrade
      run9005 - shard B data node 2 post downgrade
      run9005.2016-11-11T19-35-28 - shard B data node 2 post upgrade
      run9005.2016-11-11T19-37-05 - shard B data node 2 pre upgrade
      run9006 - shard B arbiter post downgrade
      run9006.2016-11-11T19-35-34 - shard B arbiter post upgrade
      run9006.2016-11-11T19-36-45 - shard B arbiter pre upgrade
      run9007 - config server 1 post upgrade
      run9007.2016-11-11T19-36-17 - config server 1 pre upgrade
      run9008 - config server 2 post upgrade
      run9008.2016-11-11T19-35-57 - config server 2 pre upgrade
      run9009 - config server 3 post upgrade
      run9009.2016-11-11T19-36-02 - config server 3 pre upgrade
      run9010 - mongos 1 post downgrade
      run9010.2016-11-11T19-36-21 - mongos 1 post upgrade
      run9010.2016-11-11T19-36-32 - mongos 1 pre upgrade
      run9011 - mongos 2 post downgrade
      run9011.2016-11-11T19-36-24 - mongos 2 post upgrade
      run9011.2016-11-11T19-36-41 - mongos 2 pre upgrade
      

      run9003 contains the backtrace.
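
To locate the crashing process without opening each file by hand, the logs can be scanned for fatal assertion lines. The following is a rough sketch, assuming the tarball extracts to a directory of plain-text files named `run*` (as in the listing above) and that fatal assertions follow the line format shown in the excerpt. `parse_fassert` and `find_fasserts` are hypothetical helper names, not part of any MongoDB tool.

```python
import re
from pathlib import Path

# Matches lines shaped like the excerpt in this ticket, for example:
#   <timestamp> I -  [initandlisten] Fatal assertion 28771 NoSuchKey: <reason>
# The status name and reason are optional in this pattern.
FASSERT_RE = re.compile(
    r"^\s*(?P<time>\S+)\s+(?P<severity>[A-Z])\s+(?P<component>\S+)\s+"
    r"\[(?P<context>[^\]]+)\]\s+Fatal assertion (?P<code>\d+)"
    r"(?:\s+(?P<status>\w+):\s*(?P<reason>.*))?$"
)


def parse_fassert(line):
    """Return a dict of fields for a fatal assertion line, or None."""
    match = FASSERT_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    return fields


def find_fasserts(log_dir):
    """Scan run* files in log_dir (assumed layout) and collect fatal assertions."""
    hits = []
    for path in sorted(Path(log_dir).glob("run*")):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            for line in handle:
                record = parse_fassert(line.rstrip("\n"))
                if record is not None:
                    hits.append((path.name, record))
    return hits


if __name__ == "__main__":
    for name, record in find_fasserts("."):
        print(name, record["code"], record["status"], record["reason"])
```

Run against the attached logs, a scan like this would be expected to flag run9003, since that is the file the ticket says contains the backtrace.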
