  1. Core Server
  2. SERVER-12158

Replica set blows up inelegantly when replicating authSchemaUpgradeStep changes to 2.4 secondary

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: Replication, Security
    • Labels: None
    • Environment:
      OSX, 3 node replica set with 1 2.5.4 node and 2 2.4.8 nodes

      See the attached files for scripts and output. In short, running authSchemaUpgradeStep against a 2.5.4 primary causes the 2.4.8 secondaries to crash:

      m31002| Wed Dec 18 15:17:23.057 [repl writer worker 2] CMD: dropIndexes admin.system.users
      m31000| Wed Dec 18 15:17:23.058 [repl writer worker 3] ERROR: writer worker caught exception: system.users entry must have either a 'pwd' field or a 'userSource' field, but not both on: { ts: Timestamp 1387397841000|11, h: 223338476503781983, v: 2, op: "i", ns: "admin.system.users", o: { _id: "admin.admin", user: "admin", db: "admin", credentials: { MONGODB-CR: "3dfa1231d2c5c39175c1de49530c0a65" }, roles: [ { role: "userAdminAnyDatabase", db: "admin" }, { role: "readWriteAnyDatabase", db: "admin" }, { role: "dbAdminAnyDatabase", db: "admin" }, { role: "clusterAdmin", db: "admin" } ] } }
      m31000| Wed Dec 18 15:17:23.058 [repl writer worker 3] Fatal Assertion 16360
      m31000| 0x10044c60b 0x100425837 0x10033c97f 0x10042cc48 0x10047f1a5 0x7fff8c3e8772 0x7fff8c3d51a1
      m31002| Wed Dec 18 15:17:23.058 [repl writer worker 2] build index admin.system.roles { _id: 1 }
      m31002| Wed Dec 18 15:17:23.059 [repl writer worker 2] build index done. scanned 0 total records. 0 secs
      m31002| Wed Dec 18 15:17:23.059 [repl writer worker 2] info: creating collection admin.system.roles on add index
      m31002| Wed Dec 18 15:17:23.059 [repl writer worker 2] build index admin.system.roles { role: 1, db: 1 }
      m31002| Wed Dec 18 15:17:23.060 [repl writer worker 2] build index done. scanned 0 total records. 0 secs
      m31002| Wed Dec 18 15:17:23.060 [repl writer worker 2] build index admin.system.users { user: 1, db: 1 }
      m31002| Wed Dec 18 15:17:23.061 [repl writer worker 2] build index done. scanned 0 total records. 0 secs
      m31000| 0 mongod-248 0x000000010044c60b _ZN5mongo15printStackTraceERSo + 43
      m31000| 1 mongod-248 0x0000000100425837 _ZN5mongo13fassertFailedEi + 151
      m31000| 2 mongod-248 0x000000010033c97f _ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE + 271
      m31000| 3 mongod-248 0x000000010042cc48 _ZN5mongo10threadpool6Worker4loopEv + 138
      m31000| 4 mongod-248 0x000000010047f1a5 thread_proxy + 229
      m31000| 5 libsystem_c.dylib 0x00007fff8c3e8772 _pthread_start + 327
      m31000| 6 libsystem_c.dylib 0x00007fff8c3d51a1 thread_start + 13
      m31000| Wed Dec 18 15:17:23.061 [repl writer worker 3]
      m31000|
      m31000| ***aborting after fassert() failure
      m31000|
      m31000|
      m31000| Wed Dec 18 15:17:23.061 Got signal: 6 (Abort trap: 6).
      m31000|
      m31000| Wed Dec 18 15:17:23.063 Backtrace:
      m31000| 0x10044c60b 0x100001121 0x7fff8c3d690a 0 0x7fff8c42df61 0x100425875 0x10033c97f 0x10042cc48 0x10047f1a5 0x7fff8c3e8772 0x7fff8c3d51a1
      m31000| 0 mongod-248 0x000000010044c60b _ZN5mongo15printStackTraceERSo + 43
      m31000| 1 mongod-248 0x0000000100001121 _ZN5mongo10abruptQuitEi + 225
      m31000| 2 libsystem_c.dylib 0x00007fff8c3d690a _sigtramp + 26
      m31000| 3 ??? 0x0000000000000000 0x0 + 0
      m31000| 4 libsystem_c.dylib 0x00007fff8c42df61 abort + 143
      m31000| 5 mongod-248 0x0000000100425875 _ZN5mongo13fassertFailedEi + 213
      m31000| 6 mongod-248 0x000000010033c97f _ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE + 271
      m31000| 7 mongod-248 0x000000010042cc48 _ZN5mongo10threadpool6Worker4loopEv + 138
      m31000| 8 mongod-248 0x000000010047f1a5 thread_proxy + 229
      m31000| 9 libsystem_c.dylib 0x00007fff8c3e8772 _pthread_start + 327
      m31000| 10 libsystem_c.dylib 0x00007fff8c3d51a1 thread_start + 13
      m31000|
      m31002| Wed Dec 18 15:17:23.067 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: specter.local:31000
      m31002| Wed Dec 18 15:17:23.067 [rsBackgroundSync] replSet syncing to: specter.local:31001
      m31002| Wed Dec 18 15:17:23.067 [conn6] end connection 10.4.101.171:61339 (2 connections now open)

      According to schwerin, this mixed-version setup is unsupported, but the crash is worth noting.
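
      To make the failure easier to see, here is a rough Python sketch of the rule quoted in the error message ("either a 'pwd' field or a 'userSource' field, but not both"). It is not the server's actual validation code; the function name and the old-style example document are made up for illustration, and only the new-style document is taken from the oplog entry in the log above.

```python
def violates_v24_rule(doc):
    """Simplified model of the rule quoted in the 2.4.8 error message.

    A system.users entry is rejected when it has both 'pwd' and
    'userSource', or neither of them.
    """
    has_pwd = "pwd" in doc
    has_user_source = "userSource" in doc
    return has_pwd == has_user_source


# The document from the replicated oplog insert shown in the log above.
new_style_user = {
    "_id": "admin.admin",
    "user": "admin",
    "db": "admin",
    "credentials": {"MONGODB-CR": "3dfa1231d2c5c39175c1de49530c0a65"},
    "roles": [
        {"role": "userAdminAnyDatabase", "db": "admin"},
        {"role": "readWriteAnyDatabase", "db": "admin"},
        {"role": "dbAdminAnyDatabase", "db": "admin"},
        {"role": "clusterAdmin", "db": "admin"},
    ],
}

# Illustrative old-style document (assumed shape, not taken from the ticket).
old_style_user = {
    "user": "admin",
    "pwd": "3dfa1231d2c5c39175c1de49530c0a65",
    "roles": ["userAdminAnyDatabase", "clusterAdmin"],
}

print(violates_v24_rule(new_style_user))  # True: neither 'pwd' nor 'userSource'
print(violates_v24_rule(old_style_user))  # False: exactly one of the two fields
```

      Under this model the upgraded document has neither field (the hash lives under credentials), which matches the exception the 2.4.8 writer worker reports before the fassert.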

        1. output.txt
          75 kB
        2. repl_upgrade_24_secondaries.js
          3 kB
        3. x509_repl_upgrade.js
          3 kB

            Assignee: Unassigned
            Reporter: Valeri Karpov (valeri.karpov@mongodb.com)
            Votes: 0
            Watchers: 3