[SERVER-12158] Replica set blows up inelegantly when replicating authSchemaUpgradeStep changes to 2.4 secondary Created: 18/Dec/13  Updated: 10/Dec/14  Resolved: 18/Dec/13

Status: Closed
Project: Core Server
Component/s: Replication, Security
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Valeri Karpov Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OSX, 3 node replica set with 1 2.5.4 node and 2 2.4.8 nodes


Attachments: Text File output.txt, File repl_upgrade_24_secondaries.js, File x509_repl_upgrade.js
Issue Links:
Related
related to SERVER-11881 addUser crashing 2.4 mongod in mixed ... Closed
related to SERVER-12156 Don't allow authSchemaUpgrade to proc... Closed
Participants:

 Description   

See the attached files for scripts and output. In short, running authSchemaUpgradeStep against a 2.5.4 primary causes 2.4.8 secondaries to crash:

m31002| Wed Dec 18 15:17:23.057 [repl writer worker 2] CMD: dropIndexes admin.system.users
m31000| Wed Dec 18 15:17:23.058 [repl writer worker 3] ERROR: writer worker caught exception: system.users entry must have either a 'pwd' field or a 'userSource' field, but not both on: { ts: Timestamp 1387397841000|11, h: 223338476503781983, v: 2, op: "i", ns: "admin.system.users", o: { _id: "admin.admin", user: "admin", db: "admin", credentials: { MONGODB-CR: "3dfa1231d2c5c39175c1de49530c0a65" }, roles: [ { role: "userAdminAnyDatabase", db: "admin" }, { role: "readWriteAnyDatabase", db: "admin" }, { role: "dbAdminAnyDatabase", db: "admin" }, { role: "clusterAdmin", db: "admin" } ] } }
m31000| Wed Dec 18 15:17:23.058 [repl writer worker 3] Fatal Assertion 16360
m31000| 0x10044c60b 0x100425837 0x10033c97f 0x10042cc48 0x10047f1a5 0x7fff8c3e8772 0x7fff8c3d51a1
m31002| Wed Dec 18 15:17:23.058 [repl writer worker 2] build index admin.system.roles { _id: 1 }

m31002| Wed Dec 18 15:17:23.059 [repl writer worker 2] build index done. scanned 0 total records. 0 secs
m31002| Wed Dec 18 15:17:23.059 [repl writer worker 2] info: creating collection admin.system.roles on add index
m31002| Wed Dec 18 15:17:23.059 [repl writer worker 2] build index admin.system.roles { role: 1, db: 1 }

m31002| Wed Dec 18 15:17:23.060 [repl writer worker 2] build index done. scanned 0 total records. 0 secs
m31002| Wed Dec 18 15:17:23.060 [repl writer worker 2] build index admin.system.users { user: 1, db: 1 }

m31002| Wed Dec 18 15:17:23.061 [repl writer worker 2] build index done. scanned 0 total records. 0 secs
m31000| 0 mongod-248 0x000000010044c60b _ZN5mongo15printStackTraceERSo + 43
m31000| 1 mongod-248 0x0000000100425837 _ZN5mongo13fassertFailedEi + 151
m31000| 2 mongod-248 0x000000010033c97f _ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE + 271
m31000| 3 mongod-248 0x000000010042cc48 _ZN5mongo10threadpool6Worker4loopEv + 138
m31000| 4 mongod-248 0x000000010047f1a5 thread_proxy + 229
m31000| 5 libsystem_c.dylib 0x00007fff8c3e8772 _pthread_start + 327
m31000| 6 libsystem_c.dylib 0x00007fff8c3d51a1 thread_start + 13
m31000| Wed Dec 18 15:17:23.061 [repl writer worker 3]
m31000|
m31000| ***aborting after fassert() failure
m31000|
m31000|
m31000| Wed Dec 18 15:17:23.061 Got signal: 6 (Abort trap: 6).
m31000|
m31000| Wed Dec 18 15:17:23.063 Backtrace:
m31000| 0x10044c60b 0x100001121 0x7fff8c3d690a 0 0x7fff8c42df61 0x100425875 0x10033c97f 0x10042cc48 0x10047f1a5 0x7fff8c3e8772 0x7fff8c3d51a1
m31000| 0 mongod-248 0x000000010044c60b _ZN5mongo15printStackTraceERSo + 43
m31000| 1 mongod-248 0x0000000100001121 _ZN5mongo10abruptQuitEi + 225
m31000| 2 libsystem_c.dylib 0x00007fff8c3d690a _sigtramp + 26
m31000| 3 ??? 0x0000000000000000 0x0 + 0
m31000| 4 libsystem_c.dylib 0x00007fff8c42df61 abort + 143
m31000| 5 mongod-248 0x0000000100425875 _ZN5mongo13fassertFailedEi + 213
m31000| 6 mongod-248 0x000000010033c97f _ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE + 271
m31000| 7 mongod-248 0x000000010042cc48 _ZN5mongo10threadpool6Worker4loopEv + 138
m31000| 8 mongod-248 0x000000010047f1a5 thread_proxy + 229
m31000| 9 libsystem_c.dylib 0x00007fff8c3e8772 _pthread_start + 327
m31000| 10 libsystem_c.dylib 0x00007fff8c3d51a1 thread_start + 13
m31000|
m31002| Wed Dec 18 15:17:23.067 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: specter.local:31000
m31002| Wed Dec 18 15:17:23.067 [rsBackgroundSync] replSet syncing to: specter.local:31001
m31002| Wed Dec 18 15:17:23.067 [conn6] end connection 10.4.101.171:61339 (2 connections now open)

This behavior is unsupported according to schwerin, but it is worth noting.



 Comments   
Comment by Spencer Brody (Inactive) [ 18/Dec/13 ]

The root cause is the same as SERVER-11881: 2.4 mongods enforce, at a very low level, that documents in system.users match the schema they expect.
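
To illustrate the mismatch, here is a minimal sketch of the rule quoted in the 2.4.8 error message, applied to the document from the oplog entry in the description. It is plain JavaScript and does not reproduce the actual mongod check: the function name check24UserDoc and the 2.4-style legacyDoc are assumptions made up for this example, and only upgradedDoc is copied from the log.

```javascript
// Hypothetical restatement of the rule from the 2.4.8 log message:
// a system.users entry must have exactly one of 'pwd' or 'userSource'.
// This is an illustration, not the real mongod implementation.
function check24UserDoc(doc) {
  const hasPwd = Object.prototype.hasOwnProperty.call(doc, "pwd");
  const hasUserSource = Object.prototype.hasOwnProperty.call(doc, "userSource");
  if (hasPwd === hasUserSource) {
    // Either both fields are present or neither is.
    return "system.users entry must have either a 'pwd' field or a " +
           "'userSource' field, but not both";
  }
  return null; // this particular rule is satisfied
}

// The upgraded user document, as it appears in the replicated oplog entry.
const upgradedDoc = {
  _id: "admin.admin",
  user: "admin",
  db: "admin",
  credentials: { "MONGODB-CR": "3dfa1231d2c5c39175c1de49530c0a65" },
  roles: [
    { role: "userAdminAnyDatabase", db: "admin" },
    { role: "readWriteAnyDatabase", db: "admin" },
    { role: "dbAdminAnyDatabase", db: "admin" },
    { role: "clusterAdmin", db: "admin" }
  ]
};

// Assumed shape of a 2.4-style entry, for contrast only (not from the ticket).
const legacyDoc = {
  user: "admin",
  pwd: "<password hash>",
  roles: ["userAdminAnyDatabase"]
};

console.log(check24UserDoc(legacyDoc));   // rule satisfied
console.log(check24UserDoc(upgradedDoc)); // rule violated: neither field present
```

The upgraded document stores its hash under credentials and has neither 'pwd' nor 'userSource', so a check of this kind rejects it. On the 2.4.8 secondary that rejection happens while applying the oplog entry, which is where the fassert in the log fires.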
