[SERVER-1479] Error when setting up replica sets - assertion failure !sp.state.primary() db/repl/rs.h 184 Created: 24/Jul/10  Updated: 12/Jul/16  Resolved: 29/Jul/10

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 1.5.6
Fix Version/s: 1.5.7

Type: Bug Priority: Major - P3
Reporter: nosh petigara Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OSX 64


Issue Links:
Duplicate
is duplicated by SERVER-1482 rs.add() assert db/repl/rs.h:184 Closed
Operating System: ALL
Participants:

 Description   

Don't know if this is some weirdness (or screwup on my part) because I am running on the same machine... but I can reproduce this repeatedly:

Steps to reproduce:

1. start 1st server as follows:
./mongod --rest --replSet myset/Nosh-Petigaras-MacBook-Pro.local:27017 --dbpath /Users/nosh/replset/s1-data/
-then call rs.initiate(). Seems to elect itself primary

2. start 2nd server as follows:
./mongod --rest --replSet myset/Nosh-Petigaras-MacBook-Pro.local:27018,Nosh-Petigaras-MacBook-Pro.local:27017 --dbpath /Users/nosh/replset/s2-data/ --port 27018

-starts up, but doesn't really do anything.
[startReplSets] Sat Jul 24 16:27:25 replSet warning can't find self in the repl set configuration:
[startReplSets] Sat Jul 24 16:27:25 { _id: "myset", version: 1, members: [ { _id: 0, host: "Nosh-Petigaras-MacBook-Pro.local:27017" } ] }
[startReplSets] Sat Jul 24 16:27:25 replSet info Couldn't load config yet. Sleeping 20sec and will try again.

3. Then call rs.add("Nosh-Petigaras-MacBook-Pro.local:27018") on first server
Seems to be working, i.e. I can see stuff about replsets on the second server.
Then I get this error in the console of the 1st server:
[initandlisten] Sat Jul 24 16:27:05 connection accepted from 172.14.1.101:53692 #4
[conn2] Sat Jul 24 16:28:44 replSet replSetReconfig config object parses ok, 2 members specified
TODO : don't allow removal of a node until we handle it at the removed node end.
TEMP hb res cfg change:{ rs: true, set: "myset", state: 1, hbmsg: "", opTime: new Date(5497571762516262913), v: 1, config: { _id: "myset", version: 1, members: [ { _id: 0, host: "Nosh-Petigaras-MacBook-Pro.local:27017" } ] }, ok: 1.0 }
TEMP hb res cfg change:{ rs: true, errmsg: "still initializing", ok: 0.0 }
[conn2] Sat Jul 24 16:28:44 replSet replSetReconfig all members seem up
[conn2] Sat Jul 24 16:28:44 replSet info saving a newer config version to local.system.replset
[conn2] Sat Jul 24 16:28:44 Assertion failure !sp.state.primary() db/repl/rs.h 184
0x10007223e 0x1000803ae 0x1001ace0f 0x1001b1005 0x1001c731a 0x1001c83c1 0x1002e141e 0x1002e33a4 0x1001232f4 0x100124f87 0x1002480aa 0x10024d215 0x10037aec2 0x10038b964 0x7fff825408b6 0x7fff82540769
0 mongod 0x000000010007223e _ZN5mongo12sayDbContextEPKc + 174
1 mongod 0x00000001000803ae _ZN5mongo8assertedEPKcS1_j + 286
2 mongod 0x00000001001ace0f _ZN5mongo11ReplSetImpl14initFromConfigERNS_13ReplSetConfigE + 2447
3 mongod 0x00000001001b1005 _ZN5mongo7ReplSet13haveNewConfigERNS_13ReplSetConfigEb + 389
4 mongod 0x00000001001c731a _ZN5mongo18CmdReplSetReconfig4_runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb + 1818
5 mongod 0x00000001001c83c1 _ZN5mongo18CmdReplSetReconfig3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb + 945
6 mongod 0x00000001002e141e _ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb + 622
7 mongod 0x00000001002e33a4 _ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi + 2804
8 mongod 0x00000001001232f4 _ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi + 52
9 mongod 0x0000000100124f87 _ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_ + 6423
10 mongod 0x00000001002480aa _ZN5mongo13receivedQueryERNS_6ClientERNS_10DbResponseERNS_7MessageE + 586
11 mongod 0x000000010024d215 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 4869
12 mongod 0x000000010037aec2 _ZN5mongo10connThreadEv + 562
13 mongod 0x000000010038b964 thread_proxy + 132
14 libSystem.B.dylib 0x00007fff825408b6 _pthread_start + 331
15 libSystem.B.dylib 0x00007fff82540769 thread_start + 13
[conn2] Sat Jul 24 16:28:44 replSet error unexpected exception in haveNewConfig() : 0 assertion db/repl/rs.h:184
[conn2] Sat Jul 24 16:28:44 replSet fatal error
[conn2] Sat Jul 24 16:28:44 replSet error fatal error, stopping replication
[conn2] Sat Jul 24 16:28:44 query admin.$cmd ntoreturn:1 command: { replSetReconfig: { _id: "myset", version: 2, members: [ { _id: 0, host: "Nosh-Petigaras-MacBook-Pro.local:27017" }, { _id: 1.0, host: "Nosh-Petigaras-MacBook-Pro.local:27018" } ] } } reslen:53 154ms
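
For readers following the logs, the difference between the version 1 config and the version 2 config sent in the replSetReconfig command can be sketched in plain JavaScript. This is only an illustration of the change visible in the logs above (version bumped by one, new member appended with the next _id). The function name buildReconfig is hypothetical, and the sketch does not claim to be how rs.add() is implemented in the shell.

```javascript
// Sketch only: reproduces the config change visible in the logs above.
// buildReconfig is a hypothetical helper, not part of the mongo shell.
function buildReconfig(current, hostport) {
    // Build a copy so the caller's config object is left untouched.
    var next = { _id: current._id, version: current.version + 1, members: [] };
    var maxId = -1;
    current.members.forEach(function (m) {
        next.members.push({ _id: m._id, host: m.host });
        if (m._id > maxId) maxId = m._id;
    });
    // The new member takes the next unused _id.
    next.members.push({ _id: maxId + 1, host: hostport });
    return next;
}

// Config as logged by the second server before the add.
var cfgBefore = {
    _id: "myset",
    version: 1,
    members: [ { _id: 0, host: "Nosh-Petigaras-MacBook-Pro.local:27017" } ]
};
var cfgAfter = buildReconfig(cfgBefore, "Nosh-Petigaras-MacBook-Pro.local:27018");
```

Here cfgAfter has the same shape as the replSetReconfig document in the last log line: version 2, with members 0 and 1.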



 Comments   
Comment by Kristina Chodorow (Inactive) [ 28/Jul/10 ]

This works for me now.

Comment by Kyle Banker [ 27/Jul/10 ]

Note that this is a duplicate issue. See linked issue.

Comment by Kyle Banker [ 27/Jul/10 ]

these are the same issue

Comment by auto [ 26/Jul/10 ]

Author:

{'login': 'banker', 'name': 'Kyle Banker', 'email': 'kylebanker@gmail.com'}

Message: SERVER-1479 and replSet js test cleanup
http://github.com/mongodb/mongo/commit/e94cc361ed088b5cb1f2cfe8ab30fc641c74639e

Comment by Kristina Chodorow (Inactive) [ 26/Jul/10 ]

I'm also getting this. Run:

$ ./mongod --replSet unicomplex/hostname:27017 --dbpath ~/dbs/blort1
$ ./mongod --replSet unicomplex/hostname:27017 --dbpath ~/dbs/blort2 --port 27018

Wait until 27017 prints "replSet sleeping 20sec and will try again." then in the shell:

> rs.initiate()
> // once the log says "replSet election succeeded, assuming primary role", run:
> rs.add("hostname:27018")

The ./mongod running at 27017 will have the assertion failure.

It's caused by the box.setOtherPrimary(0); line of rs.cpp.
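
A toy model of why that call would trip the assertion, written in plain JavaScript rather than the server's C++. Everything here is an assumption made for illustration: StateBoxModel, its fields, and the check inside setOtherPrimary are inferred only from the assertion text "!sp.state.primary()" and the stack trace, not copied from rs.h or rs.cpp.

```javascript
// Toy model, NOT the server code. All names here are hypothetical.
function StateBoxModel() {
    this.state = "STARTUP";
    this.primary = null;
}

// The node elects itself primary (as after rs.initiate()).
StateBoxModel.prototype.setSelfPrimary = function (self) {
    this.state = "PRIMARY";
    this.primary = self;
};

// Assumed invariant, inferred from the assertion text:
// recording some other primary is only legal when this node is not primary.
StateBoxModel.prototype.setOtherPrimary = function (member) {
    if (this.state === "PRIMARY") {
        throw new Error("Assertion failure !sp.state.primary()");
    }
    this.primary = member;
};

// Stand-in for the reconfig path clearing the known primary.
function applyNewConfig(box) {
    try {
        box.setOtherPrimary(null);
        return "ok";
    } catch (e) {
        return e.message;
    }
}

var freshBox = new StateBoxModel();
var freshResult = applyNewConfig(freshBox);      // node not yet primary

var primaryBox = new StateBoxModel();
primaryBox.setSelfPrimary("hostname:27017");
var primaryResult = applyNewConfig(primaryBox);  // node already primary
```

Under these assumptions the reset is harmless on a node that is still starting up, but fails on the node that already elected itself primary, which matches the 27017 server being the one that asserts.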

Comment by auto [ 26/Jul/10 ]

Author:

{'login': 'banker', 'name': 'Kyle Banker', 'email': 'kylebanker@gmail.com'}

Message: SERVER-1482 SERVER-1479 failing replica set add test
http://github.com/mongodb/mongo/commit/c23a930a46a82e558ab467be1be63e9764d562e6

Comment by Eliot Horowitz (Inactive) [ 25/Jul/10 ]

kyle - can you add a test for this?
when you have a test that fails, comment out the failing part, commit, and assign back to me
