[SERVER-1533] Replica set SECONDARY crash when terminating other PRIMARY and SECONDARY Created: 01/Aug/10 Updated: 03/Aug/10 Resolved: 03/Aug/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 1.5.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Mytton | Assignee: | Dwight Merriman |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Operating System: | Linux |
| Participants: | |
| Description |
|
I am unable to reproduce this again and am not totally sure of the circumstances leading to it, but there may be some useful information in the backtrace. I had a replica set with 3 members. I terminated the primary and then, immediately after, one of the secondaries. The third member, the remaining secondary, crashed on its own.
PRIMARY
mongodb-linux-x86_64-1.5.7/bin/mongod --shardsvr --dbpath data/ --replSet set1/domU-12-31-39-06-89-F1.compute-1.internal:27018 --rest
Sun Aug 1 05:16:33 [conn9] getmore local.oplog.rs cid:5805707532161602891 getMore: { ts: { $gte: new Date(5500358045240131585) } } bytes:20 nreturned:0 3007ms
} bytes:20 nreturned:0 3003ms
Sun Aug 1 05:16:46 [interruptThread] shutdown: going to close listening sockets...
Sun Aug 1 05:16:46 [interruptThread] shutdown: removing fs lock...
SECONDARY 1
mongodb-linux-x86_64-1.5.7/bin/mongod --shardsvr --dbpath data/ --replSet set1/domU-12-31-39-06-29-52.compute-1.internal:27018 --rest
Sun Aug 1 05:17:57 [conn5] getmore local.oplog.rs cid:5040576339790303823 getMore: { ts: { $gte: new Date(5500358045240131585) } } bytes:20 nreturned:0 3007ms
} bytes:20 nreturned:0 3007ms
Sun Aug 1 05:18:03 [interruptThread] shutdown: going to close listening sockets...
Sun Aug 1 05:18:03 [interruptThread] shutdown: removing fs lock...
SECONDARY 2
mongodb-linux-x86_64-1.5.7/bin/mongod --shardsvr --dbpath data/ --replSet set1/domU-12-31-39-06-29-52.compute-1.internal:27018 --rest
Sun Aug 1 04:38:36 [rs_sync] replSet SECONDARY
0x801ab0 0x2aaaaaf9ee76 0x2aaaaaf9eea3 0x2aaaaaf9ef8a 0x53ddfc 0x67ca7b 0x526a08 0x818340 0x2aaaaaccd407 0x2aaaab748b0d
Sun Aug 1 05:18:04 Backtrace:
Sun Aug 1 05:18:04 dbexit:
Sun Aug 1 05:18:04 [rs Manager] shutdown: going to close listening sockets...
Sun Aug 1 05:18:04 [rs Manager] shutdown: removing fs lock... |
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 03/Aug/10 ] |
|
SEE |
| Comment by Dwight Merriman [ 03/Aug/10 ] |
|
I think you are right. We are doing 1.5.8 today, so you might want to wait for that and try it. |
| Comment by David Mytton [ 03/Aug/10 ] |
|
Is this the right nightly build? It seems to have failed: http://buildbot.mongodb.org/builders/Nightly%20Linux%2064-bit |
| Comment by Dwight Merriman [ 03/Aug/10 ] |
|
Dave, can you try the latest code (newer than 1.5.7, maybe tomorrow's daily build)? The newer code at a minimum has better logging diagnostics on the issue. |
| Comment by David Mytton [ 03/Aug/10 ] |
|
I experienced this again. Does look similar to
Terminated PRIMARY:
Mon Aug 2 21:34:53 [conn9] getmore local.oplog.rs cid:1283402304456619496 getMore: { ts: { $gte: new Date(5500988726827810817) } } bytes:20 nreturned:0 3013ms
} bytes:20 nreturned:0 3013ms
Mon Aug 2 21:35:00 [interruptThread] shutdown: going to close listening sockets...
Mon Aug 2 21:35:00 [interruptThread] shutdown: removing fs lock...
SECONDARY 1 took over as PRIMARY
SECONDARY 2 crashed
Mon Aug 2 21:34:42 [initandlisten] connection accepted from 10.255.62.79:43801 #6
0x5313c3 0x53db81 0x67ca7b 0x526a08 0x818340 0x2aaaaaccd407 0x2aaaab748b0d
0x801ab0 0x2aaaaaf9ee76 0x2aaaaaf9eea3 0x2aaaaaf9ef8a 0x53ddfc 0x67ca7b 0x526a08 0x818340 0x2aaaaaccd407 0x2aaaab748b0d
Mon Aug 2 21:35:02 Backtrace:
Mon Aug 2 21:35:02 dbexit:
Mon Aug 2 21:35:02 [rs Manager] shutdown: going to close listening sockets...
Mon Aug 2 21:35:02 dbexit: ; exiting immediately
Mon Aug 2 21:35:02 [conn1] end connection 127.0.0.1:47262
Mon Aug 2 21:35:02 [initandlisten] now exiting
SECONDARY 1 (now PRIMARY) unable to continue as all other members of the replica set are down.
SECONDARY 2 details (the one that crashed)
mongodb-linux-x86_64-1.5.7/bin/mongod --rest --shardsvr --replSet set1/domU-12-31-39-06-2C-81.compute-1.internal --dbpath data/ --port 27017
Mon Aug 2 21:25:04 db version v1.5.7, pdfile version 4.5 |
| Comment by Alex [ 02/Aug/10 ] |
|
This looks like it's connected to issue http://jira.mongodb.org/browse/SERVER-1483 |
| Comment by David Mytton [ 02/Aug/10 ] |
|
Yes, 1.5.7. |
| Comment by Dwight Merriman [ 02/Aug/10 ] |
|
Will investigate. This was with 1.5.7? |
| Comment by David Mytton [ 01/Aug/10 ] |
|
Seem to be missing the full SECONDARY 1 log. Here it is:
Sun Aug 1 04:38:24 [ReplSetHealthPollTask] replSet info domU-12-31-39-06-29-52.compute-1.internal:27018 is now up
} bytes:20 nreturned:0 3008ms
} bytes:20 nreturned:0 3008ms
} bytes:20 nreturned:0 3016ms
} bytes:20 nreturned:0 3025ms
} bytes:20 nreturned:0 3017ms
} bytes:20 nreturned:0 3008ms
} bytes:20 nreturned:0 3007ms
} bytes:20 nreturned:0 3007ms
Sun Aug 1 05:18:03 [interruptThread] shutdown: going to close listening sockets...
Sun Aug 1 05:18:03 [interruptThread] shutdown: removing fs lock... |