[SERVER-2975] Replica set master failure detection Created: 21/Apr/11 Updated: 29/May/12 Resolved: 02/May/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mathieu Poumeyrol | Assignee: | Kristina Chodorow (Inactive) |
| Resolution: | Duplicate | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | linux, ec2, ebs | ||
| Description |
|
During the massive EC2 failure earlier this morning, the master of one of our replica sets was affected: it stopped responding to the clients that were still connected, but did not close their connections. The other members of the set did not detect the failure, and it was not possible to send a "stepdown" command to it. (As the machine was not answering ssh, we did a remote reboot to force the replica set onto its two other feet.)
|
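The failure mode described above (a peer that keeps its TCP connections open but never answers) can be illustrated with a minimal sketch. This is not MongoDB's heartbeat code; `peer_is_responsive`, `start_server`, and the `ping`/`pong` messages are hypothetical names chosen for the illustration. The point it shows is that a liveness check needs a read timeout, because a hung peer never closes the socket and a blocking read would wait forever.

```python
import socket
import threading


def peer_is_responsive(host, port, timeout=2.0):
    """Return True only if the peer answers a ping within `timeout` seconds.

    Hypothetical helper: a peer that accepts the connection but stays
    silent is reported as down, the same as one that refuses it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)      # bound the wait for the reply
            conn.sendall(b"ping\n")
            return conn.recv(16) != b""   # b"" means the peer closed the socket
    except OSError:                       # covers refusals and timeouts
        return False


def start_server(reply):
    """Start a throwaway local TCP server and return its port.

    With reply=False the server accepts connections and then says
    nothing, imitating a hung master that keeps sockets open.
    """
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen()
    held = []                             # keep accepted sockets open

    def serve():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return
            held.append(conn)
            if reply:
                conn.recv(16)
                conn.sendall(b"pong\n")

    threading.Thread(target=serve, daemon=True).start()
    return listener.getsockname()[1]


if __name__ == "__main__":
    healthy_port = start_server(reply=True)
    hung_port = start_server(reply=False)
    print("healthy:", peer_is_responsive("127.0.0.1", healthy_port, timeout=1.0))
    print("hung:   ", peer_is_responsive("127.0.0.1", hung_port, timeout=0.5))
```

Without the `settimeout` call, the check against the silent server would block indefinitely, which matches the symptom reported here: connections stayed open, so nothing signalled the failure.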
| Comments |
| Comment by Kristina Chodorow (Inactive) [ 02/May/11 ] |
|
Both mongos and mongod use the code that was fixed. |
| Comment by Jonathan Wollman [ 02/May/11 ] |
|
Was this fixed in mongos or in the server? |
| Comment by Kristina Chodorow (Inactive) [ 02/May/11 ] |
|
Fixed and backported to 1.8.2. |
| Comment by Mathieu Poumeyrol [ 28/Apr/11 ] |
|
+1 for a backport to 1.8, if that's possible. 1.8 is just a few weeks old, and 2.0 seems awfully far away for something with such a concrete availability impact. |
| Comment by Jonathan Wollman [ 28/Apr/11 ] |
|
Are you guys considering this as a patch to the 1.8 release? Thx |
| Comment by Kristina Chodorow (Inactive) [ 27/Apr/11 ] |
|
That's fine, I've figured out what the bug was and I'll be committing the fix once 1.9.0 is out. |
| Comment by Mathieu Poumeyrol [ 26/Apr/11 ] |
|
No luck in rescuing logs from the failing master. They were rotated away before AWS could restore our access to the computer. |
| Comment by Mathieu Poumeyrol [ 21/Apr/11 ] |
|
Sure, but I prefer not to share my IP addresses and stuff with everyone. I'm "kali" on freenode; I already /msged a link to kchodorow_. |
| Comment by Eliot Horowitz (Inactive) [ 21/Apr/11 ] |
|
Can you send the logs from one of the secondaries? |