[SERVER-9052] During failover in replicaset, MongoDB crashes Created: 21/Mar/13 Updated: 11/Jul/16 Resolved: 26/Mar/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Stability |
| Affects Version/s: | 2.2.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dave Claussen | Assignee: | Stephen Lee |
| Resolution: | Done | Votes: | 0 |
| Labels: | crash, linux, replicaset | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Unix Distribution: CentOS release 5.8 (Final)
Driver: mongo-java-driver:2.9.1
Build: Compiled from source, with this command line:
OpenSSL version: 0.9.8e-22.el5_8.4
Deployment: one primary instance, one secondary, one arbiter.
build info: Linux 12.servername.com 2.6.18-194.3.1.el5.028stab069.6 #1 SMP Tue Aug 10 21:28:51 GMT 2010 x86_64 BOOST_LIB_VERSION=1_49 |
||
| Attachments: |
|
| Operating System: | Linux |
| Steps To Reproduce: | Not sure exactly, but my guess is to have one primary, one secondary, and an arbiter and cause the secondary to die. Hopefully the logs will be enough to give you some information that will help determine what went wrong. |
| Participants: |
| Description |
|
We recently upgraded Mongo from 2.2.2 to 2.2.3. After a couple of days of running, the arbiter couldn't contact the primary and the secondary was elected to take over; however, at that point MongoDB just stopped responding and wrote tons of errors to the log (see attachments for the logs).

The primary has these types of errors:

problem detected during query over (DBNAME).(COLLECTION_NAME) : { $err: "not master and slaveOk=false", code: 13435 }
[rsMgr] replSet can't see a majority, will not try to elect self
recv(): message len XXX is too largeXX
Assertion: 16141:cannot translate opcode 26975

...but see the log for more details.

The Java application server running on the same box as the primary MongoDB instance (o16.servername.com) uses ReadPreference.primaryPreferred() for its queries. The Java application server running on the same box as the secondary MongoDB instance (o15.servername.com) uses ReadPreference.nearest() for its queries, since it's physically across the country from the primary.

P.S. After this happened, I updated my Java driver to mongo-java-driver:2.10.1 and am considering rebuilding against the latest OpenSSL build (0.9.8e-26.el5_9.1). |
| Comments |
| Comment by Stephen Lee [ 26/Mar/13 ] |
|
No, aside from your use of OpenVZ, the only other major concern was |
| Comment by Dave Claussen [ 26/Mar/13 ] |
|
No issues since then, but I did find out that our Verio server is, in fact, using OpenVZ, even though we have a dedicated server. So the message "You are running in OpenVZ. This is known to be broken!!!" that I've been ignoring in the Mongo logs is in fact correct. =) We're switching our infrastructure to a different platform, so unless there's anything obvious in the logs, you can close this ticket. -D |
| Comment by Stephen Lee [ 26/Mar/13 ] |
|
Have you noticed any issues since upgrading your Java driver? |
| Comment by Dave Claussen [ 25/Mar/13 ] |
|
Stephen, I've upgraded the Java driver to mongo-java-driver:2.10.1, and OpenSSL to 0.9.8e-26.el5_9.1.
| Comment by Stephen Lee [ 25/Mar/13 ] |
|
Dave, I would strongly recommend you upgrade your Java driver to v2.9.3 or later, due to |