- Type: Bug
- Resolution: Incomplete
- Priority: Major - P3
- None
- Affects Version/s: 2.0.6
- Component/s: Replication
- Labels: None
- Linux
A SECONDARY in our replica set keeps crashing for an unknown reason.
Our setup looks like this:
- PRIMARY (2.0.6)
- SECONDARY (2.0.6)
- SECONDARY (2.2RC0)
- ARBITER (2.0.6)
The first secondary (2.0.6) keeps crashing at seemingly random moments. The log file shows only this as its final part:
Thu Aug 23 13:18:44 [PeriodicTask::Runner] task: WriteBackManager::cleaner took: 5ms
Thu Aug 23 13:18:45 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 48ms
Thu Aug 23 13:18:45 [conn295] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db3.set1:27018", $auth: {} } ntoreturn:1 reslen:120 130ms
Thu Aug 23 13:18:45 [conn296] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db1.set1:27018" } ntoreturn:1 reslen:120 130ms
Thu Aug 23 13:18:45 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 5ms
Thu Aug 23 13:18:50 [conn297] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db0.set1:27018" } ntoreturn:1 reslen:120 229ms
Thu Aug 23 13:18:50 [conn295] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db3.set1:27018", $auth: {} } ntoreturn:1 reslen:120 229ms
Thu Aug 23 13:18:50 [conn296] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db1.set1:27018" } ntoreturn:1 reslen:120 229ms
Thu Aug 23 13:18:52 [initandlisten] connection accepted from 10.59.62.168:47834 #298
Thu Aug 23 13:18:53 [conn296] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db1.set1:27018" } ntoreturn:1 reslen:120 340ms
Thu Aug 23 13:18:53 [conn297] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db0.set1:27018" } ntoreturn:1 reslen:120 340ms
Thu Aug 23 13:18:53 [conn295] end connection 10.59.62.168:47831
Thu Aug 23 13:18:56 [conn296] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db1.set1:27018" } ntoreturn:1 reslen:120 372ms
Thu Aug 23 13:18:56 [conn297] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db0.set1:27018" } ntoreturn:1 reslen:120 372ms
Thu Aug 23 13:18:57 [conn298] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db3.set1:27018", $auth: {} } ntoreturn:1 reslen:120 326ms
Thu Aug 23 13:19:00 [conn296] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db1.set1:27018" } ntoreturn:1 reslen:120 522ms
Thu Aug 23 13:19:00 [conn297] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db0.set1:27018" } ntoreturn:1 reslen:120 522ms
Thu Aug 23 13:19:00 [conn298] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db3.set1:27018", $auth: {} } ntoreturn:1 reslen:120 522ms
42625600/44589575 95%
Thu Aug 23 13:19:04 [conn296] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db1.set1:27018" } ntoreturn:1 reslen:120 142ms
Thu Aug 23 13:19:04 [conn298] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db3.set1:27018", $auth: {} } ntoreturn:1 reslen:120 142ms
Thu Aug 23 13:19:04 [conn297] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db0.set1:27018" } ntoreturn:1 reslen:120 142ms
Thu Aug 23 13:19:07 [initandlisten] connection accepted from 10.230.41.239:52398 #299
Thu Aug 23 13:19:07 [conn296] end connection 10.230.41.239:52393
Thu Aug 23 13:19:07 [conn298] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db3.set1:27018", $auth: {} } ntoreturn:1 reslen:120 101ms
Thu Aug 23 13:19:07 [conn297] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db0.set1:27018" } ntoreturn:1 reslen:120 101ms
Thu Aug 23 13:19:08 [conn299] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db1.set1:27018" } ntoreturn:1 reslen:120 326ms
Here's some more information:
- all instances run on EC2, each with a couple of extra volumes for storage
- all installs of MongoDB (except for 2.2) were done through the official YUM repo
- nothing in our config changed in the past few days
- our processed data volume did not increase in the past few days
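The replSetHeartbeat commands in the log excerpt take between 101ms and 522ms, which may be worth tracking across a longer stretch of the log. Below is a minimal sketch for pulling those durations out of a mongod log, assuming the 2.0-style line layout shown in the excerpt. The regex and the function names (`heartbeat_latencies`, `slowest_per_sender`) are illustrative and not part of any MongoDB tooling; the sample lines are copied from the log above.

```python
import re

# Matches a replSetHeartbeat command line as it appears in the excerpt.
# Assumption: the line layout is the one shown in this ticket (mongod 2.0.x).
HEARTBEAT = re.compile(
    r'(?P<time>\d\d:\d\d:\d\d) \[conn\d+\] command admin\.\$cmd command: '
    r'\{ replSetHeartbeat: "[^"]*",[^}]*?from: "(?P<sender>[^"]+)"'
    r'.*?\} ntoreturn:\d+ reslen:\d+ (?P<ms>\d+)ms'
)

# A few lines taken verbatim from the log excerpt in this ticket.
SAMPLE = r'''
Thu Aug 23 13:18:45 [conn295] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db3.set1:27018", $auth: {} } ntoreturn:1 reslen:120 130ms
Thu Aug 23 13:18:50 [conn297] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db0.set1:27018" } ntoreturn:1 reslen:120 229ms
Thu Aug 23 13:19:00 [conn296] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db1.set1:27018" } ntoreturn:1 reslen:120 522ms
Thu Aug 23 13:19:00 [conn297] command admin.$cmd command: { replSetHeartbeat: "set1", v: 9, pv: 1, checkEmpty: false, from: "db0.set1:27018" } ntoreturn:1 reslen:120 522ms
'''


def heartbeat_latencies(log_text):
    """Return (time, sender, milliseconds) for every heartbeat command found."""
    return [
        (m["time"], m["sender"], int(m["ms"]))
        for m in HEARTBEAT.finditer(log_text)
    ]


def slowest_per_sender(entries):
    """Map each heartbeat sender to the longest command time seen for it."""
    worst = {}
    for _, sender, ms in entries:
        worst[sender] = max(ms, worst.get(sender, 0))
    return worst


if __name__ == "__main__":
    entries = heartbeat_latencies(SAMPLE)
    for sender, ms in sorted(slowest_per_sender(entries).items()):
        print(sender, ms)
```

Because the pattern is applied with `finditer` over the whole text, it also works when the log has been pasted as a single unbroken line.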