[SERVER-9460] replSetGetStatus seems to report health wrong Created: 25/Apr/13  Updated: 03/Mar/15  Resolved: 23/Feb/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.1, 2.4.3
Fix Version/s: 2.7.8

Type: Bug Priority: Major - P3
Reporter: Christian Amor Kvalheim Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Minor Change
Operating System: ALL
Steps To Reproduce:
  • Intermittent; I have not been able to reproduce the exact states consistently. It never happens on OS X but has been observed under Ubuntu 12.04.
Participants:

 Description   

As one can see from the replSetGetStatus result below, 30011 should not be marked as healthy, yet it is. As a workaround, a member is treated as healthy only if its health value is 1, it is in a valid state, and neither errmsg nor lastHeartbeatMessage is set. This looks like a logic error somewhere in the command.

errmsg and lastHeartbeatMessage are correct, since the replica set is not in a correct state. The health of 30011, however, should not be 1.

The server on port 30011 is down.

{ set: 'replica-set-foo',
  date: Thu Apr 25 2013 14:14:28 GMT+0200 (CEST),
  myState: 2,
  syncingTo: 'localhost:30011',
  members: 
   [ { _id: 0,
       name: 'localhost:30010',
       health: 1,
       state: 2,
       stateStr: 'SECONDARY',
       uptime: 39,
       optime: [Object],
       optimeDate: Thu Apr 25 2013 14:14:20 GMT+0200 (CEST),
       errmsg: 'db exception in producer: 10278 dbclient error communicating with server: localhost:30011',
       self: true },
     { _id: 1,
       name: 'localhost:30011',
       health: 1,
       state: 1,
       stateStr: 'PRIMARY',
       uptime: 27,
       optime: [Object],
       optimeDate: Thu Apr 25 2013 14:14:20 GMT+0200 (CEST),
       lastHeartbeat: Thu Apr 25 2013 14:14:27 GMT+0200 (CEST),
       lastHeartbeatRecv: Thu Jan 01 1970 01:00:00 GMT+0100 (CET),
       pingMs: 0,
       lastHeartbeatMessage: 'still initializing' },
     { _id: 2,
       name: 'localhost:30012',
       health: 1,
       state: 2,
       stateStr: 'SECONDARY',
       uptime: 27,
       optime: [Object],
       optimeDate: Thu Apr 25 2013 14:14:20 GMT+0200 (CEST),
       lastHeartbeat: Thu Apr 25 2013 14:14:27 GMT+0200 (CEST),
       lastHeartbeatRecv: Thu Jan 01 1970 01:00:00 GMT+0100 (CET),
       pingMs: 0,
       lastHeartbeatMessage: 'db exception in producer: 10278 dbclient error communicating with server: localhost:30011' },
     { _id: 3,
       name: 'localhost:30013',
       health: 1,
       state: 7,
       stateStr: 'ARBITER',
       uptime: 27,
       lastHeartbeat: Thu Apr 25 2013 14:14:27 GMT+0200 (CEST),
       lastHeartbeatRecv: Thu Jan 01 1970 01:00:00 GMT+0100 (CET),
       pingMs: 0 } ],
  ok: 1 }
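
The workaround from the description can be sketched roughly as follows. This is only an illustration, not the driver's actual code: the helper names (isMemberHealthy, unhealthyMembers) and the choice of which states count as valid (1 PRIMARY, 2 SECONDARY, 7 ARBITER, the ones that appear in the output above) are assumptions. The sample document is a trimmed copy of the status output from this report.

```javascript
// Assumption: only the states seen in this report are treated as "valid".
// 1 = PRIMARY, 2 = SECONDARY, 7 = ARBITER.
const VALID_STATES = new Set([1, 2, 7]);

// Hypothetical helper: trust health === 1 only when the member is in a
// valid state and reports neither errmsg nor lastHeartbeatMessage.
function isMemberHealthy(member) {
  return member.health === 1 &&
    VALID_STATES.has(member.state) &&
    !member.errmsg &&
    !member.lastHeartbeatMessage;
}

// Hypothetical helper: names of all members that fail the stricter check.
function unhealthyMembers(status) {
  return status.members
    .filter((member) => !isMemberHealthy(member))
    .map((member) => member.name);
}

// Trimmed copy of the replSetGetStatus result shown in this report.
const status = {
  set: 'replica-set-foo',
  myState: 2,
  members: [
    { _id: 0, name: 'localhost:30010', health: 1, state: 2,
      errmsg: 'db exception in producer: 10278 dbclient error communicating with server: localhost:30011',
      self: true },
    { _id: 1, name: 'localhost:30011', health: 1, state: 1,
      lastHeartbeatMessage: 'still initializing' },
    { _id: 2, name: 'localhost:30012', health: 1, state: 2,
      lastHeartbeatMessage: 'db exception in producer: 10278 dbclient error communicating with server: localhost:30011' },
    { _id: 3, name: 'localhost:30013', health: 1, state: 7 }
  ],
  ok: 1
};

console.log(unhealthyMembers(status));
```

With this check, only the arbiter on 30013 passes; 30011 is rejected because of its lastHeartbeatMessage even though its health field is 1.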
 



 Comments   
Comment by Eric Milkie [ 23/Feb/15 ]

In 3.0, health is no longer reported as 1 before the first heartbeat.

Comment by Christian Amor Kvalheim [ 07/May/13 ]

I think it's the latter.

Comment by Daniel Pasette (Inactive) [ 25/Apr/13 ]

How are you inducing this state? Can you post the logs?
