[SERVER-11308] SNMP requests may timeout on replica set with secondary as SNMP master
Created: 22/Oct/13  Updated: 10/Dec/14  Resolved: 04/Nov/13

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: 2.5.3
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Cadran Cowansage
Assignee: James Wahlin
Resolution: Done
Votes: 0
Labels: 26qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:
  • EC2, Amazon Linux, m1.large
  • PSS replica set
  • primary mongod server and one secondary as SNMP subagents, the other secondary as SNMP master

Attachments: 1.log, 2.log, 3.log, snmp.mongod.conf
Issue Links:
Related
Operating System: ALL
Steps To Reproduce:

0.) Install standard pre-req packages for SNMP.
1.) Edit the /etc/snmp/mongod.conf config file so that subagents connect over TCP port 1705 (mongod.conf attached).
2.) Launch a PSS replica set with one secondary as the SNMP master and the primary and the other secondary as SNMP subagents. (All nodes are on the same host.)
3.) Run snmpwalk against the SNMP master (I ran snmpwalk locally):
snmpwalk -m MONGO-MIB -v 2c -c mongodb 0.0.0.0:1161 1.3.6.1.4.1.34601

Actual command-line flags used for each host:
Secondary : ./mongod --snmp-subagent --replSet set1 --logpath /var/log/mongodb/1.log --dbpath /data/rs1 --port 27017 --fork --oplogSize 200 --smallfiles
Primary : ./mongod --snmp-subagent --replSet set1 --logpath /var/log/mongodb/2.log --dbpath /data/rs2 --port 27018 --fork --oplogSize 200 --smallfiles
Secondary : ./mongod --snmp-master --replSet set1 --logpath /var/log/mongodb/3.log --dbpath /data/rs3 --port 27019 --fork --oplogSize 200 --smallfiles

 Description   

With a standard PSS replica set in which one secondary is configured as the SNMP master and the other two nodes as SNMP subagents, snmpwalk fails partway through, yet the mongod logs show no exceptions.

snmpwalk -m MONGO-MIB -v 2c -c mongodb 0.0.0.0:1161 1.3.6.1.4.1.34601
MONGO-MIB::serverName."27016" = STRING: "ip-10-234-11-233:27016"
MONGO-MIB::serverName."27017" = STRING: "ip-10-234-11-233:27017"
MONGO-MIB::serverName."27018" = STRING: "ip-10-234-11-233:27018"
MONGO-MIB::sysUpTime."27016" = Timeticks: (6293) 0:01:02.93
MONGO-MIB::sysUpTime."27017" = Timeticks: (5327) 0:00:53.27
MONGO-MIB::sysUpTime."27018" = Timeticks: (4346) 0:00:43.46
MONGO-MIB::globalOpInsert."27016" = Counter32: 1
MONGO-MIB::globalOpInsert."27017" = Counter32: 1
MONGO-MIB::globalOpInsert."27018" = Counter32: 1
MONGO-MIB::globalOpQuery."27016" = Counter32: 5
MONGO-MIB::globalOpQuery."27017" = Counter32: 8
MONGO-MIB::globalOpQuery."27018" = Counter32: 3
MONGO-MIB::globalOpUpdate."27016" = Counter32: 0
MONGO-MIB::globalOpUpdate."27017" = Counter32: 3
MONGO-MIB::globalOpUpdate."27018" = Counter32: 0
MONGO-MIB::globalOpDelete."27016" = Counter32: 0
MONGO-MIB::globalOpDelete."27017" = Counter32: 0
MONGO-MIB::globalOpDelete."27018" = Counter32: 0
MONGO-MIB::globalOpGetMore."27016" = Counter32: 0
MONGO-MIB::globalOpGetMore."27017" = Counter32: 17
Error in packet.
Reason: (genError) A general failure occured
Failed object: MONGO-MIB::globalOpGetMore."27017"
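
To see where the walk stopped, the output above can be tallied per instance. The following is a minimal sketch and not part of the original report: it assumes the snmpwalk output has been saved to a file (walk.out is a hypothetical name), and tally_walk is an illustrative helper.

```shell
# Count how many objects each instance (indexed by port) returned
# before the walk failed. Reads snmpwalk output on stdin and ignores
# lines that do not look like: MONGO-MIB::name."port" = ...
tally_walk() {
  awk -F'"' '/^MONGO-MIB::[A-Za-z]+\."[0-9]+" = / { count[$2]++ }
       END { for (p in count) print p, count[p] }' | sort -n
}

# Usage (walk.out is a hypothetical file holding the output above):
#   tally_walk < walk.out
```

For the output above this gives 7 objects each for 27016 and 27017 and 6 for 27018. That is consistent with the failing request being the GETNEXT issued after globalOpGetMore."27017", which would have returned the value for instance 27018.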



 Comments   
Comment by Eric Milkie [ 04/Nov/13 ]

Thanks for discovering that. How long does it look like it's taking? I suppose this means one of the data items is blocking on a held mutex?
What's the default timeout if you don't supply one in the config?
We might want to add a note to the SNMP documentation about setting timeouts.

Comment by James Wahlin [ 04/Nov/13 ]

This is due to a timeout in an AgentX request. I changed the timeout on both the master and the subagents to 5 seconds with the following configuration directive. With it set, I can consistently retrieve metrics for all members of a three-member set:

agentXTimeout 5
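
For context, a sketch of how the directive might sit in the SNMP config file that each mongod reads. Only the agentXTimeout line comes from this ticket; the commented-out agentXRetries line is a related Net-SNMP directive shown for illustration and was not part of the fix. The Net-SNMP snmpd.conf man page lists a default of 1 second for agentXTimeout; whether mongod's embedded agent keeps that default is an assumption.

```
# Timeout for a single AgentX request, in seconds.
# Set on the master and on both subagents, as described above.
agentXTimeout 5

# Not part of the original fix: number of retries for an AgentX request.
# agentXRetries 5
```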

Comment by Eric Milkie [ 22/Oct/13 ]

It might be that a field is unexpectedly absent when the node is in secondary mode.
