[SERVER-2137] primary node instance is gone there is no message
Created: 23/Nov/10  Updated: 29/May/12  Resolved: 02/Sep/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.6.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Joseph Wang Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: Linux
Participants:

 Description   

The primary node instance is gone.

[joseph.wang@stage1.vpc3 logs]$ ps auxww | grep mongo
2040 1126 0.3 15.3 15876836 1211024 ? Sl Nov19 17:20 /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongod -f /db/lps-mongodb_slave1_stage1/mongodb.cnf
2041 1195 0.2 15.3 15875004 1204444 ? Sl Nov19 10:53 /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongod -f /db/lps-mongodb_slave2_stage1/mongodb.cnf
805 13561 0.0 0.0 6024 592 pts/5 R+ 18:27 0:00 grep mongo
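The `ps auxww` listing shows only two mongod processes, started with the slave1 and slave2 config files. The following is a small sketch that lists the running instances from that output so that a missing one stands out. The helper name `mongod_configs` is hypothetical. It assumes, as in the lines above, that the binary path ends in `/mongod` and that the config file is passed with `-f`.

```python
# Lines copied from the `ps auxww | grep mongo` output above.
PS_OUTPUT = """\
2040 1126 0.3 15.3 15876836 1211024 ? Sl Nov19 17:20 /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongod -f /db/lps-mongodb_slave1_stage1/mongodb.cnf
2041 1195 0.2 15.3 15875004 1204444 ? Sl Nov19 10:53 /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongod -f /db/lps-mongodb_slave2_stage1/mongodb.cnf
805 13561 0.0 0.0 6024 592 pts/5 R+ 18:27 0:00 grep mongo
"""


def mongod_configs(ps_text):
    """Return the config file passed via -f to each mongod in `ps` output."""
    configs = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip lines that are not a mongod binary, such as the grep itself.
        if not any(f.endswith("/mongod") for f in fields):
            continue
        if "-f" in fields:
            i = fields.index("-f")
            if i + 1 < len(fields):
                configs.append(fields[i + 1])
    return configs


print(mongod_configs(PS_OUTPUT))
```

Comparing this list against the instances that are expected to run on the host shows which mongod is no longer running.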

[joseph.wang@stage1.vpc3 logs]$ /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongo localhost:4102
MongoDB shell version: 1.6.3
connecting to: localhost:4102/test
> rs.status()
{
"set" : "lp",
"date" : "Mon Nov 22 2010 18:26:10 GMT-0800 (PST)",
"myState" : 1,
"members" : [
{ "_id" : 0, "name" : "stage1.vpc3.estalea.net:4101", "health" : 0, "state" : 1, "uptime" : 0, "lastHeartbeat" : "Mon Nov 22 2010 17:47:31 GMT-0800 (PST)", "errmsg" : "connect/transport error" },
{ "_id" : 1, "name" : "stage1.vpc3.estalea.net:4102", "health" : 1, "state" : 1, "self" : true },
{ "_id" : 2, "name" : "stage1.vpc3.estalea.net:4103", "health" : 1, "state" : 2, "uptime" : 267327, "lastHeartbeat" : "Mon Nov 22 2010 18:26:10 GMT-0800 (PST)" }
],
"ok" : 1
}
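In the rs.status() output, member 0 (port 4101) has health 0, uptime 0, and errmsg "connect/transport error". The shell's own member (port 4102) reports myState 1. This reading assumes MongoDB's replica set state codes (1 = PRIMARY, 2 = SECONDARY). Under that assumption, the state 1 still shown for member 0 is presumably the last value seen before its heartbeats failed, not a second live primary. The sketch below copies the member fields into Python dicts (timestamps omitted) and separates unreachable members from reachable primaries. The helper names are hypothetical.

```python
# Member documents copied from the rs.status() output above (dates omitted).
MEMBERS = [
    {"_id": 0, "name": "stage1.vpc3.estalea.net:4101", "health": 0, "state": 1,
     "uptime": 0, "errmsg": "connect/transport error"},
    {"_id": 1, "name": "stage1.vpc3.estalea.net:4102", "health": 1, "state": 1,
     "self": True},
    {"_id": 2, "name": "stage1.vpc3.estalea.net:4103", "health": 1, "state": 2,
     "uptime": 267327},
]

PRIMARY, SECONDARY = 1, 2  # replica set state codes (assumed mapping)


def unreachable(members):
    """Members whose heartbeat is failing (health 0), with their error message."""
    return [(m["name"], m.get("errmsg", "")) for m in members if m["health"] == 0]


def live_primaries(members):
    """Reachable members reporting PRIMARY; a health-0 member's state is treated as stale."""
    return [m["name"] for m in members if m["health"] == 1 and m["state"] == PRIMARY]


print(unreachable(MEMBERS))
print(live_primaries(MEMBERS))
```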

Here are the last lines of the log from the original primary node:
[joseph.wang@stage1.vpc3 logs]$ tail mongodb.log
Mon Nov 22 17:42:46 [conn268] getmore local.oplog.rs cid:3986709940930560870 getMore: { ts: { $gte: new Date(5541403724838091364) } } bytes:1048606 nreturned:4333 5ms
Mon Nov 22 17:42:46 [conn486] insert refinance.refinancebase20101123014059 75ms
Mon Nov 22 17:42:46 [conn486] run command refinance.$cmd { getlasterror: 1 }
Mon Nov 22 17:42:46 [conn486] query refinance.$cmd ntoreturn:1 command: { getlasterror: 1 } reslen:81 0ms
Mon Nov 22 17:42:46 [conn486] run command refinance.$cmd { reseterror: 1 }
Mon Nov 22 17:42:46 [conn486] query refinance.$cmd ntoreturn:1 command: { reseterror: 1 } reslen:53 0ms
Mon Nov 22 17:42:46 [conn486] insert refinance.refinancebase20101123014059 110ms
Mon Nov 22 17:42:46 [conn486] run command refinance.$cmd { getlasterror: 1 }
Mon Nov 22 17:42:46 [conn486] query refinance.$cmd ntoreturn:1 command: { getlasterror: 1 } reslen:81 0ms
Mon Nov 22 17:42:46 [conn486] run command refinance.$cmd { reset[joseph.wang@stage1.vpc3 logs]$



 Comments   
Comment by Eliot Horowitz (Inactive) [ 23/Nov/10 ]

Not entirely sure what I'm looking at.

Generally, if a process dies without a message, there is something in /var/log/messages.
Can you look there?

Also, can you give some info on your deployment setup (OS, virtualized, etc.) and attach the full logs?
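One quick way to check /var/log/messages, as suggested above, is to filter for lines that mention mongod together with typical kernel kill or crash messages. The following is a minimal sketch. It assumes the kernel's usual OOM-killer and segfault wording, which varies between kernel versions, so the patterns are assumptions rather than an exhaustive list. `crash_hints` and `scan_messages` are hypothetical helpers, and reading /var/log/messages usually requires root.

```python
import re

# Assumed markers: common Linux kernel wording for OOM kills and segfaults.
# Exact messages differ between kernel versions.
CRASH_MARKERS = re.compile(r"out of memory|oom-killer|killed process|segfault",
                           re.IGNORECASE)


def crash_hints(lines, process="mongod"):
    """Return syslog lines that mention `process` and look like a kill or crash."""
    return [line.rstrip("\n") for line in lines
            if process in line and CRASH_MARKERS.search(line)]


def scan_messages(path="/var/log/messages", process="mongod"):
    """Scan a syslog file for crash hints; undecodable bytes are replaced."""
    with open(path, errors="replace") as f:
        return crash_hints(f, process)
```

An empty result does not rule out a crash. It only means none of these particular patterns matched.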

Comment by Joseph Wang [ 23/Nov/10 ]

I think it could be due to running out of space on /db.

[joseph.wang@stage1.vpc3 logs]$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 10321208 7871088 1925832 81% /
none 3936020 0 3936020 0% /dev/shm
/dev/sdf 51606140 51606140 0 100% /db
/dev/sdg 103212320 968172 97001268 1% /apps
[joseph.wang@stage1.vpc3 logs]$ cd /db
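The df output supports this theory: /db has 0 KB available. GNU df computes Use% as used / (used + available), rounded up. That is why / shows 81% rather than 7871088 / 10321208 ≈ 76%. The gap between 1K-blocks and used + available on / is most likely space reserved for root. The sketch below reproduces that calculation from the captured output and flags nearly full filesystems. The 95% threshold and the function names are assumptions for illustration.

```python
# df output copied from the comment above, header line included.
DF_OUTPUT = """\
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 10321208 7871088 1925832 81% /
none 3936020 0 3936020 0% /dev/shm
/dev/sdf 51606140 51606140 0 100% /db
/dev/sdg 103212320 968172 97001268 1% /apps
"""


def use_percent(used_kb, available_kb):
    """df-style Use%: used / (used + available), rounded up to a whole percent."""
    total = used_kb + available_kb
    if total == 0:
        return 0
    return -(-used_kb * 100 // total)  # integer ceiling division


def nearly_full(df_text, threshold=95):
    """Return (mount point, Use%) for filesystems at or above `threshold`."""
    flagged = []
    for line in df_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        used, available, mount = int(fields[2]), int(fields[3]), fields[5]
        pct = use_percent(used, available)
        if pct >= threshold:
            flagged.append((mount, pct))
    return flagged


print(nearly_full(DF_OUTPUT))
```

Only /db is flagged, at 100%. For the other filesystems the computed values match df's own Use% column (81%, 0%, 1%).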

Generated at Thu Feb 08 02:59:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.