[SERVER-10141] MongoDB secondary member exception status with error 'replSet source for syncing doesn't seem to be await capable' Created: 09/Jul/13  Updated: 11/Jul/16  Resolved: 29/Jul/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: jameszhou Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File Error Messages.txt    
Operating System: ALL
Participants:

 Description   

Background:

1) I installed the following two RPM packages on CentOS 6.3 (Final), 64-bit:
mongo-10gen-2.2.1-mongodb_1.x86_64.rpm
mongo-10gen-server-2.2.1-mongodb_1.x86_64.rpm
2) The replica set is deployed as follows:
Machine1: one primary, one arbiter
Machine2: one secondary
A sharded cluster is set up on top of this replica set.
3) The network interface of Machine2 (where the secondary member resides) was down for 2 days for some reason. This likely caused Machine2 to lose communication with the other members. I have since fixed the network issue.

Symptom:

When I run rs.status() on the primary, the secondary shows an exception status with the error message 'db exception in producer: 1000 replSet source for syncing doesn't seem to be await capable -- is it an older version of mongodb?'

Output of rs.status() for details:

rs1:PRIMARY> rs.status()
{
    "set" : "rs1",
    "date" : ISODate("2013-07-08T10:25:31Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "app1_ss_nc:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 2151,
            "optime" : Timestamp(1372928780000, 2),
            "optimeDate" : ISODate("2013-07-04T09:06:20Z"),
            "lastHeartbeat" : ISODate("2013-07-08T10:25:30Z"),
            "pingMs" : 0,
            // error
            "errmsg" : "db exception in producer: 1000 replSet source for syncing doesn't seem to be await capable -- is it an older version of mongodb?"
        },
        {
            "_id" : 1,
            "name" : "app2_ss_nc:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 10543,
            "optime" : Timestamp(1373277727000, 1),
            "optimeDate" : ISODate("2013-07-08T10:02:07Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "storage_ss_nc:27027",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 2409,
            "lastHeartbeat" : ISODate("2013-07-08T10:25:30Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}
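
For reference, the optimeDate values in the rs.status() output show how far the secondary has fallen behind the primary. The following is only a minimal sketch in plain JavaScript (runnable in Node.js, no MongoDB connection needed); the helper name lagSeconds is hypothetical, and the two dates are copied from the output above.

```javascript
// optimeDate values copied from the rs.status() output in this ticket.
const primaryOptimeDate = "2013-07-08T10:02:07Z";   // app2_ss_nc:27017 (PRIMARY)
const secondaryOptimeDate = "2013-07-04T09:06:20Z"; // app1_ss_nc:27017 (SECONDARY)

// Hypothetical helper: seconds between two ISO-8601 timestamps.
function lagSeconds(primaryIso, secondaryIso) {
  return (Date.parse(primaryIso) - Date.parse(secondaryIso)) / 1000;
}

const lag = lagSeconds(primaryOptimeDate, secondaryOptimeDate);
// Prints: secondary is 348947 s behind (~4.04 days)
console.log(`secondary is ${lag} s behind (~${(lag / 86400).toFixed(2)} days)`);
```

By this measure, the secondary's last applied operation is roughly four days older than the primary's.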

How can I fix this problem?



 Comments   
Comment by jameszhou [ 30/Jul/13 ]

Thanks Stephen for your reference and suggestion.

Comment by Stennie Steneker (Inactive) [ 29/Jul/13 ]

Hi,

The MongoDB manual includes tutorials for how to Resync a Member of a Replica Set or to Deploy a Replica Set if you would like to start from scratch.

As your questions have been addressed, I'm going to close this issue.

I would also note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server.

For MongoDB-related support discussion you should post on the mongodb-users group (http://groups.google.com/group/mongodb-user) or Stack Overflow.

Thanks,
Stephen

Comment by jameszhou [ 11/Jul/13 ]

Hi Dan,
Can I do a full resync from the primary to the secondary node by copying the primary's data files to the secondary node? I want to retain the data in MongoDB.
Also, I am a newbie: how can I re-create my replica set from scratch using the data from the primary node? Please give me a simple example.
Thanks a million!

Comment by Daniel Pasette (Inactive) [ 11/Jul/13 ]

It's not possible to reconstruct exactly what happened to this set based on those log snippets.

It is clear from those messages though that your oplog is no longer valid. See:

Thu Jul 11 11:17:54 [conn32885] query local.oplog.rs query: { ts: { $gte: new Date(5896684209837178882) } } ntoreturn:0 keyUpdates:0 exception: BSONElement: bad type 98 code:10320 numYields: 2104 locks(micros) r:231828 reslen:70 118ms

Your best bet at this point might be to re-create your replica set from scratch using the data from your primary node.
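
As an aside, the large number in the logged query above can be decoded by hand. This is a sketch under the assumption that the log prints the raw 64-bit BSON timestamp, whose high 32 bits are seconds since the epoch and whose low 32 bits are an ordinal within that second. The helper name decodeOplogTs is hypothetical, and the code is plain JavaScript for Node.js (BigInt support required), not a mongo shell command.

```javascript
// Raw value from the log line: { ts: { $gte: new Date(5896684209837178882) } }
// BigInt is needed because the value exceeds Number.MAX_SAFE_INTEGER.
const raw = 5896684209837178882n;

// Hypothetical helper: split a 64-bit BSON timestamp into its two 32-bit halves.
function decodeOplogTs(value) {
  const seconds = Number(value >> 32n);          // high 32 bits: seconds since epoch
  const increment = Number(value & 0xffffffffn); // low 32 bits: ordinal within that second
  return { seconds, increment, iso: new Date(seconds * 1000).toISOString() };
}

// Prints: { seconds: 1372928780, increment: 2, iso: '2013-07-04T09:06:20.000Z' }
console.log(decodeOplogTs(raw));
```

The decoded value corresponds to the secondary's optime reported by rs.status() in the description (2013-07-04T09:06:20Z, increment 2), which suggests the failing query is the secondary asking the primary for oplog entries from its last applied operation onward.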

Comment by jameszhou [ 11/Jul/13 ]

Error Message Attached

Comment by jameszhou [ 11/Jul/13 ]

Thanks Dan for your reply!

I checked the primary log and grabbed a snippet of it as you suggested; it is attached. Hopefully it helps point to a fix. Thanks again!

Comment by Daniel Pasette (Inactive) [ 10/Jul/13 ]

This indicates there is something wrong with querying the oplog on the primary; there could be a couple of different reasons. Can you check your primary log file to see if there are any messages indicating a problem, or compress and post your log files to this ticket?

Generated at Thu Feb 08 03:22:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.