[SERVER-3928] replSet initial sync pending for quite a long time
Created: 22/Sep/11  Updated: 11/Jul/16  Resolved: 27/Sep/11

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.0
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Tony Hannan Assignee: Kristina Chodorow (Inactive)
Resolution: Done Votes: 0
Labels: None

Attachments: mongodb-arbiter.log, mongodb.slowCreatedRS.log

 Description   

A customer, Techlightenment, sent me the attached log. He was bringing up a new replica set. He said all the replicas were running, but for a period of about one hour the log showed only:

Thu Sep 22 00:20:38 [rsSync] replSet initial sync pending
Thu Sep 22 00:20:38 [rsSync] replSet initial sync need a member to be primary or secondary to do our initial sync
...
Thu Sep 22 01:17:58 [rsSync] replSet initial sync pending
Thu Sep 22 01:17:58 [rsSync] replSet initial sync need a member to be primary or secondary to do our initial sync

See attached log.



 Comments   
Comment by Kristina Chodorow (Inactive) [ 27/Sep/11 ]

No problem

Comment by Kevin Sandom [ 27/Sep/11 ]

Sounds great. Thanks for all your help, Kristina. I really appreciate it.

Comment by Kristina Chodorow (Inactive) [ 27/Sep/11 ]

I'd say don't bother attaching the logs for now. If it happens again on ext4, please attach the logs from that.

Comment by Kevin Sandom [ 27/Sep/11 ]

It seems my comment has gone missing. Perhaps I got interrupted and forgot to submit it.

I saw Brendon yesterday. One change I made during the we'll-try-anything phase was moving the data storage from ext4 to ext3, and it sounds like the preallocation of space on ext3 is a major issue.

If you're satisfied that this is the cause, feel free to close the ticket. Better yet, I'm likely to migrate all the boxes back to ext4 (this time on RAID 10) in the next few weeks, after which I'll probably create an empty replica set for a new shard. That would be an excellent test of whether this is the sole issue.

Would you still like the logs from the other boxes anyway?

Comment by Kristina Chodorow (Inactive) [ 26/Sep/11 ]

I'm a little confused... what is the IP of the server whose log you sent? I need the logs from 237 and 131, neither of which should be arbiters.

Comment by Kevin Sandom [ 24/Sep/11 ]

Oh, by the way, I just grabbed roughly the same time window as the original log snippet I posted. I'm happy to supply more of either log if it helps.

Comment by Kevin Sandom [ 24/Sep/11 ]

Nice. I've attached the arbiter log as requested. I'll be interested to hear what you think.

- Kevin

Comment by Kristina Chodorow (Inactive) [ 23/Sep/11 ]

No, starting with no data is fine. You should be able to start all the members empty, initiate one of them, and have the set start working. It looks like you shut down the first member before the others had a chance to fully initialize, which seems to have left them stuck (that shouldn't happen). Hopefully the logs will make the timeline clearer.
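For an empty set, "initiate one of them" means running the initiate command on a single member with a configuration document that lists every member. The sketch below only builds that document; it assumes the standard replica set config shape (`_id`, `members`, `host`, `arbiterOnly`). The helper `build_initiate_config`, the set name `rs0`, and the hostnames are hypothetical, not taken from the customer's deployment. The resulting document could then be passed to `rs.initiate()` in the mongo shell on one member.

```python
import json


def build_initiate_config(set_name, data_hosts, arbiter_hosts):
    """Hypothetical helper: build a replica set configuration document.

    Member _id values are assigned sequentially; arbiters are marked
    with arbiterOnly so they vote but hold no data.
    """
    members = []
    for host in data_hosts:
        members.append({"_id": len(members), "host": host})
    for host in arbiter_hosts:
        members.append({"_id": len(members), "host": host, "arbiterOnly": True})
    return {"_id": set_name, "members": members}


# Hypothetical names: "rs0" must match the --replSet value every mongod was started with.
config = build_initiate_config(
    "rs0",
    data_hosts=["db1.example.net:27017", "db2.example.net:27017"],
    arbiter_hosts=["arb1.example.net:27017"],
)
print(json.dumps(config, indent=2))
```

All members start empty; only the member that receives this document needs to be told about the others, and the rest pick up the configuration from it.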

Comment by Kevin Sandom [ 23/Sep/11 ]

Sure, I'll dig up some logs tomorrow.

I was trying to create a vanilla replica set from scratch, so I had no database to begin with.

> You have to have a member in PRIMARY or SECONDARY state for another member to sync from.
I'm starting to think I'm doing it wrong, and I simply got away with it in 1.8.x. Do I have to have some local data on one node before creating the replica set?

Comment by Kristina Chodorow (Inactive) [ 23/Sep/11 ]

The underlying problem seems to be that 10.250.117.237 and 10.230.3.131 were in STARTUP2 state. Do you guys have the logs for these servers?

It looks like there wasn't any member it could sync from:

Thu Sep 22 01:18:04 [rsHealthPoll] replSet info member 10.234.78.94:27017 is up
Thu Sep 22 01:18:04 [rsHealthPoll] replSet member 10.234.78.94:27017 is now in state ARBITER
Thu Sep 22 01:18:04 [initandlisten] connection accepted from 10.49.121.85:60016 #2
Thu Sep 22 01:18:04 [initandlisten] connection accepted from 10.250.117.237:54658 #3
Thu Sep 22 01:18:05 [initandlisten] connection accepted from 10.234.78.94:44265 #4
Thu Sep 22 01:18:05 [initandlisten] connection accepted from 10.230.3.131:39078 #5
Thu Sep 22 01:18:06 [rsHealthPoll] replSet info member 10.49.121.85:27017 is up
Thu Sep 22 01:18:06 [rsHealthPoll] replSet member 10.49.121.85:27017 is now in state ARBITER
Thu Sep 22 01:18:06 [rsHealthPoll] replSet info member 10.250.117.237:27017 is up
Thu Sep 22 01:18:06 [rsHealthPoll] replSet member 10.250.117.237:27017 is now in state STARTUP2
Thu Sep 22 01:18:06 [rsHealthPoll] replSet info member 10.230.3.131:27017 is up
Thu Sep 22 01:18:06 [rsHealthPoll] replSet member 10.230.3.131:27017 is now in state STARTUP2
Thu Sep 22 01:18:06 [initandlisten] connection accepted from 127.0.0.1:39994 #6

You have to have a member in PRIMARY or SECONDARY state for another member to sync from.
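A minimal sketch of applying that rule to the `rsHealthPoll` lines above: collect the latest reported state per member and keep only those in PRIMARY or SECONDARY. The helper names and the regular expression are assumptions that match only the log format quoted in this ticket; other server versions may word these lines differently.

```python
import re

# Matches lines such as:
# "... [rsHealthPoll] replSet member 10.230.3.131:27017 is now in state STARTUP2"
STATE_RE = re.compile(r"replSet member (\S+) is now in state (\w+)")

# Per the rule above, only these states can serve as an initial sync source.
SYNC_SOURCE_STATES = {"PRIMARY", "SECONDARY"}


def member_states(log_lines):
    """Return the most recently reported state for each member host."""
    states = {}
    for line in log_lines:
        match = STATE_RE.search(line)
        if match:
            host, state = match.groups()
            states[host] = state
    return states


def sync_sources(states):
    """Return the sorted hosts that another member could sync from."""
    return sorted(host for host, state in states.items() if state in SYNC_SOURCE_STATES)


log = [
    "Thu Sep 22 01:18:04 [rsHealthPoll] replSet member 10.234.78.94:27017 is now in state ARBITER",
    "Thu Sep 22 01:18:06 [rsHealthPoll] replSet member 10.49.121.85:27017 is now in state ARBITER",
    "Thu Sep 22 01:18:06 [rsHealthPoll] replSet member 10.250.117.237:27017 is now in state STARTUP2",
    "Thu Sep 22 01:18:06 [rsHealthPoll] replSet member 10.230.3.131:27017 is now in state STARTUP2",
]
states = member_states(log)
print(states)
print(sync_sources(states))
```

With two arbiters and two members stuck in STARTUP2, the candidate list is empty, which is consistent with the repeated "need a member to be primary or secondary" messages.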

Generated at Thu Feb 08 03:04:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.