[SERVER-3928] replSet initial sync pending for quite a long time Created: 22/Sep/11 Updated: 11/Jul/16 Resolved: 27/Sep/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.0.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Tony Hannan | Assignee: | Kristina Chodorow (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Participants: |
| Description |
|
A customer, Techlightenment, sent me the attached log. He was bringing up a new replica set. He said all replicas were running, but for about an hour the log showed only:
See attached log. |
| Comments |
| Comment by Kristina Chodorow (Inactive) [ 27/Sep/11 ] |
|
No problem |
| Comment by Kevin Sandom [ 27/Sep/11 ] |
|
Sounds great. Thanks for all your help, Kristina. I really appreciate it. |
| Comment by Kristina Chodorow (Inactive) [ 27/Sep/11 ] |
|
I'd say don't bother attaching the logs for now. If it happens again on ext4, please attach the logs from that. |
| Comment by Kevin Sandom [ 27/Sep/11 ] |
|
It seems my comment has gone missing. Perhaps I got interrupted and forgot to submit it. I saw Brendon yesterday. One change I made in the we'll-try-anything phase was moving the data storage from ext4 to ext3. It sounds like the pre-allocation of space is a major issue. If you're happy that this is the cause, you are welcome to close the ticket. Better yet, I am likely to migrate all the boxes back to ext4 (this time on RAID 10) in the next few weeks, and after that I'll likely be creating an empty replica set for a new shard. That would be an excellent test of whether this is the sole issue. Would you still like the logs from the other boxes anyway? |
| Comment by Kristina Chodorow (Inactive) [ 26/Sep/11 ] |
|
I'm a little confused... what is the IP of the server whose log you sent? I need the logs from 237 and 131, neither of which should be arbiters. |
| Comment by Kevin Sandom [ 24/Sep/11 ] |
|
Oh btw! I simply grabbed roughly the same timeline as the original log snippet I posted. I'm happy to supply more of either if it helps. |
| Comment by Kevin Sandom [ 24/Sep/11 ] |
|
Nice. I've attached the arbiter log as requested. I'll be interested to hear what you think.
|
| Comment by Kristina Chodorow (Inactive) [ 23/Sep/11 ] |
|
No, starting with no data is fine. You should be able to start all the members empty, initiate one of them, and have the set start working. It looks like you shut down the first member before the others had a chance to fully initialize, which got them stuck or something (which shouldn't happen). Hopefully the logs will make the timeline clearer. |
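The procedure described above (start every member empty, then initiate one of them) can be sketched as follows. This is only an illustration: the helper name, the set name, and the host names are hypothetical and do not come from this ticket. The block only builds the configuration document; sending it to one member with the `replSetInitiate` command is left out so the sketch runs without a server.

```python
def build_initiate_config(set_name, hosts, arbiters=()):
    """Build a replica set config document for replSetInitiate.

    set_name, hosts, and arbiters are caller-supplied; nothing here
    is taken from the customer's actual deployment.
    """
    members = []
    for member_id, host in enumerate(list(hosts) + list(arbiters)):
        member = {"_id": member_id, "host": host}
        if host in arbiters:
            # Arbiters vote in elections but hold no data.
            member["arbiterOnly"] = True
        members.append(member)
    return {"_id": set_name, "members": members}


# Hypothetical set: two data-bearing members plus one arbiter.
config = build_initiate_config(
    "rs0",
    hosts=["db1.example:27017", "db2.example:27017"],
    arbiters=["arb.example:27017"],
)
print(config)
```

The document would be passed to `replSetInitiate` on a single member only. The other members start empty and pick up the configuration from it, so the initiated member has to stay up until they finish initializing.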
| Comment by Kevin Sandom [ 23/Sep/11 ] |
|
Sure, I'll dig up some logs tomorrow. I was trying to create a vanilla replica set from nothing, so I had no DB to begin with.
> You have to have a member in PRIMARY or SECONDARY state for another member to sync from. |
| Comment by Kristina Chodorow (Inactive) [ 23/Sep/11 ] |
|
The underlying problem seems to be that 10.250.117.237 and 10.230.3.131 were in STARTUP2 state. Do you guys have the logs for these servers? It looks like there wasn't any member it could sync from:
Thu Sep 22 01:18:04 [rsHealthPoll] replSet info member 10.234.78.94:27017 is up
You have to have a member in PRIMARY or SECONDARY state for another member to sync from. |
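The rule stated above (a member can only sync from a member in PRIMARY or SECONDARY state) can be checked against a status document shaped like the output of `replSetGetStatus`. The sketch below is a minimal illustration: the function name, host names, and member states are hypothetical and are not taken from the attached logs. The numeric state codes are the standard replica set ones.

```python
# Replica set member state codes, as found in the "state" field
# of each entry in the replSetGetStatus "members" array.
PRIMARY = 1
SECONDARY = 2
STARTUP2 = 5
ARBITER = 7

SYNCABLE_STATES = {PRIMARY, SECONDARY}


def sync_candidates(status):
    """Return the names of members that another member could sync from."""
    return [
        member["name"]
        for member in status.get("members", [])
        if member.get("state") in SYNCABLE_STATES
    ]


# Hypothetical status: two data members stuck in STARTUP2 plus an arbiter.
stuck_status = {
    "set": "rs0",
    "members": [
        {"name": "db1.example:27017", "state": STARTUP2},
        {"name": "db2.example:27017", "state": STARTUP2},
        {"name": "arb.example:27017", "state": ARBITER},
    ],
}

candidates = sync_candidates(stuck_status)
if not candidates:
    print("no member in PRIMARY or SECONDARY state; initial sync cannot start")
```

An empty result corresponds to the situation described in this comment: every data-bearing member is waiting in STARTUP2 and none of them has anything to sync from.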