[SERVER-13117] Secondary repeatedly fails to sync

| Created: | 10/Mar/14 | Updated: | 22/Oct/15 | Resolved: | 14/Mar/14 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.5, 2.4.6, 2.4.7, 2.4.8, 2.4.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Ryan Witt | Assignee: | Eric Milkie |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Issue Links: |
|
| Operating System: | ALL |
| Participants: | |
| Description |
|
I've gotten this error many tens of times on 2.4.x.

Symptoms:

Setup:
The slightly anonymized log below contains the moment where the initial sync fails. The primary is m5. The secondary trying to sync is m4. There is some traffic on the servers, which causes slow responses. Connection spam is filtered out.
What more can I share that would help debug this problem? Does it seem like initial sync is sensitive to high load? |
| Comments |
| Comment by Eric Milkie [ 13/Mar/14 ] |
|
1) You're totally right about the docs; I edited that page and pushed a fix just now to the 2.6 docs repo (it will be published to the web later). |
| Comment by Ryan Witt [ 13/Mar/14 ] |
|
Yes, looks like it's on the secondary indexes right now. Thanks! A couple more questions: 1) Is this behavior documented somewhere? This didn't matter as much at smaller data volumes, but it would be really helpful to know for big replica sets. Maybe it could be added to http://docs.mongodb.org/manual/core/replica-set-sync/, where I got the impression that there would only be one round of index building. The fragility of initial sync is a little disappointing, but it looks like system snapshotting tools are still the tried-and-true database workaround. |
| Comment by Eric Milkie [ 13/Mar/14 ] |
|
Is it building secondary indexes? _id indexes are built as the data is copied, but the secondary indexes are built in a later phase, after first catching up on the oplog. Once that's done, it will catch up on the oplog again and then deem the node ready to transition to SECONDARY state. |
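A quick way to watch which phase the node is in, for reference: the member reports STARTUP2 for the whole initial sync (data copy, oplog catch-up, and secondary index builds) and only flips to SECONDARY at the end. Minimal sketch below; the host name and port are assumptions based on this ticket.

```
# Poll the syncing member's state: STARTUP2 while initial sync runs,
# SECONDARY once the final oplog catch-up finishes.
mongo --host m4.example.com --port 27017 --eval 'printjson(rs.status().members)'

# In-progress index builds show up in the currentOp output.
mongo --host m4.example.com --port 27017 --eval 'printjson(db.currentOp())' | grep -i index
```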
| Comment by Ryan Witt [ 13/Mar/14 ] |
|
So, it looks like keeping the load low lets the initial sync happen. m4 got through a round of indexing, then caught up on its oplog... then went into another round of indexing, which I don't recall happening before. It is still indexing, which might mean it's stuck in some kind of indexing loop. Is this expected? Should I open another ticket for this behavior? |
| Comment by Eric Milkie [ 11/Mar/14 ] |
|
That's correct; the cloner uses a long-running query to scan through all the documents in each collection in natural order. If the query gets interrupted, it doesn't currently have a way to pick up where it left off. We're looking at ways to make initial sync more robust, so that keeping one TCP connection alive for a long time won't be as critical. |
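As a rough illustration only (not the actual cloner code, and the database/collection names are placeholders), the clone of each collection behaves like a single long-lived natural-order scan; if the connection carrying that cursor drops, the copy of that collection starts over.

```
# Illustration: a long-running natural-order scan per collection, similar in
# spirit to what the cloner issues. "mydb"/"mycoll" are placeholders.
mongo --host m5.example.com --port 27017 --eval '
  var cursor = db.getSiblingDB("mydb").getCollection("mycoll").find().hint({$natural: 1});
  var n = 0;
  while (cursor.hasNext()) { cursor.next(); n++; }  // stream every document in insertion order
  print("scanned " + n + " documents");
'
```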
| Comment by Ryan Witt [ 11/Mar/14 ] |
|
Looks like keepalive was at the usual 7200 (it had not persisted after reboots). I also played with cutting down TIME_WAIT. Neither seemed to help. However, I did try changing readahead, and a very low value (32) on the primary seems to make it much more stable while syncing. We'll see by tomorrow morning whether it helps m4 complete resyncing. So, just to confirm: connection interruptions can keep sync from completing? |
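For reference, this is roughly how I'm checking and applying the readahead change; the device name is a placeholder for whatever volume holds the dbpath, and the setting does not survive a reboot on its own.

```
# Readahead is measured in 512-byte sectors, so 32 = 16 KB.
sudo blockdev --getra /dev/sdb        # show the current readahead
sudo blockdev --setra 32 /dev/sdb     # lower it to 32 sectors
# Reapply from an init script or udev rule if the change turns out to help,
# since blockdev settings are reset at boot.
```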
| Comment by Eric Milkie [ 11/Mar/14 ] |
|
I went through the log for m5. Based on the socket timeout error message from the secondary, I would surmise that the primary is too busy to support the long-lived connections that initial sync needs in order to complete. Can you check your TCP keepalive setting? We recommend 300 seconds if you haven't already adjusted it from the default. |
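Assuming a Linux host, a minimal sketch of checking and lowering the keepalive interval (the value is in seconds; the stock default is typically 7200):

```
sysctl net.ipv4.tcp_keepalive_time                 # show the current value (seconds)
sudo sysctl -w net.ipv4.tcp_keepalive_time=300     # apply 300 to the running kernel
# To persist across reboots, add this line to /etc/sysctl.conf:
#   net.ipv4.tcp_keepalive_time = 300
```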
| Comment by Ryan Witt [ 10/Mar/14 ] |
|
Thanks, Eric. I'm always running mongostat, and I don't think m5 was near 20k. Here's the ulimit on mongod:
There is a readahead warning on m5, but this replset's access pattern usually benefits from large reads:
I'll go grab that log. |
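In case it helps, this is roughly how I'm watching how close m5 gets to the connection ceiling, plus the effective limits on the running mongod (assumes a single mongod process on the host):

```
# Current vs. available connections on the primary.
mongo --host m5.example.com --port 27017 --eval 'printjson(db.serverStatus().connections)'

# Effective resource limits of the running mongod process.
cat /proc/$(pidof mongod)/limits
```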
| Comment by Eric Milkie [ 10/Mar/14 ] |
|
Can you provide a log from m5.example.com:27017 for the time that matches the log above? (Don't filter out the connection attempts.) It looks like m5.example.com is having trouble keeping a steady connection to that machine (and to m8.example.com as well). While load can slow things down, it shouldn't block connection attempts unless you are hitting a resource limit. Can you also provide the output of "uname -a", assuming you are running mongod on Linux? Are there any warnings emitted into the server log at mongod startup on m5.example.com? |
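Something along these lines should gather both, assuming the host name above; getLog returns whatever warnings mongod printed at startup.

```
# Kernel and OS details.
uname -a

# Warnings mongod emitted at startup on m5.example.com.
mongo --host m5.example.com --port 27017 --eval 'printjson(db.adminCommand({getLog: "startupWarnings"}))'
```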