[SERVER-37468] Problem with initial sync when adding a member to a replica set in 3.2.21 Created: 04/Oct/18 Updated: 28/Dec/18 Resolved: 28/Dec/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Guillaume LAPOUGE | Assignee: | Kelsey Schubert |
| Resolution: | Duplicate | Votes: | 2 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Operating System: | ALL |
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
Hi, Since I migrated a replica set to MongoDB 3.2.21, I can't add a member because the initial sync does not work. This looks like a bug that was supposed to be corrected in 3.2.21, but apparently it is not. In the logs of the member I am trying to add, I see these messages:
|
| Comments |
| Comment by Kelsey Schubert [ 18/Dec/18 ] | |||
|
Hi all, Given the impact of this issue and its recent introduction to the 3.2 series, we've resolved this issue under Please note that MongoDB 3.2 continues to be EOL, and future patch releases will not be provided. Therefore, we strongly recommend that users on 3.2 upgrade to more recent and supported versions of MongoDB. Thanks again, | |||
| Comment by Dave Muysson [ 27/Nov/18 ] | |||
|
Add us to the list of affected users as well. We upgraded two of our large clusters to 3.2.21 recently while we make our way towards 3.4 and then ultimately 3.6. We hit this while rebuilding a node in our production environment. It is very unfortunate this case was closed. If there is no desire to investigate or fix this, I would recommend pulling 3.2.21 entirely due to this issue. Experiencing this behaviour right when you are trying to recover a production node is very unpleasant... We will be rolling back to 3.2.20 across the board, as we do not want to risk being unable to recover a failed node in our production environment. | |||
| Comment by Hailin Hu [ 23/Oct/18 ] | |||
|
I hit a similar issue and downgraded to 3.2.20, which works.
| |||
| Comment by Kelsey Schubert [ 19/Oct/18 ] | |||
|
Please note that MongoDB 3.2 has reached "end of life", and we have not heard any reports of issues like this on more recent versions of MongoDB. Therefore my recommendation would be to upgrade to a more recent version of MongoDB. If you continue to encounter this issue after upgrading please let us know and we will continue to investigate. Thank you, | |||
| Comment by Daniel Z. [ 11/Oct/18 ] | |||
|
Hi Guillaume LAPOUGE
Tip. Restore your "mongod.conf" file after installation. Good Luck! Regards, Danny | |||
| Comment by Guillaume LAPOUGE [ 09/Oct/18 ] | |||
|
Hi, I have uploaded log files for both nodes. In the sf06-b log file you can see that the sf06-c node isn't online at first; that is just the time it takes to start the mongod service on sf06-c. | |||
| Comment by Nick Brewer [ 08/Oct/18 ] | |||
|
glapouge Can you provide the corresponding logs as well? You can upload them to our secure portal if you'd prefer not to make them publicly accessible. -Nick | |||
| Comment by Guillaume LAPOUGE [ 08/Oct/18 ] | |||
|
The diagnostic data files are posted. | |||
| Comment by Nick Brewer [ 08/Oct/18 ] | |||
|
glapouge Thanks. To determine the cause of the slowness, we'll need a complete set of logs from both of the mongods when the failed initial sync is occurring, as well as archives (tar or zip) of the dbpath/diagnostic.data directory on each of the nodes. -Nick | |||
| Comment by Guillaume LAPOUGE [ 08/Oct/18 ] | |||
|
When testing with mtr between nodes B and C, I did not detect any network performance issue.
HOST: sf06-b.essos.lan Loss% Snt Last Avg Best Wrst StDev
HOST: sf06-c.essos.lan Loss% Snt Last Avg Best Wrst StDev
| |||
| Comment by Nick Brewer [ 05/Oct/18 ] | |||
|
glapouge The connection is initially successful, but it appears to be quite slow:
Given that these machines appear to be on the same local network, a connection time of 2023ms seems unusual. You can use a tool like mtr to determine the average response time from the affected node to the primary and secondary. -Nick | |||
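As a complement to mtr, which reports per-hop round-trip times, a rough check of TCP connect time to the mongod port can be sketched in Python. This is only a minimal sketch: the helper names `tcp_connect_ms` and `summarize` and the default of five samples are illustrative assumptions, not part of any MongoDB tooling.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, samples=5, timeout=5.0):
    """Return a list of TCP connect times in milliseconds (hypothetical helper)."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection resolves the name and completes the TCP handshake,
        # so the measured time includes DNS lookup as well as the connect itself.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        timings.append(elapsed * 1000.0)
    return timings


def summarize(timings):
    """Return (best, average, worst) of the measured connect times."""
    return min(timings), statistics.mean(timings), max(timings)
```

Run from the node being added, a call such as `summarize(tcp_connect_ms("sf06-b.essos.lan", 27017))` gives best, average, and worst connect times to compare against the 2023ms seen in the log. Because name resolution is included in the measurement, a slow DNS lookup would show up here even when mtr reports a healthy network path.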
| Comment by Guillaume LAPOUGE [ 05/Oct/18 ] | |||
|
I also tried changing the sync source in the replica set from B (secondary) to A (primary), and I have the same issue. | |||
| Comment by Guillaume LAPOUGE [ 05/Oct/18 ] | |||
|
TCP port 27017 is fully open. We already have nodes A and B in this replica set, and replication works perfectly between these nodes. Node C needs to perform an initial sync. The first log in the bug report is from node C, which wants to synchronize with node B. This is an extract from the node B log:
2018-10-05T10:03:09.600+0200 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to sf06-c.essos.lan:27017
The node C was correctly detected.
| |||
| Comment by Nick Brewer [ 04/Oct/18 ] | |||
|
glapouge This is often the result of a connectivity issue - have you tested connectivity between the primary and this node? It would be useful to see the mongod logs from both the primary and the node you were attempting to add. If you'd prefer to keep this information private, we can generate a secure portal for you to upload it to. -Nick |