[SERVER-24285] MongoDB Initial Sync failing repeatedly Created: 25/May/16  Updated: 31/May/16  Resolved: 31/May/16

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Networking, Replication
Affects Version/s: 3.2.3
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Abhishek Vaid Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2016-05-25 at 6.15.58 PM.png    
Operating System: ALL
Steps To Reproduce:

Try to replicate a large (>200 GB) database with many large documents and 4 or 5 secondary indices.

Participants:

 Description   
  • I have a single-member replica set (a mongod instance on Server1) with a very large database: 2.5 million documents, each very large, and 4 indices.
  • I then added another machine (Server2) to this replica set. The mongod on Server2 takes about 5 hours to fetch all the documents in this database.
  • After Server2 has fetched all the documents, it starts building the secondary indices, which takes around 3 hours.
  • Immediately after it finishes building the secondary indices, it tries to connect to the primary and finds that the socket has expired and timed out.
  • On receiving the timeout error, Server2 simply drops all databases and starts the initial sync again.

Logs are attached as an image.



 Comments   
Comment by Kelsey Schubert [ 31/May/16 ]

Hi vaidabhishek,

I would recommend ensuring that TCP keepalives are working and are frequent enough. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-users group or Stack Overflow with the mongodb tag. A question like this, which involves more discussion, would be best posted on the mongodb-users group. See also our Technical Support page for additional support resources.

Additionally, we have work scheduled in SERVER-8076 that would mitigate this issue and make it less important to have properly configured TCP keepalives. Please feel free to vote for it and watch it for updates.

Kind regards,
Thomas
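
As an illustration of the keepalive check recommended above, here is a minimal sketch that reads the Linux kernel's keepalive idle time and compares it with a threshold. It is not part of the original comment. The 300-second threshold and the helper names are assumptions chosen for illustration; the Linux default of 7200 seconds (2 hours) is shorter than the roughly 3-hour index build described in this ticket, but may still be longer than the idle timeout of a firewall or load balancer between the two servers.

```python
# Sketch only: checks whether the OS-level TCP keepalive idle time is short
# enough. The threshold below is an assumption, not a value from this ticket.
MAX_KEEPALIVE_IDLE_SECONDS = 300

# Standard Linux location of the keepalive idle time, in seconds.
LINUX_KEEPALIVE_PATH = "/proc/sys/net/ipv4/tcp_keepalive_time"


def parse_keepalive_time(raw: str) -> int:
    """Parse the contents of tcp_keepalive_time into whole seconds."""
    return int(raw.strip())


def keepalive_is_frequent_enough(
    idle_seconds: int, max_idle: int = MAX_KEEPALIVE_IDLE_SECONDS
) -> bool:
    """Return True if keepalive probes start within max_idle seconds of idleness."""
    return 0 < idle_seconds <= max_idle


if __name__ == "__main__":
    try:
        with open(LINUX_KEEPALIVE_PATH) as f:
            idle = parse_keepalive_time(f.read())
    except OSError:
        print("tcp_keepalive_time not readable; this check is Linux-only")
    else:
        verdict = "ok" if keepalive_is_frequent_enough(idle) else "too infrequent"
        print(f"tcp_keepalive_time = {idle}s ({verdict})")
```

Running this on both the primary and the syncing member shows whether either side is still using the 2-hour default. The right threshold depends on the idle timeouts of any network devices between the two hosts.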

Comment by Abhishek Vaid [ 25/May/16 ]

Sorry for typos, I have typed this from a phone.
