[SERVER-24948] initial sync failed because listDatabases exceeded 30s socket (read) timeout Created: 08/Jul/16 Updated: 06/Dec/22 Resolved: 28/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.2.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Zhang Youdong | Assignee: | Backlog - Replication Team |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Steps To Reproduce: | Create enough collections and indexes that 'listDatabases' takes more than 30s, then run an initial sync. |
| Participants: |
| Description |
|
We encountered a use case with 60,000+ collections (300,000+ files including indexes) on the WiredTiger storage engine. In this case listDatabases takes more than 30s, because it needs to traverse all the WT files to gather size statistics. The secondary sets a 30s socket timeout during the sync process (OplogReader::kSocketTimeout(30)), so it fails to run the listDatabases command.
During initial sync, the secondary only needs the database names and does not care about database size information. We could therefore add an option to listDatabases that tells the server "only db names are needed", which would greatly reduce the cost of listDatabases.
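The cost difference described above can be illustrated with a small sketch. This is not server code: the in-memory `catalog`, the `list_databases` function, and the `name_only` flag are all hypothetical stand-ins, chosen only to show that a names-only listing can skip every per-file size lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a server catalog:
# database name -> sizes (bytes) of the files backing its collections and indexes.
Catalog = Dict[str, List[int]]


def list_databases(catalog: Catalog, name_only: bool = False) -> Tuple[List[dict], int]:
    """Return (database entries, number of per-file size lookups performed)."""
    lookups = 0
    entries = []
    for name in sorted(catalog):
        if name_only:
            # Names only: no file is touched for this database.
            entries.append({"name": name})
            continue
        size = 0
        for file_size in catalog[name]:
            # Each iteration models one size-stat lookup on a backing file.
            size += file_size
            lookups += 1
        entries.append({"name": name, "sizeOnDisk": size})
    return entries, lookups


catalog = {"app": [4096] * 5, "logs": [8192] * 3}
full, full_lookups = list_databases(catalog)
names, name_lookups = list_databases(catalog, name_only=True)
print(full_lookups, name_lookups)  # prints: 8 0
```

In this model the full listing does one lookup per file, so its cost grows with the 300,000+ files mentioned above, while the names-only listing grows only with the number of databases.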
|
| Comments |
| Comment by Zhang Youdong [ 13/Jul/16 ] |
|
At the very beginning, I just wanted to solve the initial sync timeout problem, though there are many other ways to solve it, like below:
So I only considered compatibility. After seeing your tips, I have rethought this change to make it a general feature, and I will discuss it on |
| Comment by Scott Hernandez (Inactive) [ 11/Jul/16 ] |
|
Great, and thanks for the answers. We want to improve initial sync in terms of performance and reliability for this release. I look forward to when the new code will be testable, and will let you know when that is.
Pull Request
The pull request you started was a good beginning, but there are a few more things that would need to be addressed before we could accept it for merging into the server. Here is a brief list of some things off the top of my head, before looking too closely at the code:
If you are interested in continuing this work it would be good to move this discussion to |
| Comment by Zhang Youdong [ 11/Jul/16 ] |
|
I created a pull request, see https://github.com/mongodb/mongo/pull/1100, and I will use it in our production environment. |
| Comment by Zhang Youdong [ 10/Jul/16 ] |
|
Hi Scott Hernandez, it's the same problem as SERVER-3181.
1. The first initial sync succeeded when there was no data; this happened when we added a new empty node. (I know we can also add a node by copying the data from the primary and then doing an incremental sync, but we want to do less ops work.)
I am glad to test MongoDB 3.4 on my dataset when it's available; you can contact me at any time through Jira or email (zyd_com@126.com). |
| Comment by Scott Hernandez (Inactive) [ 08/Jul/16 ] |
|
We have an existing issue ( I have a few questions about your case:
I believe the new initial sync code we are working on for 3.4 will remove the socket timeout so that the operation will succeed. Would you be willing to test this, once it is available in alpha/beta, on your data sets? |