[SERVER-24948] initial sync failed because listDatabases exceeded 30s socket (read) timeout Created: 08/Jul/16  Updated: 06/Dec/22  Resolved: 28/Sep/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Zhang Youdong Assignee: Backlog - Replication Team
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-3181 Add option to listDatabases to only g... Closed
Related
related to SERVER-28924 Change databasesCloner to use the "na... Closed
Assigned Teams:
Replication
Operating System: ALL
Steps To Reproduce:

Create enough collections and indexes that 'listDatabases' takes more than 30s, then run an initial sync.

Participants:

 Description   

We encountered a use case with 60,000+ collections (300,000+ files including indexes) on the WiredTiger engine. In this case listDatabases takes more than 30s because it needs to traverse all of the WT files to collect the size statistics.

The secondary sets the socket timeout to 30s during the sync process (OplogReader::kSocketTimeout(30)), so the listDatabases command fails in this case.

2016-06-27T20:59:46.494+0800 I REPL     [rsSync] ******
2016-06-27T20:59:46.495+0800 I REPL     [rsSync] initial sync pending
2016-06-27T20:59:46.499+0800 I REPL     [rsSync] no valid sync sources found in current replset to do an initial sync
2016-06-27T20:59:47.499+0800 I REPL     [rsSync] initial sync pending
2016-06-27T20:59:47.517+0800 I REPL     [rsSync] initial sync drop all databases
2016-06-27T20:59:47.517+0800 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 1
2016-06-27T20:59:47.517+0800 I REPL     [rsSync] initial sync clone all databases
2016-06-27T21:00:17.517+0800 I NETWORK  [rsSync] Socket recv() timeout  10.182.4.106:27017
2016-06-27T21:00:17.517+0800 I NETWORK  [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [10.182.4.106:27017] 
2016-06-27T21:00:17.519+0800 E REPL     [rsSync] 6 network error while attempting to run command 'listDatabases' on host '10.182.4.106:27017' 
2016-06-27T21:00:17.519+0800 E REPL     [rsSync] initial sync attempt failed, 9 attempts remaining
2016-06-27T21:00:22.519+0800 I REPL     [rsSync] initial sync pending
2016-06-27T21:00:40.435+0800 I REPL     [rsSync] initial sync drop all databases
2016-06-27T21:00:40.435+0800 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 1
2016-06-27T21:00:40.435+0800 I REPL     [rsSync] initial sync clone all databases
2016-06-27T21:01:10.436+0800 I NETWORK  [rsSync] Socket recv() timeout  10.182.4.106:27017
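
The log timestamps line up with the 30s socket timeout. As a rough check, the sketch below computes the gap between each "initial sync clone all databases" line and the following "Socket recv() timeout" line. The helper names are made up for illustration and are not part of the server code.

```javascript
// Hypothetical helper: parse a mongod log timestamp such as
// "2016-06-27T20:59:47.517+0800" into epoch milliseconds.
// The offset is rewritten from "+0800" to "+08:00" so the string
// is a valid ISO 8601 date-time for Date.
function parseLogTimestamp(ts) {
  const iso = ts.replace(/([+-]\d{2})(\d{2})$/, "$1:$2");
  return new Date(iso).getTime();
}

// Hypothetical helper: seconds elapsed between two log timestamps.
function elapsedSeconds(startTs, endTs) {
  return (parseLogTimestamp(endTs) - parseLogTimestamp(startTs)) / 1000;
}

// Timestamps copied from the log excerpt above.
const firstAttempt = elapsedSeconds(
  "2016-06-27T20:59:47.517+0800", // initial sync clone all databases
  "2016-06-27T21:00:17.517+0800"  // Socket recv() timeout
);
const secondAttempt = elapsedSeconds(
  "2016-06-27T21:00:40.435+0800", // initial sync clone all databases
  "2016-06-27T21:01:10.436+0800"  // Socket recv() timeout
);

console.log(firstAttempt, secondAttempt);
```

Both attempts give up about 30 seconds after the clone phase starts, which matches the socket timeout rather than any failure on the sync source.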

During initial sync, the secondary only needs the database names and does not care about the size information. We could add an option to listDatabases that tells the server "only db names are needed", which would greatly reduce the cost of listDatabases.

db.runCommand({listDatabases: 1, nameOnly: true})
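
A minimal sketch of how a caller might use the proposed option is shown below. It assumes the reply keeps the existing `databases` array and simply omits the size fields when `nameOnly` is set; that reply shape is an assumption here, since the output format is still to be decided. `fakeDb` is a hypothetical stand-in for a shell `db` handle so the example runs without a server.

```javascript
// Build the command document; nameOnly is the option proposed above.
function listDatabasesCommand(nameOnly) {
  const cmd = { listDatabases: 1 };
  if (nameOnly) {
    cmd.nameOnly = true;
  }
  return cmd;
}

// Extract database names from a listDatabases reply.
// Works for both the full reply and the (assumed) name-only reply.
function databaseNames(reply) {
  if (!reply || reply.ok !== 1 || !Array.isArray(reply.databases)) {
    throw new Error("unexpected listDatabases reply");
  }
  return reply.databases.map((d) => d.name);
}

// Hypothetical stand-in for a shell `db` object. A real server would
// compute sizeOnDisk per database unless nameOnly is requested.
const fakeDb = {
  runCommand(cmd) {
    const names = ["admin", "local", "test"];
    const databases = names.map((name) =>
      cmd.nameOnly ? { name } : { name, sizeOnDisk: 0, empty: true }
    );
    return { databases, ok: 1 };
  },
};

const reply = fakeDb.runCommand(listDatabasesCommand(true));
const names = databaseNames(reply);
console.log(names);
```

Because the cloner only reads `name` from each entry, it would not need to change if the size fields were dropped from the reply.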



 Comments   
Comment by Zhang Youdong [ 13/Jul/16 ]

At the very beginning, I just wanted to solve the initial sync timeout problem, though there are many other ways to solve it, such as:

  • set a longer timeout, or no timeout, on the socket
  • use the mmapv1 engine on the primary
  • copy the raw data from the primary when adding a new empty node

So I only considered compatibility. After seeing your tips, I have rethought this change as a general feature, and I will discuss it on SERVER-3181.

Comment by Scott Hernandez (Inactive) [ 11/Jul/16 ]

Great and thanks for the answers. We want to improve initial sync in terms of performance and reliability for this release. I look forward to when the new code will be testable, and will let you know when that is.

Pull Request

The pull request you started was a good beginning, but there are a few more things that would need to be addressed before we could accept it for merging into the server. Here is a brief list of some things off the top of my head, before looking too closely at the code:

  • Design the change
    • Decide on output format (what should the size be, or should it even be included)
    • Name the parameter, and ensure forward and backwards compatibility (for parsing and validation)
    • Verify usage in the server (DBClient, cloner and other places)
    • Evaluate security ramifications (I don't think there are any, but need to think about it)
    • Will there be shell support or helpers (should we change to make this the default)?
  • Add sharding support (including tests)
    • How does this affect the sharded command implementation?
    • Are there any other uses of the command in sharding that need to be changed?
  • Research effect on driver/client (library)
    • Will the design break existing clients?
    • Is there enough information in the output, or does the output result in any problems?
  • Research effect on tools and integrations (may not be more than needed above)
  • Implement on master for the pull request

If you are interested in continuing this work it would be good to move this discussion to SERVER-3181 to address the above issues before doing anything more with the code.

Comment by Zhang Youdong [ 11/Jul/16 ]

I created a pull request, see https://github.com/mongodb/mongo/pull/1100, and I will use it in our production environment.

Comment by Zhang Youdong [ 10/Jul/16 ]

Hi Scott Hernandez, it's the same problem as SERVER-3181.

1. The first initial sync succeeds when there is no data; this happened when we added a new empty node. (I know we can also add a node by copying the data from the primary and then doing an incremental sync, but we want to do less ops work.)
2. We modified some source code, such as resizing the oplog online (see SERVER-22847), but didn't modify the sync logic.
3. A similar use case (many collections) on mmapv1 doesn't have this problem, because listDatabases costs much less on mmapv1 than on WiredTiger.

I would be glad to test MongoDB 3.4 on my dataset when it's available; you can contact me at any time through Jira or email (zyd_com@126.com).

Comment by Scott Hernandez (Inactive) [ 08/Jul/16 ]

We have an existing issue (SERVER-3181) to remove the need to calculate the size on listDatabases, which is now linked.

I have a few questions about your case:

  • Was the initial sync able to complete after the first attempt?
  • Has the mongod source code you are using been modified? If so, how?
  • Are you using only WiredTiger, or mmapv1 as well? If both, did mmapv1 have the same problem?

I believe the new initial sync code we are working on for 3.4 will remove the socket timeout so that the operation will succeed. Would you be willing to test this, once it is available in alpha/beta, on your data sets?

Generated at Thu Feb 08 04:07:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.