[SERVER-57776] Sharded collection becomes inaccessible when it becomes too big Created: 17/Jun/21 Updated: 19/Jul/21 Resolved: 19/Jul/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Nicolai Ødum | Assignee: | Eric Sedor |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Steps To Reproduce: | Create a sharded setup where the config.chunks collection contains 40 million entries, half of which relate to a single collection. Then try to start a MongoS. |
| Participants: |
| Description |
|
I have a large (500+ TB) sharded collection that is currently inaccessible because of a timeout in the synchronization between MongoC and MongoS. Both MongoS and MongoC run on enterprise-class servers on a 10 Gbit network with <0.1 ms latency. The sharded collection has more than 20 million entries in the config.chunks collection on the MongoC, and config.chunks holds more than 40 million entries in total. When the MongoS starts, there appears to be a (hardcoded?) limit of 1 minute for each collection to sync config.chunks from the MongoC to the MongoS, and if the sync fails the MongoS will not start at all. Is there a way to change that timeout in the MongoS? |
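For a rough sense of scale, the figures in the description can be turned into a required transfer rate. The sketch below is only back-of-envelope arithmetic on the reporter's numbers (about 20 million chunk entries for one collection, and a 1-minute limit that the reporter suspects but has not confirmed). The function name is hypothetical, and the size of each chunk document is unknown, so no bandwidth figure is derived.

```python
def required_docs_per_second(chunk_docs: int, timeout_seconds: float) -> float:
    """Average rate needed to transfer chunk_docs entries within the timeout."""
    return chunk_docs / timeout_seconds


# Figures taken from the description; the 1-minute limit is the reporter's
# unconfirmed observation, not a documented MongoDB setting.
rate = required_docs_per_second(20_000_000, 60)
print(f"{rate:,.0f} chunk documents per second")
```

Under those assumptions, the MongoS would need to load roughly a third of a million chunk entries per second for that one collection to finish inside the window.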
| Comments |
| Comment by Eric Sedor [ 19/Jul/21 ] |
|
Thanks for clarifying nicolai@niro-it.dk. We do understand this is sensitive information and appreciate your care. Unfortunately, without logs or diagnostic data, we aren't able to investigate this report here in the SERVER project. But we will be on the lookout for similar reports. If you do end up able to provide logs showing the mongos startup failure, let us know and we can reopen this ticket. Sincerely, |
| Comment by Nicolai Ødum [ 15/Jul/21 ] |
|
Sorry - I am not able to provide you with un-obfuscated logs. I have used https://github.com/rueckstiess/fruitsalad, but I am not sure whether it can handle the new JSON format.
Regards Nicolai |
| Comment by Eric Sedor [ 15/Jul/21 ] |
|
Are you able to provide un-obfuscated logs? We are definitely interested in investigating the details of what you're reporting. Sincerely, |
| Comment by Eric Sedor [ 28/Jun/21 ] |
|
It looks like the logs have been fully obfuscated. Are you at all able to provide either partially obfuscated or un-redacted versions of these logs to the same upload portal? To clarify, files uploaded here will only be visible to MongoDB employees actively involved in this investigation. If that's not possible, could you provide manually redacted lines that preserve the system-related information in each line? We're particularly interested in the log messages that are occurring on the mongos and config server primary at the time the mongos is failing to start. Gratefully, |
| Comment by Nicolai Ødum [ 18/Jun/21 ] |
|
I have uploaded a mongos log and a mongoc log. Because of company policy, I am not able to upload binary files. |
| Comment by Nicolai Ødum [ 18/Jun/21 ] |
|
OS: CentOS Linux release 7.9.2009 MongoDB --version Build Info: { }
|
| Comment by Eric Sedor [ 17/Jun/21 ] |
|
Hi nicolai@niro-it.dk, can you clarify the MongoDB version and provide some additional information? I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. We'd like information from the following nodes:
For each of these nodes spanning a time period that includes a failed restart attempt, would you please archive (tar or zip) and upload to that link:
Thank you, |