[SERVER-39598] Unwanted High Network IO in MongoDB Shard Cluster Created: 15/Feb/19  Updated: 06/May/19  Resolved: 06/May/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Madura Dissanayake Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 2019-02-15_16-12-24.png     JPEG File INC-mongo-db-cluster-1.jpg    
Operating System: ALL
Participants:

 Description   

Hi,

I have a MongoDB sharded cluster in my production environment. The cluster runs on EC2 instances in AWS. The following diagram shows the overall layout of the sharded cluster.

In production, I maintain a cluster of two shards, each backed by a replica set.

 

Shard     Master-Node  Availability Zone
Shard-01  Replica-C    us-east-a
Shard-02  Replica-A    us-east-a
MongoS    MongoS       us-east-a

The cluster holds around 30 GB of data in total, but I observe very high network in/out between the two master nodes and the mongos server (the green servers in the diagram). This traffic runs at about 25 Mbps for 10 continuous hours every day, starting at 05:05 am UTC, and transfers around 2 TB of data per day in total. It is generating a massive network cost and has become a serious problem.
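As a rough sanity check of the figures above, the volume moved by a sustained rate can be worked out directly. The sketch below is illustrative only: the helper name `transfer_gb` is hypothetical, and it assumes decimal units (1 Mbps = 10^6 bits per second, 1 GB = 10^9 bytes). The ticket does not say whether the 25 figure is megabits or megabytes per second, or whether it is per link or summed over several links, so both readings are shown.

```shell
# Data moved by a sustained rate, in decimal gigabytes.
# $1 = rate in megabits per second, $2 = duration in hours.
transfer_gb() {
  # awk is used because plain shell arithmetic is integer-only.
  awk -v mbps="$1" -v hours="$2" \
    'BEGIN { printf "%.1f\n", mbps * 1e6 / 8 * hours * 3600 / 1e9 }'
}

transfer_gb 25 10    # reading the figure as 25 megabits per second
transfer_gb 200 10   # reading it as 25 megabytes per second (200 Mbps)
```

Under these assumptions neither reading reaches 2 TB on a single link, so the daily total may be the sum over several links or reflect peaks above the quoted rate.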

The following network traffic monitoring dashboard shows the spikes clearly.

I need a solution for this unwanted network traffic urgently. Any help would be greatly appreciated.

Thanks.



 Comments   
Comment by Danny Hatcher (Inactive) [ 25/Apr/19 ]

madurad are you still experiencing this issue?

Comment by Madura Dissanayake [ 26/Feb/19 ]

Hi @Daniel,

Thanks for your response. As you stated, 05:05 am UTC is when we see the problem, and an hour later, at 06:30 am UTC, is our backup time.

Additionally, I'll enable the default logging options for the cluster and will upload all the required diagnostic logs soon for you to check.

Thanks.

 

Comment by Danny Hatcher (Inactive) [ 22/Feb/19 ]

Hello Madura,

From the shard Primaries, I can see a very large burst of documents being returned starting around 05:05UTC every day. This lasts for about 30 minutes and then an hour later another burst of 15 minutes happens before quieting down until the next day. Unfortunately, I am not sure what is happening the rest of the day as the mongos data would most likely be the most helpful thing there. I recommend using the built-in logging capabilities instead of redirecting the output as it would provide much more useful information in diagnosing issues.
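A minimal sketch of what built-in logging could look like for the mongos in this ticket, replacing the `&>` shell redirection with the `--logpath` and `--logappend` options. The log file name is an assumption; the `--configdb` string and bind address are taken from the command quoted in this ticket. The wrapper function `start_mongos` is hypothetical and only keeps the command from running until it is called.

```shell
# Sketch only: start mongos with its own logging instead of "&>" redirection.
# MONGOS_LOG is an assumed file name under the log directory from the ticket.
MONGOS_LOG="/opt/mongodb-cluster/logs/mongos.log"
# Config server string as quoted in the ticket.
CONFIGDB="replconfig01/prod-mongodb-config-01:27017,prod-mongodb-config-02:27017"

start_mongos() {
  # --logpath and --logappend make mongos write and append to its own log file.
  # --fork runs mongos as a daemon, taking the place of the trailing "&";
  # it requires --logpath (or --syslog) to be set.
  sudo mongos --configdb "$CONFIGDB" --bind_ip "0.0.0.0" \
    --logpath "$MONGOS_LOG" --logappend --fork
}
```

Calling `start_mongos` would launch the router. As far as I understand, mongos places its "diagnostic.data" directory relative to the configured log path, which would explain why none was produced when output was only redirected by the shell.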

Additionally, I see that you are using two config servers for your production environment. This is not recommended; you should add another config server as soon as possible.
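One possible way to add the third member, sketched under assumptions: the host name `prod-mongodb-config-03` is hypothetical and does not appear in this ticket, and the new node is assumed to be already running as a mongod started with `--configsvr --replSet replconfig01`. The wrapper function `add_config_member` is also hypothetical.

```shell
# Hypothetical third config server; this host is not named in the ticket.
NEW_CONFIG_HOST="prod-mongodb-config-03:27017"
# Existing config replica set, as quoted in the ticket.
CONFIG_RS="replconfig01/prod-mongodb-config-01:27017,prod-mongodb-config-02:27017"

add_config_member() {
  # Connecting with the replica set name routes the shell to the current
  # primary, where rs.add() must be run.
  mongo --host "$CONFIG_RS" --eval "rs.add('${NEW_CONFIG_HOST}')"
}
```

After the member has been added, the `--configdb` string passed to each mongos would normally be updated to list all three config servers.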

If you can collect the mongos diagnostics covering one of these timeframes, please upload those (for all the mongos nodes) as well as the updated shard primary diagnostics. Unfortunately, without further mongos diagnostics, there is not much more I can check.

Thank you,

Danny

Comment by Madura Dissanayake [ 21/Feb/19 ]

Hi @Daniel,

Thanks for the feedback on my ticket. As you requested, all the required log files have been uploaded to the secure portal. However, in my MongoDB cluster I start the mongos server using the following command, so no diagnostic logs are available on that server.

sudo mongos --configdb "replconfig01/prod-mongodb-config-01:27017,prod-mongodb-config-02:27017" --bind_ip "0.0.0.0"  &> /opt/mongodb-cluster/logs/mongodb-out.log &

Please let me know if you need any further details.

Comment by Danny Hatcher (Inactive) [ 19/Feb/19 ]

Hello Madura,

Please upload the following to our Secure Upload Portal. Please note that only MongoDB engineers will be able to see the files uploaded there.

  • Shard Primaries mongod logs
  • Shard Primaries "diagnostic.data" folders
  • mongos logs
  • mongos "diagnostic.data" folders
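The items listed above could be bundled into one archive per node before uploading. This is only a sketch: the helper `collect_diagnostics` is hypothetical, and the data directory, log file, and output name are arguments to supply for each node, since the ticket does not state the actual paths.

```shell
# Bundle a node's "diagnostic.data" folder and log file into one archive.
# $1 = directory containing diagnostic.data (the dbPath on a mongod)
# $2 = path to the mongod or mongos log file
# $3 = output archive name
collect_diagnostics() {
  data_dir="$1"
  log_file="$2"
  out="$3"
  # -C changes directory first so the archive holds relative paths only.
  tar -czf "$out" \
    -C "$data_dir" diagnostic.data \
    -C "$(dirname "$log_file")" "$(basename "$log_file")"
}
```

For example, `collect_diagnostics /path/to/dbpath /path/to/mongod.log shard01-primary.tar.gz`, with the placeholder paths replaced by the real ones on that node.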

Thanks,

Danny
