[SERVER-45265] Inconsistent sharding overhead Created: 20/Dec/19  Updated: 27/Oct/23  Resolved: 22/Dec/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Moditha Hewasinghage Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File configserver.conf     PNG File image-2019-12-20-01-14-53-779.png     Zip Archive logs.zip     File mongos.conf     File shard1.conf     File shard2.conf     File shard3.conf    
Backwards Compatibility: Fully Compatible
Participants:

 Description   

I ran a few experiments to compare sharded vs non-sharded MongoDB instances. I have limited the mongod memory to 256 MB in both scenarios and disabled compression. I use a synthetic dataset of 80-byte documents, vary the number of documents in the collection, and collect the average runtime to retrieve a random document by id. In the sharded environment I have ranged sharding on the id and the data is equally distributed among the 3 servers (no replication). Here is the result that I got ("80-2m" means 2,000,000 documents of 80 bytes each).
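For reference, a minimal sketch of how this kind of ranged-sharding setup can be created from the mongo shell. The namespace test.docs80, the split points, and the shard names are illustrative assumptions, not values taken from the attached configs; the 256 MB cache limit and the compression setting themselves live in the attached mongod conf files.

  // run against the mongos; names and split points are illustrative
  sh.enableSharding("test")
  sh.shardCollection("test.docs80", { _id: 1 })        // ranged sharding on the id

  // pre-split into three roughly equal ranges and place one chunk per shard,
  // so the data is evenly distributed without waiting for the balancer
  sh.splitAt("test.docs80", { _id: 666667 })
  sh.splitAt("test.docs80", { _id: 1333334 })
  sh.moveChunk("test.docs80", { _id: 0 }, "shard1")
  sh.moveChunk("test.docs80", { _id: 666667 }, "shard2")
  sh.moveChunk("test.docs80", { _id: 1333334 }, "shard3")

  sh.status()   // verify the resulting chunk distribution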

As expected, there is an overhead associated with sharding. However, my question is: shouldn't this overhead be constant? Why does the difference between the runtimes of the sharded and the non-sharded instances grow as the number of documents increases? As far as I can see, this overhead should be independent of the document count.
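One way to verify that each lookup really is routed to a single shard (so the measured gap is routing overhead rather than a scatter-gather) is the explain output from the mongos; a sketch, assuming the same illustrative namespace as above:

  // a point query on the shard key should be targeted to exactly one shard
  db.docs80.find({ _id: 123456 }).explain("executionStats")
  // queryPlanner.winningPlan.stage should be "SINGLE_SHARD";
  // executionStats.executionTimeMillis gives the time spent on that shard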

I have attached all the logs from each experiment (each collection used a different db location) for the config server, the mongods and the mongos. The configurations I used are also attached.



 Comments   
Comment by Moditha Hewasinghage [ 23/Dec/19 ]

@Dmitry Agranat I think you misunderstood the graph. The bottleneck is memory, once the indexes no longer fit in memory. Regardless of the value at 64 million, have a look at the graph up to 32 million: the gap between sharded and non-sharded at 2 million is far smaller than at 32 million. If sharding had a constant overhead, this should not happen.
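A quick way to check this claim is to compare the index size with the configured WiredTiger cache on each shard mongod; a sketch using standard shell and serverStatus fields (collection name still illustrative):

  // run against each shard's mongod directly
  db.docs80.stats().totalIndexSize                                    // index bytes held on this shard
  db.serverStatus().wiredTiger.cache["maximum bytes configured"]      // ~268435456 for a 256 MB cache
  db.serverStatus().wiredTiger.cache["bytes currently in the cache"]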

Comment by Dmitry Agranat [ 22/Dec/19 ]

modithadha88@gmail.com, based on your graph, it looks like you are hitting some bottleneck around 32 million docs on both the sharded and the non-sharded instances, so the reported "overhead" looks consistent between the two configurations.
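For anyone reproducing this, cache pressure around that document count can also be observed while the lookups run, complementing the static size check above; a sketch using standard WiredTiger counters from serverStatus:

  // sample on a shard mongod during the read workload
  var c = db.serverStatus().wiredTiger.cache
  c["pages read into cache"]        // climbs steadily once the working set exceeds the cache
  c["unmodified pages evicted"]     // ongoing evictions indicate the cache is being cycled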

The SERVER project is for bugs and feature suggestions for the MongoDB server. As this ticket does not appear to be a bug, I will now close it. If you need further assistance troubleshooting, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.
