[SERVER-51721] dataSize does not decrease after chunks are migrated Created: 18/Oct/20  Updated: 05/Nov/20  Resolved: 05/Nov/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dheeraj G Assignee: Edwin Zhou
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File C_C-shardDistribution-10-21-2020.txt     Text File C_ERROR-shardDistribution-10-21-2020.txt     Text File C_E_C-shardDistribution-10-21-2020.txt     Text File C_RETRY-shardDistribution-10-21-2020.txt     Text File script-cleanupOrphaned.txt     Text File sh-status()_10-24_1700CT.txt     File sh.status()-10-21-2020    
Operating System: ALL
Participants:

 Description   

Hi,

I have recently noticed an issue after upgrading from 4.0.14 to 4.0.20.

While on MongoDB 4.0.14, I added one shard on top of the existing two shards, turned on the balancer, and chunks migrated to the newly added (3rd) shard. I could see dataSize decrease on the older shards, and by running "compact" I reclaimed storageSize after the migration.

 

After upgrading to 4.0.20, I added one more shard on top of the three shards, turned on the balancer, and chunks migrated to the newly added (4th) shard, but I did not see dataSize decrease on the older shards. I also verified this by looking at "file bytes available for reuse" in db.stats() and db.coll.stats().
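For reference, a minimal sketch of the checks described above (the database name "mydb" is a placeholder; "C_C" is the collection from the attachments); these are run directly against a shard member rather than through mongos:

use mydb

// Logical size of the documents this shard holds for the collection
db.C_C.stats().size

// Disk space WiredTiger has already allocated but can reuse after chunks move away
db.C_C.stats().wiredTiger["block-manager"]["file bytes available for reuse"]

// Attempt to release that space back to the OS (on 4.0, compact blocks the database while it runs)
db.runCommand({ compact: "C_C" })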



 Comments   
Comment by Edwin Zhou [ 04/Nov/20 ]

Hi dheeraj.dba7@gmail.com,

Thank you for providing updates to your issue! I hope your continued investigation in the community forums has helped you understand what to expect from chunk migrations. I'm going to close this ticket as we've redirected your issue to the MongoDB Developer Community Forums, but if your investigation leads you to believe that you've run into a bug, we can continue that discussion in the SERVER project.

Best,

Edwin

Comment by Dheeraj G [ 24/Oct/20 ]

Hi Edwin,

Since you have been looking at this issue, I am sharing my observations after adding a 5th shard. The data has been balanced across the shards, but I do not observe any reduction in dataSize on "rs-qa-c_0", whereas "rs-qa-c_2" holds an almost similar dataset in terms of document count and chunks yet is about a quarter of the size of "rs-qa-c_0".

---------------------------------------------------------------------------

MongoDB Enterprise mongos> db.C_C.getShardDistribution()

Shard rs-qa-c_2 at rs-qa-c_2/dc616512.domain:27017,dc616513.domain:27017
data : 308.39GiB docs : 10716176 chunks : 52479
estimated data per chunk : 6.01MiB
estimated docs per chunk : 204

Shard rs-qa-c_4 at rs-qa-c_4/dc1008178.domain:27017,dc1008211.domain:27017
data : 495.43GiB docs : 4879467 chunks : 52026
estimated data per chunk : 9.75MiB
estimated docs per chunk : 93

Shard rs-qa-c_3 at rs-qa-c_3/dc1008002.domain:27017,dc1008003.domain:27017
data : 555.89GiB docs : 5315829 chunks : 52470
estimated data per chunk : 10.84MiB
estimated docs per chunk : 101

Shard rs-qa-c_0 at rs-qa-c_0/dc615353.domain:27017,dc615354.domain:27017
data : 1206.54GiB docs : 11328011 chunks : 52465
estimated data per chunk : 23.54MiB
estimated docs per chunk : 215

Shard rs-qa-c_1 at rs-qa-c_1/dc615355.domain:27017,dc615356.domain:27017
data : 316.28GiB docs : 3051536 chunks : 52456
estimated data per chunk : 6.17MiB
estimated docs per chunk : 58

Totals
data : 2882.54GiB docs : 35291019 chunks : 261896
Shard rs-qa-c_2 contains 10.69% data, 30.36% docs in cluster, avg obj size on shard : 30KiB
Shard rs-qa-c_4 contains 17.18% data, 13.82% docs in cluster, avg obj size on shard : 106KiB
Shard rs-qa-c_3 contains 19.28% data, 15.06% docs in cluster, avg obj size on shard : 109KiB
Shard rs-qa-c_0 contains 41.85% data, 32.09% docs in cluster, avg obj size on shard : 111KiB
Shard rs-qa-c_1 contains 10.97% data, 8.64% docs in cluster, avg obj size on shard : 108KiB

-----------------------------------------------------------------------------------------------------

Comment by Dheeraj G [ 22/Oct/20 ]

Hi Edwin,

Sure, I am continuing to investigate. I am also adding 2 more shards and will keep you posted after the data is balanced. At this point I am not sure whether this is really down to the behavior of either version (4.0.14 or 4.0.20).

Meanwhile, as you mentioned, I will also reach out on the MongoDB Developer Community Forums.

Thanks,

Dheeraj

Comment by Edwin Zhou [ 22/Oct/20 ]

Hi dheeraj.dba7@gmail.com,

One thing we can add at this point is that in 4.2 we moved the auto-splitter to run on the shard primary (SERVER-9287), which improves chunk splits. If you can upgrade to 4.2 you should see more predictable chunk split behavior as a result.

The getShardDistribution results suggest that the additional data size on your c_0 shard is explainable by more documents and data being located there. Because of this, we aren't able to easily reason about whether a bug is involved here, and we aren't aware of any changes between 4.0.14 and 4.0.20 that would influence split or migration behavior.

As such, we'd like to suggest you investigate which chunks are larger, and why they may be larger. The best place to start if you are unsure will be to reach out to our community by posting on the MongoDB Developer Community Forums. Should your investigation lead you to suspect a more specific bug, we could investigate further here in the SERVER project.
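If it helps, here is a rough sketch of one way to check individual chunk sizes, assuming the namespace is "mydb.C_C" (adjust to your actual database name) and running from a mongos (or point the dataSize calls at the shard's primary). It walks the chunks that the config metadata places on rs-qa-c_0 and reports any whose logical size exceeds a chosen threshold:

var ns = "mydb.C_C";                                                  // assumed namespace
var keyPattern = db.getSiblingDB("config").collections.findOne({ _id: ns }).key;

db.getSiblingDB("config").chunks.find({ ns: ns, shard: "rs-qa-c_0" }).forEach(function (chunk) {
    // dataSize reports the logical size of the documents in the chunk's key range
    var res = db.getSiblingDB("mydb").runCommand({
        dataSize: ns,
        keyPattern: keyPattern,
        min: chunk.min,
        max: chunk.max
    });
    if (res.size > 32 * 1024 * 1024) {                                // adjust the threshold as needed
        print(tojson(chunk.min) + " -> " + tojson(chunk.max) + " : " + res.size + " bytes, " + res.numObjects + " docs");
    }
});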

Best,

Edwin

Comment by Dheeraj G [ 22/Oct/20 ]

Hi Edwin,

I haven't tried compact, but after reporting this I performed an initial sync on the secondary followed by the primary of the "rs-qa-C_0" shard.

  • Performed cleanupOrphaned (attached script for reference; a sketch of the loop is shown after this list)
  • Provided sh.status() and db.collectionName.getShardDistribution() (attached results)
  • FYI, I downgraded to 4.0.14 yesterday to perform tests, so you'll see 4.0.14 in sh.status(). Also, I only suspected this was an issue after noticing the dataSize on the "rs-qa-C_0" shard
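For context, a minimal sketch of the usual cleanupOrphaned loop (not necessarily identical to the attached script); the namespace "mydb.C_C" is a placeholder, and the command must be run against each shard's primary rather than through mongos:

var nextKey = {};
var result;
while (nextKey != null) {
    result = db.adminCommand({ cleanupOrphaned: "mydb.C_C", startingFromKey: nextKey });
    if (result.ok != 1) {
        print("cleanupOrphaned failed: " + tojson(result));
        break;
    }
    nextKey = result.stoppedAtKey;   // null once the whole key range has been scanned
}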

 

Thanks,

Dheeraj

 

 

Comment by Edwin Zhou [ 21/Oct/20 ]

Hi dheeraj.dba7@gmail.com,

Thanks for providing us with this information. Did you run compact the second time you added a shard?

Keep in mind that space on disk is not always expected to change when chunks move between shards. Unchanged space on disk could be the result of a number of things.

The compact operation depends on your workload and its effectiveness may vary. You may not see any reduction in space on disk as a result of running this operation.

Another possibility is that when chunks migrate over to the new shard, the corresponding ranges on the origin shard are removed asynchronously. It's possible that this process had not yet completed when you checked the size of the shard.
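As an aside, if you want a migration to block until the donor shard has finished deleting the moved range, the moveChunk command accepts a _waitForDelete option. A sketch only: the namespace, the shard key field "sk", and the target shard name are assumptions, and the option is primarily intended for internal/testing use. Run from a mongos:

db.adminCommand({
    moveChunk: "mydb.C_C",                                   // assumed namespace
    find: { sk: "a value falling inside the chunk to move" }, // "sk" is a placeholder shard key field
    to: "rs-qa-c_4",
    _waitForDelete: true                                      // wait for the donor's range deletion before returning
})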

It's also possible that the migration moved empty chunks, which would not have any impact on dataSize.

We would need to significantly narrow down what has occurred to determine whether or not this is a bug.
To help us, can you:

  • Run cleanupOrphaned against the affected shards
  • Provide the output of sh.status() and db.collectionName.getShardDistribution()

Best,

Edwin
