[SERVER-20392] Sharding an existing small collection results in large number of chunks Created: 14/Sep/15  Updated: 28/Aug/18  Resolved: 09/Aug/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.6, 3.2.11, 3.4.0
Fix Version/s: 3.4.9, 3.5.12

Type: Bug Priority: Major - P3
Reporter: Yves Duhem Assignee: Kevin Pulo
Resolution: Done Votes: 5
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File reproduce.js    
Issue Links:
Backports
Related
is related to SERVER-30825 blacklist shard_existing_coll_chunk_c... Closed
is related to SERVER-30860 Re-enable shard_existing_coll_chunk_c... Closed
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested:
v3.4
Steps To Reproduce:

To reproduce:

  • set up a sharded cluster (1 single node shard is enough)
  • create a collection and fill it 20000 small documents containing only an int
  • shard the collection
  • insert a few more documents through the same mongos (the number will depend on the size of each document)
    A multi-split occurs, generating a high number of small chunks.

Attached is a javascript file with these steps.

Sprint: Sharding 2017-01-02, Sharding 2017-07-31
Participants:
Case:

 Description   

Creating a small collection (about < 4M) and then sharding the collection triggers a multi-split generating a large number of chunks (possibly thousands).

Some examples:
For 20000 initial documents containing only an ObjectId and an int, inserting 7 additional similar documents will trigger a split into 1819 chunks.
For 90000 initial documents containing only an ObjectId and an int, inserting 7 additional similar documents will trigger a split into 8183 chunks.
A hard limit is reached at 8192 and with 100000 + 7 documents, no split occurs.

The behavior was observed with the following variations:

  • enabling/disabling the balancer
  • monotonically increasing shard key / hashed shard key
  • 1 or 2 shards

The behavior did not occur on 2.6.11: only a few chunks are created (3 in the case of 20007 documents described before).



 Comments   
Comment by Githook User [ 25/Aug/17 ]

Author:

{'username': 'devkev', 'email': 'kevin.pulo@mongodb.com', 'name': 'Kevin Pulo'}

Message: SERVER-20392 remove early chunksize autosplit heuristic

Plus some additional 3.4-specific jstest fixes.

(cherry picked from commit ad6a668da49c61a4276749aef7529088dc3524ea)
Branch: v3.4
https://github.com/mongodb/mongo/commit/e1f5f40fc17f99fc06dda4621564db7e31be1132

Comment by Githook User [ 09/Aug/17 ]

Author:

{'username': 'devkev', 'email': 'kevin.pulo@mongodb.com', 'name': 'Kevin Pulo'}

Message: SERVER-20392 remove early chunksize autosplit heuristic
Branch: master
https://github.com/mongodb/mongo/commit/9f2e54384d90ab0c533c6d311337d6bcc8fb5679

Comment by Pedro Rocha Goncalves [ 15/Mar/17 ]

I believe I was affected by this as well:

mongos> db.dnsServerFullAgTs.getShardDistribution()
 
Shard rs1 at rs1/msc1-1.stg:27017,msc1-2.stg:27017
 data : 10.45MiB docs : 17114 chunks : 299
 estimated data per chunk : 35KiB
 estimated docs per chunk : 57
 
Shard rs2 at rs2/msc2-1.stg:27017,msc2-2.stg:27017
 data : 9.17MiB docs : 12874 chunks : 300
 estimated data per chunk : 31KiB
 estimated docs per chunk : 42
 
Totals
 data : 19.62MiB docs : 29988 chunks : 599
 Shard rs1 contains 53.24% data, 57.06% docs in cluster, avg obj size on shard : 640B
 Shard rs2 contains 46.75% data, 42.93% docs in cluster, avg obj size on shard : 747B

This seems like an unusually large number of chunks for such a small collection. The number of chunks was actually close to 1k, but we merged empty chunks. Unfortunately our script doesn't seem to find more empty chunks to merge. We're running MongoDB 3.2.

Comment by Hoyt Ren [ 16/Mar/16 ]

It seems I meet the same problem, by the way, why the index is so large, the indexed field is just a integer. Tell me if need more detailed info.

db.pop_store.stats()
{
"sharded" : true,
"capped" : false,
"ns" : "business_message.pop_store",
"count" : 3286,
"size" : 908238,
"storageSize" : 548864,
"totalIndexSize" : 507904,
"indexSizes" :

{ "_id_" : 139264, "sk_store_id" : 208896, "store_id_1" : 159744 }

,
"avgObjSize" : 276.39622641509436,
"nindexes" : 3,
"nchunks" : 1629,

Generated at Thu Feb 08 03:54:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.