[SERVER-36370] Chunks get marked as "jumbo" when they are greater than 256MB even when the chunk size is configured to be 1024MB Created: 31/Jul/18  Updated: 23/Sep/18  Resolved: 28/Aug/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Eric Herbrandson Assignee: Nick Brewer
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

Description

I originally created this as a question at SERVER-36059, but that ticket was closed, and after further investigation this seems to be a bug.

Steps to recreate

  1. Create a non-sharded database where some of the keys that will eventually become "chunks" account for around 512MB of data each
  2. Configure the database to use a chunk size of 1024MB
  3. Start the sharding process
  4. See that the large chunks are not moved and are flagged as "jumbo" (a shell sketch of these steps follows below)
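
Sketched out in the mongo shell, the steps above look roughly like this (database, collection, and shard key names here are hypothetical; the chunk size is changed via the documented config.settings document):

// Step 2: via mongos, raise the chunk size to 1024MB using the
// documented config.settings knob.
db.getSiblingDB("config").settings.save({ _id: "chunksize", value: 1024 })

// Step 3: enable sharding and shard the collection (hypothetical names).
sh.enableSharding("mydb")
sh.shardCollection("mydb.mycoll", { myKey: 1 })

// Step 4: chunks larger than 256MB stay put and show up as jumbo in
// the sharding status, despite the 1024MB setting.
sh.status()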


Comments
Comment by Nick Brewer [ 15/Aug/18 ]

herbrandson Per the documentation, a 2dsphere index can't be used as a shard key.

I'll close this ticket for now. If you run into the chunk size problems you were seeing previously, you can comment here with the relevant outputs and I'll reopen it.

Thanks,
Nick

Comment by Eric Herbrandson [ 15/Aug/18 ]

Hey @Nick,

Sorry for the delayed response. I just got back from vacation. Unfortunately, I had to tear down the test environment where I was running this test before I left, so I'm not able to grab that information.

However, I've now run into another issue.

I'm trying to shard this collection again by adding an additional field to the shard key. I already have the following index on the collection...

{ 
    "feedId" : 1, 
    "timekey" : 1, 
    "entity.samplingRate" : 1, 
    "endTimekey" : 1, 
    "geo" : "2dsphere"
}

and I'm running this command to shard the collection

sh.shardCollection('liveearth.entityEvents', {feedId : 1, timekey : 1, 'entity.samplingRate': 1})

which gives me the following error

{"ok" : 0, "errmsg" : "couldn't find valid index for shard key", "code" : 96, ...}

Why doesn't this work? Is it because "geo" is a "2dsphere" index? Or is it because of the "." in "entity.samplingRate"? It is not an index on an array (i.e., "entity" is a subobject, not an array). Is there something else going on here? I'm starting to feel like it's just not possible to shard this collection.
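
If the 2dsphere component is what disqualifies the existing index (see Nick's reply above), one possible workaround would be to create a plain compound index that exactly matches the intended shard key and then retry. A sketch, assuming no other constraint is in play:

// Add a non-geospatial compound index whose keys match the intended
// shard key exactly...
db.entityEvents.createIndex({ "feedId": 1, "timekey": 1, "entity.samplingRate": 1 })

// ...then retry sharding the collection against that index.
sh.shardCollection("liveearth.entityEvents", { "feedId": 1, "timekey": 1, "entity.samplingRate": 1 })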

Comment by Nick Brewer [ 02/Aug/18 ]

herbrandson The limit is imposed to cap the amount of data that is transferred during a chunk migration, as there is currently no way to resume a failed migration. It prevents situations where several GB of data would need to be re-sent in the event of a failed migration.

There are a few more outputs I'd like to confirm so I can test this further:

  • The full output of sh.status()
  • Logs from the primary member of the config server replica set with db.setLogLevel(5, "sharding") enabled, specifically to find more details on any instance where a chunk failed to split (see the sketch after this list).
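
For reference, gathering those outputs looks roughly like this (a sketch; sh.status() is run via mongos, and setLogLevel on the config server primary):

// Via mongos: capture the full sharding status.
sh.status()

// On the config server primary: raise sharding log verbosity to 5
// before the next split/migration attempt, then collect the logs.
db.setLogLevel(5, "sharding")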

If you'd prefer, you can upload this information to our secure portal - information shared there is only available to MongoDB employees, and is automatically removed after a period of time.

Thanks,
Nick

Comment by Eric Herbrandson [ 01/Aug/18 ]

For the key I listed in the linked ticket...

{        
     "feedId" : BinData(4, "AuGXGo/dTwGqmryKubvicA=="),
     "timekey" : ISODate("2018-04-04T21:00:00.000+0000")
}

...the count is 102,262.

The avgObjSize is 4407 bytes. If I understand the link you provided correctly, (1024 * 1024) / 4407 * 1.3 = 309.3. That matches up with what I'm seeing.

What's the reason for this limitation? Is there a way to override the 1.3 value?
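
Worked through explicitly, the heuristic discussed above (max documents per migratable chunk ≈ 1.3 × chunk size ÷ avgObjSize, sizes in bytes) gives the following for the numbers in this thread; the 256MB figure comes from the ticket title:

// Documents-per-chunk migration limit for a given chunk size in MB.
var avgObjSize = 4407;  // bytes, from db.collection.stats()
var maxDocs = function (chunkSizeMB) {
    return Math.floor(1.3 * chunkSizeMB * 1024 * 1024 / avgObjSize);
};
maxDocs(1024)  // 316738 -- limit implied by the configured 1024MB chunk size
maxDocs(256)   // 79184  -- limit implied by the 256MB threshold in the title

// The chunk above holds 102,262 documents: under the first limit but
// over the second, consistent with it being flagged as jumbo.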


Comment by Nick Brewer [ 31/Jul/18 ]

herbrandson Thanks for your report. I'd like to see a count query on the key for one of the chunks - assuming from your last ticket that the jumbo chunks have a single key. I'd also like to see the output of db.collection.stats(), specifically to determine the avgObjSize.

I ask because I suspect that the chunks may be exceeding the maximum number of documents to migrate, which triggers a split attempt. As detailed in SERVER-21931, if a split fails for any reason, it will result in the chunk being marked as jumbo.
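
Concretely, those two outputs could be gathered along these lines (a sketch; the collection name and chunk key are taken from elsewhere in this thread):

// Count the documents under the suspect chunk's key:
db.entityEvents.find({
    "feedId": BinData(4, "AuGXGo/dTwGqmryKubvicA=="),
    "timekey": ISODate("2018-04-04T21:00:00.000+0000")
}).count()

// Collection statistics; avgObjSize is the field of interest:
db.entityEvents.stats()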

Thanks,
Nick
