Operating System:
ALL
Steps To Reproduce:

1) Create a collection sharded on {_id: 1}.

2) Turn off the balancer.

3) Insert ~100M small documents like {_id: "text", "c": 12345} (this is a collection of counts of strings, if you care to know the real-world use case).

4) Turn on the balancer and wait until chunks stop moving.

5) Turn the balancer off.

6) Manually find all jumbo chunks and run sh.splitFind() on them.

7) Go back to step 4 forever (or at least it feels like it).
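The manual step of finding jumbo chunks and running sh.splitFind() on them can be scripted. The following is a minimal sketch for a 3.0-era mongo shell. It assumes that chunk metadata lives in `config.chunks` with `ns`, `min` and `jumbo` fields, and that `sh.setBalancerState()` and `sh.splitFind()` behave as documented for that release. The namespace `test.counts` and the helper `jumboChunkQueries` are hypothetical. Each jumbo chunk's lower bound is passed as the `sh.splitFind()` query, on the assumption that mongos locates the chunk from that shard-key value. For the first chunk that value is `MinKey`, so that case may need checking. The pure helper also runs under Node.js, and there the shell-only part is skipped.

```javascript
// Sketch of the manual jumbo-split step for a 3.0-era mongo shell.
// "test.counts" is a hypothetical namespace; substitute the real collection.

// Hypothetical helper: from config.chunks documents, return the lower bound
// of every chunk of `ns` that is flagged jumbo. A chunk covers [min, max),
// so `min` is a shard-key value inside the chunk; it is used here as the
// sh.splitFind() query.
function jumboChunkQueries(chunks, ns) {
  return chunks
    .filter(function (c) { return c.ns === ns && c.jumbo === true; })
    .map(function (c) { return c.min; });
}

// The rest only runs inside the mongo shell, where `sh` and `db` exist.
if (typeof sh !== "undefined" && typeof db !== "undefined") {
  var ns = "test.counts";
  sh.setBalancerState(false); // keep the balancer off while splitting
  var chunks = db.getSiblingDB("config").chunks.find({ ns: ns }).toArray();
  jumboChunkQueries(chunks, ns).forEach(function (query) {
    // Ask mongos to split the chunk containing `query` into two halves.
    sh.splitFind(ns, query);
  });
}
```

After a pass, the balancer can be re-enabled with `sh.setBalancerState(true)`, and the script can be rerun once migrations settle.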

Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Is it possible that a regression has resurfaced any of these issues?

I'm seeing very similar behavior:

MongoDB 3.0.4, sharded collections with 64 MB chunks, using WiredTiger

one collection with documents that average 2 kB in size
one collection with documents that average 40 bytes in size

the collection with 2 kB documents is evenly distributed
the collection with 40-byte documents is almost entirely jumbo chunks
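The title of the duplicate ticket, SERVER-19919, suggests why only the small-document collection is affected. The following Node.js snippet is a back-of-the-envelope check. It assumes that the title describes the whole condition, that "half chunk size" means half of the 64 MB setting, and that "2kB" means 2048 bytes. For each reported average document size, it computes how many documents fit in a full and a half chunk, and whether a chunk can exceed 250000 documents while staying under half the chunk size. The helper names are hypothetical.

```javascript
// Back-of-the-envelope check of the SERVER-19919 condition for the two
// collections described above. Sizes are reported averages, not measurements.
var MAX_CHUNK_BYTES = 64 * 1024 * 1024; // 64 MB chunk size
var DOC_LIMIT = 250000; // document count from the SERVER-19919 title

// Number of whole documents of a given average size that fit in `bytes`.
function docsIn(bytes, avgDocBytes) {
  return Math.floor(bytes / avgDocBytes);
}

// True if a chunk can hold more than DOC_LIMIT documents while still being
// under half the chunk size, i.e. the SERVER-19919 window is non-empty.
function jumboWindowExists(avgDocBytes) {
  return docsIn(MAX_CHUNK_BYTES / 2, avgDocBytes) > DOC_LIMIT;
}

var collections = [["2kB docs", 2048], ["40B docs", 40]];
collections.forEach(function (entry) {
  var label = entry[0];
  var avg = entry[1];
  console.log(
    label + ": full chunk " + docsIn(MAX_CHUNK_BYTES, avg) +
    " docs, half chunk " + docsIn(MAX_CHUNK_BYTES / 2, avg) +
    " docs, window " + jumboWindowExists(avg)
  );
});
// prints:
// 2kB docs: full chunk 32768 docs, half chunk 16384 docs, window false
// 40B docs: full chunk 1677721 docs, half chunk 838860 docs, window true
```

With 40-byte documents, 250000 documents occupy only 10,000,000 bytes (250000 × 40), so under these assumptions any chunk between roughly 10 MB and 32 MB falls into that window. With 2 kB documents, a half chunk holds only 16384 documents, so the window is empty.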

Running the balancer does not seem to split chunks automatically; it just marks them as jumbo.

I can run pass after pass of sh.splitFind() on each chunk until no jumbo chunks are left, and then more chunks get balanced.

But when I run the balancer again, more chunks get marked as jumbo and I have to split again.

Basically, to get the cluster evenly distributed after an initial load, I have to alternate splitting and balancing for days.

duplicates

SERVER-19919 Chunks that exceed 250000 docs but are under half chunk size get marked as jumbo