Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.6.3
Component/s: Sharding
Labels:
None

In my use case I have multiple databases with the same "schemas" and type of data. I've noticed that chunk migration becomes slower and slower as the collection size grows.

For small databases/collections, migrating a chunk generally takes less than 20 seconds, while for my bigger collections it takes 1800 seconds on average (sometimes more than 1 hour), with everything in between (I have about 35 identical databases of all sizes). Chunks have roughly the same size and number of documents in all cases, with exactly the same indexes.

Updates/inserts are happening, but at a slow pace (I'd say fewer than 10 updates/inserts per hour on the chunk being migrated).
My chunks are 256MB and each document has an average size of 2810 bytes (about 50,000 documents per chunk / 140MB, as it seems chunks aren't "full"). The cluster doesn't receive a lot of writes (globally about 30 updates and 5 inserts per second) and I moved as many reads as possible to secondaries. Almost no deletes are happening cluster-wide.
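
To put the figures above in perspective, here is a rough back-of-the-envelope calculation of the effective transfer rate per chunk. It only uses the numbers quoted in this report; the constant names and the helper function are made up for illustration, and decimal units (1 kB = 1000 bytes) are an assumption.

```python
# Rough estimate of the effective migration throughput, using only the
# figures quoted in this report. All names here are illustrative.
DOCS_PER_CHUNK = 50_000     # "about 50,000 documents per chunk"
AVG_DOC_BYTES = 2810        # average document size in bytes
SLOW_MIGRATION_S = 1800     # average duration for the bigger collections
FAST_MIGRATION_S = 20       # upper bound quoted for small collections

chunk_bytes = DOCS_PER_CHUNK * AVG_DOC_BYTES


def throughput_kb_per_s(total_bytes, seconds):
    """Average transfer rate in kB/s (decimal units, 1 kB = 1000 bytes)."""
    return total_bytes / seconds / 1000


print(f"chunk payload: {chunk_bytes / 1e6:.1f} MB")
print(f"slow case:     {throughput_kb_per_s(chunk_bytes, SLOW_MIGRATION_S):.1f} kB/s")
print(f"fast case:     {throughput_kb_per_s(chunk_bytes, FAST_MIGRATION_S):.1f} kB/s")
```

With these inputs the same chunk payload moves roughly 90 times slower in the slow case than in the fast one, since the payload is identical and only the duration changes (1800 s vs 20 s).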

All disks are regular SATA (because of dataset size).

Example of a slow migration:
"step 1 of 6" : 119,
"step 2 of 6" : 3266,
"step 3 of 6" : 1618,
"step 4 of 6" : 2597284,
"step 5 of 6" : 2733,
"step 6 of 6" : 0
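
A small sketch to show where the time goes in the timings above. It assumes the step values are in milliseconds, which the report does not state explicitly; the variable names are illustrative only.

```python
# Step timings copied from the migration example above.
# Assumption: values are milliseconds.
steps_ms = {
    "step 1 of 6": 119,
    "step 2 of 6": 3266,
    "step 3 of 6": 1618,
    "step 4 of 6": 2597284,
    "step 5 of 6": 2733,
    "step 6 of 6": 0,
}

total_ms = sum(steps_ms.values())
slowest = max(steps_ms, key=steps_ms.get)   # step with the largest duration
share = steps_ms[slowest] / total_ms        # fraction of total time
minutes = steps_ms[slowest] / 60_000        # ms -> minutes

print(f"total: {total_ms / 1000:.0f} s")
print(f"{slowest}: {minutes:.1f} min ({share:.1%} of the migration)")
```

Under that assumption, nearly all of the migration time is spent in a single step, and the other five steps together add up to only a few seconds.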

The data does not fit in RAM (but the indexes do).
When I look at the logs of the "sender", I can see that "cloned"/"clonedBytes" increase very slowly, with a pause of a few seconds every 16MB or so.

iotop tells me that both the sender and the recipient are performing a lot of writes (both stuck at 100%), orders of magnitude more than what is being transmitted.

*The sender*
It's a basic 16GB of RAM / soft RAID 1 SATA disks server
On the sender I'd expect high reads/low writes (as the range deleter removes the previously transmitted chunks). Due to data locality I'd probably expect reads to be slower in big collections, but I definitely didn't expect that amount of writes.
Typical "atop" output:
DSK | sda | busy 100% | read 130 | write 2635 | MBr/s 0.13 | MBw/s 2.07 | avio 3.62 ms |
DSK | sdb | busy 81% | read 83 | write 2613 | MBr/s 0.09 | MBw/s 2.04 | avio 3.00 ms |
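
To quantify the read/write imbalance in the two lines above, here is a minimal sketch that parses them and compares writes to reads. The `parse_dsk` helper is hypothetical and only handles the exact layout quoted here; it is not a general atop parser.

```python
# The two DSK lines exactly as quoted above.
ATOP_LINES = [
    "DSK | sda | busy 100% | read 130 | write 2635 | MBr/s 0.13 | MBw/s 2.07 | avio 3.62 ms |",
    "DSK | sdb | busy 81% | read 83 | write 2613 | MBr/s 0.09 | MBw/s 2.04 | avio 3.00 ms |",
]


def parse_dsk(line):
    """Turn one 'DSK | dev | name value ...' line into a dict (illustrative helper)."""
    fields = [f.strip() for f in line.split("|")]
    stats = {"device": fields[1]}
    for field in fields[2:]:
        if not field:          # trailing empty field after the last '|'
            continue
        name, value = field.split()[:2]
        stats[name] = float(value.rstrip("%"))
    return stats


parsed = [parse_dsk(line) for line in ATOP_LINES]
for d in parsed:
    req_ratio = d["write"] / d["read"]    # write requests per read request
    mb_ratio = d["MBw/s"] / d["MBr/s"]    # written MB per read MB
    print(f'{d["device"]}: {req_ratio:.1f}x more write requests, {mb_ratio:.1f}x more MB written')
```

On both disks the write side outweighs the read side by more than an order of magnitude, which is the opposite of the read-heavy profile expected on the sender.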

*The recipient*
96 GB of RAM / hard RAID 1 SATA disks
I'm moving all my data to this new server (I'll end up with a cluster with a single shard... but this server has twice as much RAM as the previous 3 shards combined: 3x16=48GB vs 96GB).
On the recipient I'd expect writes in proportion to the chunk data being migrated. This server was synced from its replica set about 1 week ago, so it's very clean in terms of data locality, with no holes in files (it wasn't "bootstrapped").

You can probably find more insights in my MMS account: https://mms.mongodb.com/host/cluster/51a2dc5c7fe227e9f188c509/52bb9a10e4b0256ace50e0d3

Have a look at the log extract for a typical overview of chunk migration speed.

log extract.log
12 kB
Jul 27 2014 08:54:45 PM UTC

Assignee: Ramon Fernandez
Reporter: Vincent
Participants: Ramon Fernandez, Vincent
Votes: 1
Watchers: 5

Created: Jul 27 2014 08:54:45 PM UTC
Updated: Dec 10 2014 11:19:29 PM UTC
Resolved: Aug 19 2014 11:06:59 PM UTC
