- Type: Improvement
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: 3.2.11
- Component/s: Sharding
The Problem:
For customers whose availability zones are geographically far apart, secondaries can have an unavoidably high-latency connection to the primary even when data rates are well within our needs. Round-trip times as measured by ping can be in the 40-100 ms range.
In this environment, chunk migration is terribly slow: chunk transfer rates in our cluster average a measly 4 KB/sec. The range deletes that remove the documents belonging to a chunk that was just moved are also terribly slow. The rest of MongoDB works well in our cluster; in fact, our secondaries' optime lag is typically less than 1 second. Standing up local secondaries (i.e., close to the primaries) improves chunk migration and range-delete rates by at least two orders of magnitude, but at tremendous cost, effectively doubling the number of secondary servers we require.
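For anyone trying to confirm where the migration time goes, the per-migration step timings that the balancer records in the config database can be inspected from a mongos. A quick sketch, using the collection and field names as documented for 3.2 (the query values are illustrative):

// Run from a mongo shell connected to a mongos.
var cfg = db.getSiblingDB("config");

// Each migration leaves changelog entries (moveChunk.start, moveChunk.commit,
// moveChunk.from/to) whose details include "step N of 6" timings in
// milliseconds, showing how long the throttled clone/catch-up phases take.
cfg.changelog.find({ what: /moveChunk/ }).sort({ time: -1 }).limit(10).pretty();

// The balancer's current throttle settings (if any have been set) live here:
cfg.settings.find({ _id: "balancer" }).pretty();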
A Possible Solution:
From our experience, when the balancer calls moveChunk and/or issues range deletes, it uses secondary throttling by default with a write concern of at least 2. Why couldn't MongoDB support a "high-latency-tolerant" chunk migration mode, in which moveChunk and the range deletes that follow a migration use a write concern of 1 (i.e., secondary throttling disabled) for all but the final write or delete?
The final write or delete for each chunk could still use secondary throttling with a write concern of 2. I believe this would yield a huge improvement in chunk migration performance for high-latency environments. What are the downsides to this solution? I really can't think of any.
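For reference, the existing knobs already let an operator disable the throttle per migration or for the balancer as a whole. A rough sketch of what that looks like in the mongo shell (the namespace, shard name, and shard key value below are placeholders, not from this ticket):

// Per-migration: moveChunk with secondary throttling disabled. Without an
// explicit writeConcern, each transferred batch is acknowledged at w:1,
// which is essentially what this ticket asks to apply to all but the
// final write/delete of a migration.
db.adminCommand({
  moveChunk: "mydb.mycoll",        // placeholder namespace
  find: { shardKey: 42 },          // placeholder shard key value
  to: "shard0001",                 // placeholder destination shard
  _secondaryThrottle: false
});

// Cluster-wide: tell the balancer to skip secondary throttling for the
// migrations it schedules (config.settings, per the 3.2 documentation).
var cfg = db.getSiblingDB("config");
cfg.settings.update(
  { _id: "balancer" },
  { $set: { _secondaryThrottle: false } },
  { upsert: true }
);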
- duplicates SERVER-23340 Turn off moveChunk secondaryThrottle by default (Closed)