[SERVER-22562] Protocol Version 1 causes very very slow chunk migrations Created: 10/Feb/16  Updated: 10/Mar/16  Resolved: 13/Feb/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Yoni Douek Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-22233 Reduce the secondary throttling durin... Closed
Duplicate
duplicates SERVER-22276 implement "j" flag in write concern a... Closed
Operating System: ALL
Participants:

 Description   

(The 6th bug we've found in 3.2.1 this week, and counting.)

After upgrading our cluster to pv1, we noticed that chunk migrations were extremely slow: 20 documents per second. We have 10 TB of data and can't afford such speeds, especially when the cluster can only move a single chunk at a time (scalable architecture).

Disabling _secondaryThrottle fixed it, so we concluded that the problem had something to do with replication.

We reverted to pv0 and it was fixed.

To confirm this, we even changed the pv in the middle of a single chunk migration: the first half (pv1) was super slow, and the second half (pv0) was as fast as always.

We're guessing it has something to do with the way majority writes are implemented in pv1, which seems inefficient.

This is one of many issues with pv1, and with 3.2.1 in general.

MongoDB seems unusable as a database in its current form.



 Comments   
Comment by Ramon Fernandez Marina [ 10/Mar/16 ]

Thanks for the feedback yonido, glad to hear 3.2.4 is helping with your use case.

Regards,
Ramón.

Comment by Yoni Douek [ 10/Mar/16 ]

Looks much better on 3.2.4. Thanks.

Comment by Ramon Fernandez Marina [ 13/Feb/16 ]

yonido, you're right that in PV1 it's the majority writes that cause the slowdown, because those writes wait for stronger safety guarantees.

There's ongoing work to address this slowdown: SERVER-22276 and SERVER-22233. In the meantime, you can use PV0 if your use case does not require those stronger safety guarantees.

Regards,
Ramón.

Comment by Ramon Fernandez Marina [ 10/Feb/16 ]

Thanks for your report yonido, we'll investigate. It seems that you've found a workaround, so please continue to use pv0 until we find the source of the slowness in pv1 chunk migrations.

Regards,
Ramón.

Generated at Thu Feb 08 04:00:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.