[SERVER-31947] Performance degradation with more clients Created: 13/Nov/17  Updated: 27/Oct/23  Resolved: 25/Oct/18

Status: Closed
Project: Core Server
Component/s: Concurrency, WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dmitry Dolgov Assignee: Kelsey Schubert
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File cpu_migrations.png, PNG File select_btree_throughput.png, PNG File throughput_mongodb.png
Operating System: ALL
Steps To Reproduce:

Run YCSB Workload C against MongoDB 3.4/3.2 on an m4.xlarge instance.

Participants:

Description

While running some benchmarks, I noticed a strange performance degradation in MongoDB under a read-only YCSB workload (Workload C). You can see it on the right side of the graph, where the number of clients grows:

I tried to investigate with the perf tool and found that the only two metrics that grow significantly are the number of CPU migrations and the number of `sched_yield` syscalls. The next graph shows how the CPU migration events evolve; the number of `sched_yield` calls also roughly tripled.

Judging from the source code, I assume this is somehow related to spin locks, since the only place where I found `sched_yield` is `spin_lock.cpp`.
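
For illustration, a minimal sketch of the kind of spin lock that yields under contention (hypothetical code, not the actual `spin_lock.cpp` implementation):

```cpp
// Illustrative sketch only: a test-and-set spin lock that yields the CPU
// after spinning for a while, the general pattern hinted at above.
#include <atomic>
#include <sched.h>  // sched_yield()

class YieldingSpinLock {
public:
    void lock() {
        int spins = 0;
        // test_and_set() returns the previous value; true means the lock is held.
        while (_flag.test_and_set(std::memory_order_acquire)) {
            if (++spins > 100) {
                // Under heavy contention every waiter eventually ends up here,
                // which would be consistent with the growth in sched_yield
                // calls seen in perf as the client count rises.
                sched_yield();
            }
        }
    }

    void unlock() { _flag.clear(std::memory_order_release); }

private:
    std::atomic_flag _flag = ATOMIC_FLAG_INIT;
};
```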



Comments
Comment by Kelsey Schubert [ 25/Oct/18 ]

Hi erthalion,

I'm going to close this ticket for the time being, since a number of fixes have landed that we expect to improve performance on this branch. Please feel free to comment with an update after running new tests, and we'll reopen the ticket.

Thank you,
Kelsey

Comment by Dmitry Dolgov [ 27/Mar/18 ]

Hi Kelsey,

Thank you for the information. Yes, back then I was testing MongoDB 3.4.4. Soon I'm going to do another round of benchmarks with new versions of all the databases, and we can compare the performance.

Comment by Kelsey Schubert [ 18/Jan/18 ]

Hi erthalion,

My understanding is that these tests were executed with MongoDB 3.4.4, is that correct? I'm curious whether you see the same behavior on a more recent version of MongoDB 3.4, which would include WT-3345. As you can see in WT-3345, we made significant improvements to the rwlocks in WiredTiger, which may affect the performance you're observing in these benchmarks.

Thank you,
Kelsey

Comment by Dmitry Dolgov [ 30/Dec/17 ]

Hi Henrik,

Thanks for your response. Yes, I'm aware of that. But at the same time, I ran the same kind of test against PostgreSQL and MySQL, and the performance degradation was not nearly as significant there - that's why I thought it was strange and perhaps worth mentioning.

Comment by Henrik Edin [ 12/Dec/17 ]

Hi erthalion, MongoDB uses a thread-per-connection model. This means that as the number of connections increases, the number of context switches between threads also increases. Since context switches aren't free, this is unfortunately behavior that can be expected. There are several reasons why a context switch can be expensive: cache misses, page faults, contention on spin locks, etc.
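
For illustration only, a minimal sketch of the thread-per-connection pattern (hypothetical code, not the actual MongoDB transport layer):

```cpp
// Hypothetical sketch of a thread-per-connection accept loop; not MongoDB code.
// Each accepted socket gets its own service thread, so N clients mean N threads
// competing for the CPUs, and the scheduler context-switches between them.
#include <thread>
#include <sys/socket.h>
#include <unistd.h>

void serveConnection(int clientFd) {
    char buf[4096];
    ssize_t n;
    // A simple echo loop stands in for real request handling.
    while ((n = read(clientFd, buf, sizeof(buf))) > 0) {
        write(clientFd, buf, n);
    }
    close(clientFd);
}

void acceptLoop(int listenFd) {
    while (true) {
        int clientFd = accept(listenFd, nullptr, nullptr);
        if (clientFd < 0)
            continue;
        // One dedicated thread per connection; detached for brevity.
        std::thread(serveConnection, clientFd).detach();
    }
}
```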

In the public source depot, you can see experimentation with a thread pool model, where different connections can execute on the same thread. It is too early to promise any results, but you might find it worthwhile to follow that project.

Henrik

Comment by Ian Whalen (Inactive) [ 01/Dec/17 ]

Thanks for filing this, Dmitry! We believe that some upcoming work by the Platforms team on rate-limiting AsyncIO will improve the behavior you're seeing towards the right side of the graph.

Comment by Henrik Ingo (Inactive) [ 14/Nov/17 ]

Just adding a note that I met Dmitry at Highload++ and asked him to file this. (Thanks Dmitry!) I don't know more about this than what is described here, but his observation that a spin lock might be involved caught my interest.

Also note that our current performance testing is on much more powerful instances than this, so we wouldn't experience these conditions (at least not with the same YCSB setup).

Comment by Kelsey Schubert [ 13/Nov/17 ]

Hi erthalion,

Thank you for the report; I've assigned this issue to the Storage Team for evaluation.

Kind regards,
Kelsey
