[SERVER-13663] Perf regression for workload that updates 1 row with 1 or many clients
| Created: | 18/Apr/14 | Updated: | 13/Jun/23 | Resolved: | 13/Jun/23 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Write Ops |
| Affects Version/s: | 2.6.0 |
| Fix Version/s: | 4.1 Desired |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Mark Callaghan | Assignee: | Backlog - Performance Team |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | perf-stop-regressions, pperf |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Product Performance |
| Operating System: | ALL |
| Steps To Reproduce: | I used my fork of jmongosysbench (https://github.com/mdcallag/sysbench-mongodb) because my changes have yet to be pushed upstream. This is the only option I set for mongod: Then I load 400M rows into 1 collection (in the blog post I used 8 collections, but the repro case here only needs 1): Then I run the test, first for one client thread: And then for 32 client threads: |
| Description |
|
My results are at http://smalldatum.blogspot.com/2014/04/biebermarks.html. Testing 2.4.9 against 2.6 (rc2 or release) on the same hardware, I get more updates/second with 2.4.9: 1.22X more at 32 concurrent clients and 1.11X more with 1 client. |
| Comments |
| Comment by Miguel Angel Nieto [ 13/Jun/23 ] |
|
ger.hartnett@mongodb.com, this ticket refers to a 2.x version of MongoDB, so I think it won't be needed anymore. |
| Comment by Ramon Fernandez Marina [ 15/May/15 ] |
|
rui.zhang, can we re-test with a recent release (3.0.3 is the latest stable) and see if this is still an issue? Thanks, |
| Comment by hari.khalsa@10gen.com [ 22/Apr/14 ] |
|
Thanks for the ticket, mdcallag. There is a fast path for find-by-_id that is being used in 2.4.x that is not being used in 2.6.x. I believe this is responsible for some of the regression you're seeing. I've created |
| Comment by Mark Callaghan [ 22/Apr/14 ] |
|
Figured out how to make Google perftools work, so here is a CPU profile from 1 client thread doing random queries over a 1M-row table. All data is in the OS filesystem cache, so there are no disk reads. The client ran for 5 minutes and the average QPS is 15662 for 2.4.9 and 10286 for 2.6.0. It looks like the new overhead in 2.6 comes from the optimizer and maybe the parser. |
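To put the two QPS figures from this comment on a common scale, here is a minimal sketch of the arithmetic. The function names are illustrative and not part of any tool used in the ticket; the only inputs are the 15662 and 10286 QPS averages reported above.

```python
# Average QPS reported in the comment (1 client thread, 5-minute run).
qps_249 = 15662  # MongoDB 2.4.9
qps_260 = 10286  # MongoDB 2.6.0


def speedup(baseline_qps: float, regressed_qps: float) -> float:
    """How many times more QPS the baseline delivers than the regressed build."""
    return baseline_qps / regressed_qps


def throughput_loss(baseline_qps: float, regressed_qps: float) -> float:
    """Fraction of the baseline's throughput that the regressed build loses."""
    return 1.0 - regressed_qps / baseline_qps


ratio = speedup(qps_249, qps_260)
loss = throughput_loss(qps_249, qps_260)
print(f"2.4.9 delivers {ratio:.2f}X the QPS of 2.6.0; 2.6.0 loses {loss:.1%}")
```

The result is roughly 1.5X in favor of 2.4.9, which is in line with the point-query ratio reported at 8 threads elsewhere in this ticket.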
| Comment by Eliot Horowitz (Inactive) [ 20/Apr/14 ] |
|
I'm seeing an issue with both java driver versions, but much worse with the newest. |
| Comment by Mark Callaghan [ 19/Apr/14 ] |
|
I used mongo-java-driver-2.11.4.jar for all of my tests. If it helps I will repeat with another version. |
| Comment by Eliot Horowitz (Inactive) [ 19/Apr/14 ] |
|
Seems a big part of the issue is new write commands. |
| Comment by Mark Callaghan [ 19/Apr/14 ] |
|
Will provide more details on Monday, but using the same database with 8 collections and concurrent point queries I see a larger regression in 2.6. Will repeat the test with 1 thread (client). At 8 threads I get 94k QPS with 2.4.9 and 62k QPS with the 2.6.0 release. Per "top" the Java client uses 1.5X more CPU with 2.4.9 than with 2.6.0, as expected. The context switch rate is about 1.4X higher with 2.4.9, which is as expected and suggests this isn't a mutex contention issue. But mongod for 2.4.9 does not use 1.5X more CPU; instead 2.6.0 is using 1.2X more CPU even though 2.4.9 is doing 1.5X more QPS. For this test, the regression at 8 and 16 threads was worse than at 32 and 40 threads. The test host has 40 cores with hyperthreading enabled and peak QPS was at 32 threads. Top-n sources of CPU per "perf" on Linux... for 2.4.9
And for 2.6.0 release
|
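The CPU and QPS figures in this comment can be combined into a single per-query cost. The sketch below is only arithmetic on the reported numbers (94k and 62k QPS at 8 threads, and mongod using 1.2X more CPU under 2.6.0); the function name is illustrative, and the 1.2X figure is the rounded value read from "top", so the result is an estimate.

```python
# Figures reported in the comment (8 client threads, point queries).
qps_249 = 94_000          # MongoDB 2.4.9
qps_260 = 62_000          # MongoDB 2.6.0 release
mongod_cpu_ratio = 1.2    # mongod CPU under 2.6.0 divided by CPU under 2.4.9


def cpu_per_query_ratio(cpu_ratio: float, old_qps: float, new_qps: float) -> float:
    """Ratio of CPU spent per query, new build over old build.

    CPU per query is total CPU / QPS, so the ratio is
    (cpu_new / cpu_old) * (old_qps / new_qps).
    """
    return cpu_ratio * (old_qps / new_qps)


per_query = cpu_per_query_ratio(mongod_cpu_ratio, qps_249, qps_260)
print(f"2.6.0 mongod spends about {per_query:.1f}X the CPU per query of 2.4.9")
```

Under these assumptions mongod in 2.6.0 spends roughly 1.8X the CPU per query, which is consistent with the observation that the slowdown comes from extra CPU work rather than from mutex contention.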
| Comment by Mark Callaghan [ 18/Apr/14 ] |
|
Comparing 2.4.9 with 2.6.0 release using "top" with 1 client thread
So I see a similar pattern at 1 thread as at 32 threads, but maybe the regression is worse with 32 threads. So I will focus on profiling at 1 thread first to see what uses more CPU. |