[SERVER-17194] Low Throughput for YCSB 50-50 workload with high client threads Created: 05/Feb/15  Updated: 06/Dec/22  Resolved: 21/Nov/16

Status: Closed
Project: Core Server
Component/s: Performance, Storage, WiredTiger
Affects Version/s: 3.0.0-rc7, 3.0.0-rc8, 3.0.0-rc9, 3.0.0-rc10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: David Daly Assignee: Backlog - Storage Execution Team
Resolution: Done Votes: 1
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 3.0build256.png     PNG File Patch256.png     PNG File patch.png     PNG File workload50-50.png    
Issue Links:
Related
related to SERVER-18213 Lots of WriteConflict during multi-up... Closed
related to SERVER-19189 Improve performance under high number... Closed
is related to SERVER-17157 Seeing pauses in YCSB performance wor... Closed
is related to SERVER-16662 Extended pauses in WiredTiger when wo... Closed
is related to SERVER-15944 Make single-update write conflict pat... Closed
Assigned Teams:
Storage Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Updated
YCSB: https://github.com/achille/YCSB

Populate the database with 2M documents
ycsb load mongodb -s -P workloads/workloada -p workload=com.yahoo.ycsb.workloads.CoreWorkload -p mongodb.writeConcern=acknowledged -p mongodb.database=ycsb -p recordcount=2000000 -p exportmeasurementsinterval=30000 -p fieldcount=10 -p timeseries.granularity=100 -p threadcount=256 -p insertretrycount=10 -p readretrycount=1 -p ignoreinserterrors=true -p reconnectionthroughput=10 -p mongodb.url=mongodb://localhost:27017 -p fieldnameprefix=f -p maxexecutiontime=96000 -p mongodb.readPreference=primary -p fieldlength=10 -p reconnectiontime=1000 -p operationcount=200000000

Run Workload A
./bin/ycsb run mongodb -s -P workloads/workloada -p workload=com.yahoo.ycsb.workloads.CoreWorkload -p mongodb.writeConcern=acknowledged -p mongodb.database=ycsb -p recordcount=2000000 -p exportmeasurementsinterval=30000 -p fieldcount=10 -p timeseries.granularity=100 -p threadcount=256 -p insertretrycount=10 -p readretrycount=1 -p ignoreinserterrors=true -p reconnectionthroughput=10 -p mongodb.url=mongodb://localhost:27017 -p fieldnameprefix=f -p maxexecutiontime=96000 -p mongodb.readPreference=primary -p fieldlength=10 -p reconnectiontime=1000 -p operationcount=50000000
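
To compare thread counts (for example 256 versus 32, as in the description), the run command above can be generated with only `threadcount` and `operationcount` varying. The following is a minimal sketch, not part of the original repro: the helper name `ycsb_cmd` is hypothetical, and the property values are copied from the commands above.

```python
import shlex

# Properties shared by the load and run commands in this ticket.
# threadcount and operationcount are left out because they vary per run.
BASE_PROPS = {
    "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
    "mongodb.writeConcern": "acknowledged",
    "mongodb.database": "ycsb",
    "recordcount": "2000000",
    "exportmeasurementsinterval": "30000",
    "fieldcount": "10",
    "timeseries.granularity": "100",
    "insertretrycount": "10",
    "readretrycount": "1",
    "ignoreinserterrors": "true",
    "reconnectionthroughput": "10",
    "mongodb.url": "mongodb://localhost:27017",
    "fieldnameprefix": "f",
    "maxexecutiontime": "96000",
    "mongodb.readPreference": "primary",
    "fieldlength": "10",
    "reconnectiontime": "1000",
}


def ycsb_cmd(phase, threadcount, operationcount):
    """Build the argument list for `./bin/ycsb <phase> mongodb ...`.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    props = dict(
        BASE_PROPS,
        threadcount=str(threadcount),
        operationcount=str(operationcount),
    )
    argv = ["./bin/ycsb", phase, "mongodb", "-s", "-P", "workloads/workloada"]
    for key, value in props.items():
        argv += ["-p", f"{key}={value}"]
    return argv


if __name__ == "__main__":
    # Print one run command per thread count to compare.
    for threads in (32, 256):
        print(shlex.join(ycsb_cmd("run", threads, 50000000)))
```

The returned list can be passed directly to `subprocess.run` if the commands should be executed rather than printed.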

Participants:

 Description   

Running YCSB with a high thread count leads to low WT throughput on the 50-50 workload (Workload A). Stand-alone mongod with the WiredTiger storage engine.

High levels of WT rollbacks are seen along with the low throughput.

Update:

  • Related to SERVER-16662 in that the problem requires many more threads than cores. Running with few threads gives high throughput. Running with many more threads than cores (256 on 12) leads to a dramatic (50% or more) reduction in throughput.
  • Differs from SERVER-16662 in that the performance drop is steady (throughput does not drop to zero), and is correlated with the WT stat "transaction: transactions rolled back" being at the same level as or higher than "transaction: transactions completed".
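
The correlation described above can be checked by comparing the two counters over a measurement interval. The sketch below is an illustration only. It assumes the counters are read from the `wiredTiger.transaction` section of `serverStatus` under the keys "transactions rolled back" and "transactions committed" (taken here to be the "completed" stat mentioned above), and the sample numbers are invented for the example, not measurements from this ticket.

```python
# Assumed stat keys in db.serverStatus()["wiredTiger"]["transaction"].
ROLLED_BACK = "transactions rolled back"
COMMITTED = "transactions committed"


def rollback_ratio(txn_stats):
    """Return rolled-back transactions divided by committed transactions."""
    rolled_back = txn_stats[ROLLED_BACK]
    committed = txn_stats[COMMITTED]
    if committed == 0:
        # Avoid division by zero: any rollbacks with no commits is "infinite".
        return float("inf") if rolled_back else 0.0
    return rolled_back / committed


def interval_ratio(before, after):
    """Rollback ratio between two snapshots, since the counters are cumulative."""
    delta = {key: after[key] - before[key] for key in (ROLLED_BACK, COMMITTED)}
    return rollback_ratio(delta)


if __name__ == "__main__":
    # Hypothetical snapshots taken at the start and end of a run.
    before = {ROLLED_BACK: 1000, COMMITTED: 5000}
    after = {ROLLED_BACK: 13000, COMMITTED: 15000}
    ratio = interval_ratio(before, after)
    # A ratio of 1.0 or more matches the symptom described in this ticket.
    print(f"rollbacks per commit over the interval: {ratio:.2f}")
```

A ratio at or above 1.0 over the interval corresponds to the condition in the second bullet: rollbacks at the same level as or higher than completed transactions.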

With 3.0.0, a large number of documents is not required. The issue manifests with 2M documents.

From the original description:

In the graph below, before time A there are 256 client threads. After time B there are 32 threads. Throughput is higher with 32 threads. Substantial number of rollbacks with 256 threads.

Results shown above from run with git version 7d9ec251cf0e70bc0f9bb246aacfb6e62226ad37



 Comments   
Comment by Alexander Gorrod [ 21/Nov/16 ]

Thanks david.hows. A 15% degradation when overloading the number of threads to be 32x the number of cores seems reasonable to me. I'm going to close this as gone away.

Comment by Sam Kleinman (Inactive) [ 03/Jun/15 ]

eitan.klein/david.daly, do you have any updates on this issue? If not, what are our next steps here?

Thanks,
sam

Comment by David Daly [ 09/Mar/15 ]

Updated the description and repro steps to give a smaller repro, and to compare with the related scaling ticket.

Comment by Daniel Pasette (Inactive) [ 06/Feb/15 ]

We believe this fix will make a big difference: SERVER-15944. This is in and merged back to v3.0.

Comment by David Daly [ 05/Feb/15 ]

Related to SERVER-16662 in that there are more threads than cores. Shows different symptoms in that there aren't major pauses, just lower throughput.

Generated at Thu Feb 08 03:43:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.