[SERVER-41048] Address regression in transaction performance with storage stats collection Created: 08/May/19  Updated: 06/Dec/22  Resolved: 18/Aug/21

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Sulabh Mahajan Assignee: Backlog - Storage Execution Team
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongod.conf     File restart_mongo.sh     File runtest.py    
Issue Links:
Related
is related to WT-4813 Enable cursor caching for statistics ... Backlog
Assigned Teams:
Storage Execution
Participants:
Story Points: 3

 Description   

There is an approximately 15% regression in transaction performance caused by collecting storage statistics after each individual operation in a transaction. This ticket tracks the work required to resolve the issue.



 Comments   
Comment by Geert Bosch [ 18/Aug/21 ]

Closing this, as we'd address this in WT-4813 if at all.

Comment by Alexander Gorrod [ 03/May/21 ]

geert.bosch the Storage Engines team's knowledge about this regression has timed out now, I think. I'm going to assign this to the execution backlog, and let you triage it again.

Gathering statistics has an overhead - so I'm inclined to say this is an expected consequence of adding more functionality for users but didn't want to make that choice for you.

Comment by Sulabh Mahajan [ 23/May/19 ]

Attached the test files. This is how I ran the test:

1. Set up a 1-node or 3-node replica set using the attached mongod.conf. restart_mongo.sh restarts the mongod(s) and puts them in a replica set.
2. Run the Python script to execute the test 10 times and get the average of those runtimes.
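
The measurement loop described in these steps can be sketched as below. This is not the content of the attached runtest.py; the function names and the `transaction` callable are hypothetical placeholders, and the defaults (5 threads, 10 transactions per thread, 10 runs) are taken from the numbers quoted elsewhere in this ticket.

```python
import statistics
import threading
import time


def run_once(transaction, threads=5, txns_per_thread=10):
    """Run the workload once and return the wall-clock time in milliseconds.

    `transaction` is a hypothetical callable that performs one full
    transaction (for example, N updates followed by a commit).
    """

    def worker():
        for _ in range(txns_per_thread):
            transaction()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    start = time.perf_counter()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return (time.perf_counter() - start) * 1000.0


def average_runtime_ms(transaction, runs=10, **kwargs):
    """Average the runtime over `runs` independent executions."""
    return statistics.mean(run_once(transaction, **kwargs) for _ in range(runs))
```

Passing the transaction body in as a callable keeps the timing code identical for the with-stats and without-stats builds, so only the server under test changes between measurements.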

Comment by Sulabh Mahajan [ 23/May/19 ]

I reran the test several times with different portions of the code disabled.
I can confirm that most of the regression comes from opening and closing the statistics cursor itself.

The linked ticket is to enable cursor caching for statistics cursors. This should help mitigate the regression tracked here.
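
To illustrate why caching would help, here is a toy sketch of the general idea: a closed cursor is parked and handed back on the next open of the same URI, so the expensive open happens once rather than once per operation. This is not WiredTiger's implementation or API; `FakeSession` and `CachingSession` are hypothetical stand-ins invented for this example.

```python
STATS_URI = "statistics:session"  # URI used here for illustration only


class FakeSession:
    """Stand-in for a storage session; counts how often a cursor is really opened."""

    def __init__(self):
        self.opens = 0

    def open_cursor(self, uri):
        self.opens += 1        # models the expensive open
        return {"uri": uri}    # placeholder cursor object


class CachingSession:
    """Reuses an idle cursor for the same URI instead of opening a new one."""

    def __init__(self, session):
        self._session = session
        self._idle = {}        # uri -> list of parked cursors

    def open_cursor(self, uri):
        idle = self._idle.get(uri)
        if idle:
            return idle.pop()
        return self._session.open_cursor(uri)

    def close_cursor(self, cursor):
        # Park the cursor for reuse rather than destroying it.
        self._idle.setdefault(cursor["uri"], []).append(cursor)
```

With this wrapper, a loop that opens and closes a statistics cursor after each of 3000 operations triggers only one underlying open, which is the kind of saving the linked ticket is aiming for.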

Comment by Alexander Gorrod [ 22/May/19 ]

Thanks for the results and write-up, Sulabh. brian.lane could we discuss how to deal with the performance overhead here?

Comment by Sulabh Mahajan [ 22/May/19 ]

I ran the tests again with N updates per transaction, 10 transactions per thread, and 5 threads in total.
(Runtimes in milliseconds, averaged over 10 runs):

replset-config  read/write concern  N     runtime-with-stats  runtime-without-stats  regression %
1-node          default             3000  ~18000              ~16000                 12.5
1-node          default             300   1960                1710                   14.7
1-node          snapshot,majority   300   1986                1900                   4.3
3-node          default             3000  18980               17711                  7.1
3-node          snapshot,majority   3000  19222               18182                  5.7
3-node          default             300   2003                1908                   5
3-node          snapshot,majority   300   2068                2019                   2.4

> I see in your script that you configure --setParameter wiredTigerCursorCacheSize=0 - does that make a difference to the performance difference?

Sorry for the confusion, it is commented out; it is a remnant of the script I based mine on.

Comment by Alexander Gorrod [ 08/May/19 ]

sulabh.mahajan I'd like to understand more about this performance cost:

  • Is there a regression if running in a replica set configuration?
  • What is the scale of the regression if the transaction does fewer updates? I.e., if you do 300 updates, is there still a noticeable change in performance?
  • I see in your script that you configure --setParameter wiredTigerCursorCacheSize=0 - does that make a difference to the performance difference?

Comment by Sulabh Mahajan [ 08/May/19 ]

The attached script updates 3000 documents in each transaction, with 10 transactions per thread and 5 threads running simultaneously, and reports the time taken.
I am getting roughly the following runtimes:

Without stats collection: 16 seconds
With stats collection: 18 seconds

That amounts to a regression of 12.5%.
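
The percentage is the slowdown relative to the run without stats collection. A minimal sketch of that arithmetic (the function name is made up for this example):

```python
def regression_pct(with_stats, without_stats):
    """Slowdown of the with-stats run relative to the without-stats baseline, in percent."""
    return (with_stats - without_stats) / without_stats * 100.0


print(regression_pct(18, 16))  # 12.5
```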

I investigated further, and it looks like most of the regression comes from opening a statistics:session cursor and iterating through its contents.

Generated at Thu Feb 08 04:56:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.