[SERVER-81101] Speed up TransactionOperations::addTransactionOperation Created: 15/Sep/23  Updated: 10/Nov/23  Resolved: 10/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.3.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Matthew Russotto Assignee: Matthew Russotto
Resolution: Fixed Votes: 0
Labels: auto-reverted, perf-tiger, perf-tiger-handoff
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Sprint: Repl 2023-11-13
Participants:
Linked BF Score: 0

 Description   

In TransactionOperations, we ensure we're not duplicating statement IDs by

1) Copying the unordered set containing the current set of IDs

2) Adding the IDs to that set and throwing an error if any are already there

3) Moving the new set to the old set.

This is expensive; on my workstation, about 800ns for a single insert (starting from an empty ID set), 450us for 512 inserts, and 300ms for 262,144 inserts. If we require that statement IDs increase monotonically, we can replace this with a very simple check and basically save it all. But we can also check and insert the statement IDs to one set and just remove the ones we added in the exception case.

Also if we don't remove it we should use an absl::btree_set instead of an unordered_set.



 Comments   
Comment by Githook User [ 09/Nov/23 ]

Author:

{'name': 'Matthew T. Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-81101 Speed up TransactionOperations::addTransactionOperation statement ID check
Branch: master
https://github.com/mongodb/mongo/commit/c42fe8bb64dad8e0acae9d20f233a807169bc197

Comment by Matthew Russotto [ 09/Nov/23 ]

This was reverted because the std::bidirectional_iterator concept isn't supported in the XCode version we're using. Since that's just being used for a static_assert, I'll just remove the static_assert.

Comment by Githook User [ 09/Nov/23 ]

Author:

{'name': 'auto-revert-processor', 'email': 'dev-prod-dag@mongodb.com', 'username': ''}

Message: Revert "SERVER-81101 Speed up TransactionOperations::addTransactionOperation statement ID check"

This reverts commit 87f308646c7d1c6bf96b5086e96e6e39ea4665be.
Branch: master
https://github.com/mongodb/mongo/commit/a690fc707582dc9ab124f937855074c04f2f2191

Comment by Githook User [ 08/Nov/23 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-81101 Speed up TransactionOperations::addTransactionOperation statement ID check
Branch: master
https://github.com/mongodb/mongo/commit/87f308646c7d1c6bf96b5086e96e6e39ea4665be

Comment by Matthew Russotto [ 02/Nov/23 ]

Using a structure designed to efficiently handle ranges.

Running build/install/bin/transaction_operations_bm
Run on (8 X 243.75 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x8)
  L1 Instruction 64 KiB (x8)
  L2 Unified 1024 KiB (x8)
  L3 Unified 32768 KiB (x1)
Load Average: 1.96, 8.50, 9.79
----------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations
----------------------------------------------------------------------
BM_AddOperations/1/0               552 ns          553 ns      1246421
BM_AddOperations/10/0             1982 ns         1986 ns       352293
BM_AddOperations/100/0           14118 ns        14167 ns        48793
BM_AddOperations/1000/0         136909 ns       137136 ns         5213
BM_AddOperations/10000/0       1590656 ns      1591859 ns          438

So the question is whether the additional code is worth a couple of microseconds per 100 operations. I think yes. One, small operations are almost all overhead like this, so those couple of microseconds could be significant. Two, I think keeping track of all the statement IDs used or executed by an operation in order might be something we need for bulk writing vectored inserts, and an augmented version of this structure could help there.

Comment by Matthew Russotto [ 02/Nov/23 ]

Using absl_flat_hash_set

Run on (8 X 243.75 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x8)
  L1 Instruction 64 KiB (x8)
  L2 Unified 1024 KiB (x8)
  L3 Unified 32768 KiB (x1)
Load Average: 3.73, 13.30, 11.32
----------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations
----------------------------------------------------------------------
BM_AddOperations/1/0               573 ns          574 ns      1224300
BM_AddOperations/10/0             2181 ns         2189 ns       319896
BM_AddOperations/100/0           16233 ns        16282 ns        42528
BM_AddOperations/1000/0         160662 ns       160898 ns         4333
BM_AddOperations/10000/0       1834791 ns      1835047 ns          372

Comment by Matthew Russotto [ 31/Oct/23 ]

Using an absl_set. Livable, though it's still kinda annoying that just checking statement IDs is so costly.

Running build/install/bin/transaction_operations_bm
Run on (8 X 243.75 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x8)
  L1 Instruction 64 KiB (x8)
  L2 Unified 1024 KiB (x8)
  L3 Unified 32768 KiB (x1)
Load Average: 0.12, 0.31, 0.63
-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
BM_AddOperations/1            561 ns          563 ns      1246070
BM_AddOperations/10          2121 ns         2128 ns       329242
BM_AddOperations/100        17633 ns        17684 ns        39135
BM_AddOperations/1000      186612 ns       186891 ns         3786
BM_AddOperations/10000    2305109 ns      2306254 ns          293

Comment by Matthew Russotto [ 31/Oct/23 ]

Not checking statement IDs at all: this is the limit for this particular optimization.

Run on (8 X 243.75 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x8)
  L1 Instruction 64 KiB (x8)
  L2 Unified 1024 KiB (x8)
  L3 Unified 32768 KiB (x1)
Load Average: 0.23, 0.96, 1.33
-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
BM_AddOperations/1            538 ns          539 ns      1296184
BM_AddOperations/10          1879 ns         1887 ns       369840
BM_AddOperations/100        13321 ns        13371 ns        51770
BM_AddOperations/1000      128685 ns       128974 ns         5519
BM_AddOperations/10000    1523421 ns      1524721 ns          454

Comment by Matthew Russotto [ 31/Oct/23 ]

Baseline on master on an ARM workstation

Run on (8 X 243.75 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x8)
  L1 Instruction 64 KiB (x8)
  L2 Unified 1024 KiB (x8)
  L3 Unified 32768 KiB (x1)
Load Average: 1.81, 1.83, 1.63
-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
BM_AddOperations/1            550 ns          551 ns      1271878
BM_AddOperations/10          3280 ns         3295 ns       213489
BM_AddOperations/100       109490 ns       109537 ns         6381
BM_AddOperations/1000    10680907 ns     10681125 ns           65
BM_AddOperations/10000 1207313299 ns   1207153673 ns            1

Generated at Thu Feb 08 06:45:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.