Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- sharding-nyc-subteam3

Assigned Teams:

Sharding NYC
Operating System:
ALL
Steps To Reproduce:

Start up the shard setup in DSI using this binary

scp the mongo_updateone locust workload to the workload client and set it up

Or use any workload with concurrent two phase commit transactions

Disable the custom "canUseSingleWriteCommit" server parameter on each mongos to force the non-targeted single writes from the workload to use two phase commit

Disable all new server parameters from the binary to test baseline 2PC performance or enable any to see performance without the bottleneck (by default txnMajorityWaitInReplCoordinator is enabled which uses the async awaitReplication fix)
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

While performance testing for ~~SERVER-79056~~, I noticed that throughput for transactions that use two phase commit scales poorly as the number of concurrent transactions grows, even though CPU and IO utilization stay low and secondaries keep up. The problem seems to be that the WaitForMajorityService, which two phase commit coordinators use to wait for the participant list and decision writes to majority replicate, can't keep up with many concurrent requests to wait for majority.

When I switch transaction coordinators to either wait for majority write concern as part of the writes themselves (which synchronously blocks a task executor thread) or wait asynchronously using ReplicationCoordinator::awaitReplicationAsyncNoWTimeout, throughput with the same workload goes up significantly (over 4x with my setup) and CPU becomes the bottleneck. I initially saw this in the shard DSI workload with a custom 0.3ms network delay, which uses 3-node replica sets, but I also reproduced it in a modified shard workload with single-node replica sets.

The problem with the WaitForMajorityService seems to be that each loop of _periodicallyWaitForMajority() waits for only the lowest opTime it has been given, so if it receives new opTimes faster than it can wait for them, requests queue up and latency increases significantly. I modified the service to read the latest committed snapshot opTime (using ReplicationCoordinator::getCurrentCommittedSnapshotOpTime) after each wait for majority and, if that opTime is greater than the one actually waited on, to treat it as the most recently waited-for opTime. That seemed to resolve the bottleneck as well.
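
To illustrate the queueing effect described above, here is a rough toy model in Python. It is not the server code: opTimes are plain integers, the commit point is a fixed value passed in by the caller, and the function name count_wait_loops and its parameters are hypothetical. It only counts how many blocking wait rounds are needed to drain a set of pending requests, with and without advancing the last-waited opTime to the commit point.

```python
import heapq


def count_wait_loops(pending_op_times, committed_op_time, use_commit_point):
    """Count wait rounds needed to satisfy every pending request.

    Toy model (assumption): each loop iteration performs one blocking
    wait on the lowest pending opTime, then satisfies every request
    whose opTime is <= the opTime considered "last waited for".
    """
    heap = list(pending_op_times)
    heapq.heapify(heap)
    loops = 0
    while heap:
        lowest = heapq.heappop(heap)
        loops += 1  # one wait-for-majority round on the lowest opTime
        last_waited = lowest
        if use_commit_point:
            # Modified behaviour: also credit the current commit point
            # if it is ahead of the opTime that was actually waited on.
            last_waited = max(last_waited, committed_op_time)
        # Requests at or below the last waited opTime need no further wait.
        while heap and heap[0] <= last_waited:
            heapq.heappop(heap)
    return loops


pending = range(1, 101)  # 100 requests with distinct opTimes
baseline_loops = count_wait_loops(pending, 100, use_commit_point=False)
modified_loops = count_wait_loops(pending, 100, use_commit_point=True)
print(baseline_loops, modified_loops)
```

In this model, when the commit point is already at or past every pending opTime (secondaries keeping up), the baseline needs one wait round per distinct opTime, while the modified version drains the whole queue after a single round.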

2PC_locust_results.txt
7 kB
Aug 07 2023 02:52:02 PM UTC
mongo_updateone.tar.gz
11 kB
Aug 07 2023 02:21:00 PM UTC

duplicates

SERVER-79881 Integrate WaitForMajorityService with ReplicationCoordinator

Open

is related to

SERVER-79881 Integrate WaitForMajorityService with ReplicationCoordinator

Open

Assignee: [DO NOT USE] Backlog - Sharding NYC
Reporter: Jack Mulrow
Participants: [DO NOT USE] Backlog - Sharding NYC, Jack Mulrow
Votes: 0
Watchers: 7

Created: Aug 07 2023 02:15:50 PM UTC
Updated: Aug 15 2023 08:13:17 PM UTC
Resolved: Aug 15 2023 02:27:20 PM UTC
