[SERVER-59776] 50% regression in single multi-update Created: 03/Sep/21  Updated: 25/Aug/22  Resolved: 13/Jun/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.8, 5.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Matthew Russotto
Resolution: Done Votes: 0
Labels: perf-escapes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File contention_44_exhaust.txt    
Issue Links:
Related
related to SERVER-55606 Majority writes with j: false perform... Open
related to SERVER-62193 Improve insert_vector secondary perfo... Open
related to SERVER-65054 Avoid slow insert batches blocking re... Open
related to SERVER-65725 Mutex stats counters can use relaxed ... Backlog
related to SERVER-65657 Reduce calls to the cappedInsertNotif... Closed
related to SERVER-65671 Use _termShadow in updateTerm and pro... Closed
related to SERVER-65938 Collection haveCappedWaiters should c... Closed
related to SERVER-66023 Do not constantly reset election and ... Closed
related to SERVER-54939 Investigate secondary batching behavi... Closed
is related to SERVER-48522 Regression after mass deletion Open
is related to SERVER-31694 17% throughput regression in insert w... Closed
is related to SERVER-53667 High rate of journal flushes on secon... Closed
is related to SERVER-57407 Avoid taking ReplicationCoordinator m... Closed
is related to SERVER-66809 Move BSON serialization out of the lo... Closed
is related to SERVER-66810 Pull expensive calls like _wakeReadyW... Closed
Operating System: ALL
Sprint: Repl 2021-09-20, Replication 2021-12-13, Replication 2021-12-27, Repl 2022-03-21, Repl 2022-04-04, Repl 2022-04-18, Repl 2022-05-02, Repl 2022-05-16, Repl 2022-05-30, Repl 2022-06-13, Repl 2022-06-27
Participants:
Case:

 Description   

Insert 1M documents like so into a collection:

{x: 'x'.repeat(100), modified: false}
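
One way to generate this data set from the shell (a minimal sketch; the batching and the collection name db.c are assumptions, since the report does not say how the inserts were issued):

// Insert 1M documents in batches of 1,000 (batch size chosen arbitrarily).
for (let i = 0; i < 1000; i++) {
    let batch = [];
    for (let j = 0; j < 1000; j++) {
        batch.push({x: 'x'.repeat(100), modified: false});
    }
    db.c.insertMany(batch);
}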

Then update like so:

db.c.updateMany({}, {$set: {modified: true}}, {writeConcern: {w: "majority"}})

Timings are as follows:

4.2.15: 56s
4.4.8:  82s
5.0.5:  86s
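
Timings like those above can be measured from the shell with something like the following (a sketch; the report does not state how the measurements were taken):

let start = Date.now();
db.c.updateMany({}, {$set: {modified: true}}, {writeConcern: {w: "majority"}});
print("updateMany took " + (Date.now() - start) / 1000 + "s");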

PMP profiling suggests the regression is largely accounted for by contention on the replication coordinator mutex in setMyLastAppliedOpTimeAndWallTimeForward.



 Comments   
Comment by Matthew Russotto [ 13/Jun/22 ]

SERVER-66023, SERVER-65657, SERVER-65671, SERVER-66809, and SERVER-66810 were implemented as a result of this investigation and have significantly improved multi-update performance, mostly by eliminating or reducing the expensive operations performed while the replication coordinator mutex is held.

Comment by Bruce Lucas (Inactive) [ 03/Sep/21 ]

I found SERVER-57407, which targets a specific cause of replication coordinator contention, but I couldn't find a ticket for the more general issue, so I opened this one.
