[SERVER-69820] Flow control bypass should follow admission priority Created: 20/Sep/22  Updated: 29/Oct/23  Resolved: 27/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Jordi Olivares Provencio
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2022-10-03, Execution Team 2022-10-17, Execution Team 2022-10-31
Participants:

 Description   

With the addition of admission priority for storage engine ticket acquisition, flow control should also use this priority to decide whether an operation takes a flow control ticket.

This would replace explicit calls to setShouldParticipateInFlowControl().

In making this change, we will need to evaluate the following:

  • Should all operations that bypass flow control also bypass ticket acquisition using "kImmediate" admission priority?
  • Should all operations that bypass ticket acquisition also bypass flow control?

If the answer to both questions is always yes, this change would simplify our admission system and bring the two mechanisms closer together in behavior.
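As a minimal sketch of the idea, flow control participation could be derived from the admission priority instead of being set explicitly. All names below (AdmissionPriority, OperationSketch, shouldTakeFlowControlTicket) are hypothetical stand-ins for illustration, not the server's actual API; only "kImmediate" is named in this ticket, and the other priority values are assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in for the admission priority levels. Only kImmediate
// is mentioned in the ticket; kLow and kNormal are assumed for illustration.
enum class AdmissionPriority { kLow, kNormal, kImmediate };

// Hypothetical stand-in for the per-operation state that carries the priority.
struct OperationSketch {
    AdmissionPriority priority = AdmissionPriority::kNormal;
};

// An operation takes a flow control ticket unless it already bypasses
// storage engine ticket acquisition with kImmediate priority. This assumes
// both questions in the description are answered "yes".
inline bool shouldTakeFlowControlTicket(const OperationSketch& op) {
    return op.priority != AdmissionPriority::kImmediate;
}
```

With a rule like this, a single priority setting would control both mechanisms, which is what would make explicit calls to setShouldParticipateInFlowControl() unnecessary.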



 Comments   
Comment by Githook User [ 27/Oct/22 ]

Author:

{'name': 'Jordi Olivares Provencio', 'email': 'jordi.olivares-provencio@mongodb.com', 'username': 'jordiolivares'}

Message: SERVER-69820 Simplify Flow control ticket acquisition
Branch: master
https://github.com/mongodb/mongo/commit/e79fa3438f0fe33a4e688e29fad0a161a8681523

Comment by Githook User [ 27/Oct/22 ]

Author:

{'name': 'Jordi Olivares Provencio', 'email': 'jordi.olivares-provencio@mongodb.com', 'username': 'jordiolivares'}

Message: SERVER-69820 Simplify Flow control ticket acquisition
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/11d238f16c78ffc88f15116a6b35cb46ecea4521

Comment by Jordi Olivares Provencio [ 10/Oct/22 ]

louis.williams@mongodb.com Could an alternative be to use the NamespaceString::isReplicated() method to discern whether to skip ticket acquisition?

That would avoid complicating the GlobalLock API and apply this behaviour to all collections that don't participate in replication.
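A rough sketch of that alternative, under stated assumptions: NamespaceSketch is a hypothetical stand-in for NamespaceString, and its isReplicated() rule here (the "local" database and "system.profile" collections are unreplicated) is a simplification assumed for illustration rather than the server's actual logic.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for NamespaceString.
struct NamespaceSketch {
    std::string db;
    std::string coll;

    // Assumed simplification: treat the "local" database and any
    // "system.profile" collection as not participating in replication.
    bool isReplicated() const {
        return db != "local" && coll != "system.profile";
    }
};

// Skip the ticket for any namespace that does not participate in
// replication, without adding an option to the lock API.
inline bool shouldSkipTicket(const NamespaceSketch& nss) {
    return !nss.isReplicated();
}
```

Under this rule the decision follows from the namespace being written, so callers such as profiling would not need to opt out explicitly.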

Comment by Louis Williams [ 07/Oct/22 ]

Thanks for investigating, jordi.olivares-provencio@mongodb.com!

A few ideas about the places where we disable flow control but don't opt out of tickets.

  • I think we should consider the OplogCapMaintainerThread an important operation. It keeps a node's oplog from expanding out of control. Unlike a TTL collection, the oplog stores writes for everything in the system and must be kept in check as well as possible. It also uses WT fast truncation, so it's less impactful on the system than TTL deletes because it doesn't have to page as much data in from disk.
  • The JournalFlusher does already skip ticket acquisition, just at a much lower layer. Let's just push that exemption up without changing any behavior.
  • For importCollection dryRun I'm not sure exactly why it needs to opt out of flow control. I don't see any reason why it's required. It runs async and can't block replication.

For the profile command and profiling itself, I can imagine an API like this:

  • Update the GlobalLock API to accept an option to opt out of flow control tickets
  • Use this new API for the profile command and profiling
  • At this point, we should be able to get rid of the setShouldParticipateInFlowControl() option.
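
The API idea in the list above could look roughly like the following sketch. Every name here (FlowControlSketch, GlobalLockOptionsSketch, skipFlowControlTicket, GlobalLockSketch) is hypothetical and chosen for illustration; the real GlobalLock constructor and its options are not shown in this ticket.

```cpp
#include <cassert>

// Hypothetical stand-in for the flow control ticket source; it only counts
// how many tickets were taken.
struct FlowControlSketch {
    int ticketsTaken = 0;
    void acquireTicket() { ++ticketsTaken; }
};

// Hypothetical options struct passed at lock acquisition time.
struct GlobalLockOptionsSketch {
    bool skipFlowControlTicket = false;
};

// Hypothetical RAII lock: the opt-out is given where the lock is taken,
// instead of being set on the operation beforehand.
class GlobalLockSketch {
public:
    explicit GlobalLockSketch(FlowControlSketch& fc,
                              GlobalLockOptionsSketch options = {}) {
        if (!options.skipFlowControlTicket) {
            fc.acquireTicket();
        }
        // Acquiring the actual global lock is omitted from this sketch.
    }
};
```

The profile command and profiling would pass the opt-out option, while every other caller keeps the default and takes a flow control ticket as before.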

Comment by Jordi Olivares Provencio [ 05/Oct/22 ]

The following is a list of locations where we disable Flow Control:

  • OplogCapMaintainerThread
  • ProfileCmdBase
  • JournalFlusher
  • Profiling
  • OplogApplier
  • NoopWriter
  • ReplSet votes

Of these, the last 3 items are crucial to the normal functioning of the database, and I would mark them as immediate. To be safe, JournalFlusher should be marked as immediate too, for data consistency.

Whether the other 3 items should be considered immediate is up for debate. OplogCapMaintainerThread is more of a background maintenance task, like the TTL deleter. Profiling doesn't participate in replication, but it does not need immediate priority.

The converse of this search (every operation with immediate priority also skips flow control) seems to hold in all cases.

Generated at Thu Feb 08 06:14:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.