[SERVER-30391] Add QoS features to MongoDB Created: 28/Jul/17  Updated: 06/Dec/22  Resolved: 08/Dec/21

Status: Closed
Project: Core Server
Component/s: Usability
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Karolin Varner Assignee: Backlog - Service Architecture
Resolution: Won't Do Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Service Arch
Participants:

 Description   

I am currently writing an application that generates sustained, heavy writes for about one hour at >200 Mbit/s. This application is regularly causing performance issues on the MongoDB side.

It currently runs relatively fine on our Atlas production setup using write-concern: majority and a single insertion thread, but it is causing very high loads (>90% disk utilization) on our test setup. This in itself would be desirable (maxing out the server's resources is good because we get more performance), but we had a db crash in the past (https://jira.mongodb.org/browse/MMSSUPPORT-14543, probably caused by a performance difference between the primary and replication; see https://jira.mongodb.org/browse/SERVER-24242) and we are still not sure this could not happen again.

In addition, using write-concern: majority causes significant latency, which seems to be slowing down our application considerably (a run now takes 3h instead of 1h since I activated it).

It would be very useful if there were a way to explicitly mark operations as high-impact/best-effort/background jobs. MongoDB should make sure that all operations that are not marked in such a way are given priority over the high-impact tasks. MongoDB should also make sure that high-impact operations can never overtax the server; these operations should either be rejected or blocked until enough resources are available to process them.

(Blocking would be very useful because then on my side I could write a scheduler that reduces the number of insertion threads or throttles them when many jobs are rejected).
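
A minimal sketch of such a client-side scheduler, assuming each insert batch comes back as either accepted or rejected. The class name `AimdThrottle`, its parameters, and the additive-increase/multiplicative-decrease policy are illustrative assumptions, not an existing MongoDB or driver feature:

```python
class AimdThrottle:
    """Hypothetical controller for the number of concurrent insert workers.

    Adds one worker after every accepted batch and cuts the worker count
    by `backoff` after every rejected batch (AIMD).
    """

    def __init__(self, min_workers=1, max_workers=16, backoff=0.5):
        self.min_workers = min_workers
        self.max_workers = max_workers
        self.backoff = backoff
        self.workers = min_workers

    def on_batch_result(self, rejected: bool) -> int:
        """Record one batch outcome and return the new worker limit."""
        if rejected:
            # Multiplicative decrease: back off quickly when the server pushes back.
            self.workers = max(self.min_workers, int(self.workers * self.backoff))
        else:
            # Additive increase: probe for spare capacity one worker at a time.
            self.workers = min(self.max_workers, self.workers + 1)
        return self.workers


throttle = AimdThrottle(max_workers=8)
for rejected in [False, False, False, True]:
    limit = throttle.on_batch_result(rejected)
print(limit)  # worker limit goes 2 -> 3 -> 4 -> 2
```

The application would resize its pool of insert workers to the returned limit after each batch, so it backs off when rejections arrive and ramps up again when they stop.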

This way it would be possible to operate high-load jobs at maximum efficiency without stealing resources from routine operations.



 Comments   
Comment by Karolin Varner [ 08/Aug/17 ]

Yup, that's why I filed the ticket.

Thank you for formatting the ticket well.

Comment by Ramon Fernandez Marina [ 07/Aug/17 ]

Thanks for the additional details karo. Unfortunately MongoDB does not yet have some of the features you mention, so I'm going to update the summary of this ticket to reflect the larger request of QoS / admission control / etc. features and put it on our Backlog for future consideration.

Thanks,
Ramón.

Comment by Karolin Varner [ 01/Aug/17 ]

Hi Ramon,
mostly yes, although limiting the number of ops for any process was not the intention of this ticket. Quite the opposite: I meant using the maximum number of available ops without degrading responsiveness for other operations.
Execution priorities sound right, except I would suggest adding a parameter to the write concern to let operations opt into a lower priority from the client without creating a new user on the server.

Also, I am not sure if just introducing execution priorities would cut it. A very important point in this ticket for me is that it should never be possible to degrade MongoDB performance, or the performance of other ops, by hitting MongoDB with a high-load operation, so care would have to be taken not to over-commit operations to MongoDB. For example, it should not be possible to increase replication lag significantly with a low-priority operation, which would mean that MongoDB needs to keep an eye on the replication lag and start rejecting low-prio operations when it grows. I am not sure what the status of those kinds of QoS features is, so maybe the ticket would be enough.
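
To illustrate the kind of lag-aware admission control meant here, a rough sketch follows. Everything in it (the `Priority` levels, the `AdmissionController` class, the 5-second threshold) is hypothetical and only models the requested behaviour; it does not describe anything MongoDB implements:

```python
from enum import Enum


class Priority(Enum):
    NORMAL = "normal"            # routine operations, always admitted
    BEST_EFFORT = "best_effort"  # bulk/background jobs that opted into low priority


class AdmissionController:
    """Hypothetical server-side gate that sheds best-effort work as lag grows."""

    def __init__(self, max_lag_seconds=5.0):
        self.max_lag_seconds = max_lag_seconds  # assumed threshold, for illustration
        self.lag_seconds = 0.0

    def update_lag(self, lag_seconds: float) -> None:
        """Record the most recently observed replication lag."""
        self.lag_seconds = lag_seconds

    def admit(self, priority: Priority) -> bool:
        """Return True if an operation of this priority may run now."""
        if priority is Priority.NORMAL:
            return True
        # Best-effort operations are rejected while lag exceeds the threshold.
        return self.lag_seconds <= self.max_lag_seconds


controller = AdmissionController(max_lag_seconds=5.0)
controller.update_lag(12.0)
print(controller.admit(Priority.NORMAL))       # True
print(controller.admit(Priority.BEST_EFFORT))  # False
```

A rejected best-effort operation would be the signal for the client to throttle itself, while normal-priority operations are unaffected.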

Comment by Ramon Fernandez Marina [ 01/Aug/17 ]

karo, if I understand correctly, the functionality you're requesting is a subset of what could be accomplished with SERVER-15072, specifically if the following was implemented:

  • Limit the number/rate of operations that can be executed.
  • Define different execution priorities for different users.

I'm therefore inclined to close this ticket as a duplicate, and recommend you expand use cases in SERVER-15072, vote for it and watch it for updates.

Regards,
Ramón.

Comment by Karolin Varner [ 28/Jul/17 ]

By significant latency I mean that we need to wait for some secondaries to actually accept the operation; this takes time in which primary and secondaries are not optimally used. It would be better if we could actually send data at the rate of the slowest part in the write pipeline.

The application causing this load is a multithreaded C++ application and runs on a server with 8 cores/16 threads.
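
One way to picture "sending at the rate of the slowest part" is a bounded window of unacknowledged batches: the producer may only send while a slot is free, and a slot is freed when a batch is acknowledged. The sketch below is shown in Python for brevity although the application itself is C++; `InFlightWindow` and its methods are hypothetical names, and this is not how any MongoDB driver works internally:

```python
import threading


class InFlightWindow:
    """Hypothetical cap on the number of sent-but-unacknowledged batches."""

    def __init__(self, max_in_flight: int):
        # BoundedSemaphore raises ValueError if acknowledged more often than sent.
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_send(self) -> bool:
        """Claim a slot without blocking; False means the pipeline is full."""
        return self._slots.acquire(blocking=False)

    def send(self, timeout=None) -> bool:
        """Block until a slot is free (or the timeout expires), then claim it."""
        return self._slots.acquire(timeout=timeout)

    def acknowledge(self) -> None:
        """Free a slot once a batch has been acknowledged."""
        self._slots.release()


window = InFlightWindow(max_in_flight=2)
print(window.try_send())  # True
print(window.try_send())  # True
print(window.try_send())  # False, two batches are still unacknowledged
window.acknowledge()
print(window.try_send())  # True again after one acknowledgement
```

With such a window the producer naturally slows to the acknowledgement rate instead of waiting on each write individually.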

Generated at Thu Feb 08 04:23:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.