[SERVER-60983] Evaluate the performance of the new way of filtering writes to orphaned documents Created: 26/Oct/21  Updated: 30/Dec/21  Resolved: 30/Dec/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Sergi Mateo Bellido Assignee: Antonio Fuschetto
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Sharding EMEA 2021-11-01, Sharding EMEA 2021-11-15, Sharding EMEA 2021-11-29, Sharding EMEA 2021-12-13, Sharding EMEA 2021-12-27, Sharding EMEA 2022-01-10
Participants:

 Description   

The goal of this task is to evaluate the performance impact of the new way of filtering writes on orphaned documents as part of PM-2423.

The first task is to check whether we already have a benchmark measuring write throughput. Ultimately, what we want to measure is the overhead introduced by this new way of filtering writes compared to the previous implementation.

1st workload: without orphaned documents
The goal of this benchmark is to measure the overhead of checking whether a document is owned by the shard in a scenario in which there are no orphaned documents. We should test different scenarios:

  • Targeted writes (i.e. targeting just one shard, with a valid shard version). This scenario is very interesting because we believe that filtering via the ShardVersion should be enough.
  • Broadcast multi writes (i.e. ChunkVersion::IGNORED()).
  • Others? A broadcast write in a transaction? Direct writes to a shard? My feeling is that the previous two should be enough, but I am open to evaluating other scenarios if we think they could be interesting. A sketch of a possible workload follows this list.
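
A minimal sketch of what such a workload could look like, assuming a pymongo-based harness (the connection strings, database/collection names, shard key and document shape below are illustrative assumptions, not taken from an existing benchmark):

    # Sketch of the first workload: writes against a sharded collection that
    # contains no orphaned documents (all names and values are illustrative).
    import time
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # mongos endpoint (assumed)
    coll = client["perf"]["no_orphans"]

    client.admin.command("enableSharding", "perf")
    client.admin.command("shardCollection", "perf.no_orphans", key={"_id": "hashed"})
    coll.insert_many([{"_id": i, "counter": 0} for i in range(100_000)])

    # Targeted writes: the query contains the shard key, so mongos routes each
    # update to a single shard with a valid shard version attached.
    start = time.perf_counter()
    for i in range(10_000):
        coll.update_one({"_id": i}, {"$inc": {"counter": 1}})
    targeted_secs = time.perf_counter() - start

    # Broadcast multi-write: no shard key in the query, so the update is sent
    # to all shards (ChunkVersion::IGNORED() on the shard side).
    start = time.perf_counter()
    coll.update_many({"counter": {"$gte": 0}}, {"$inc": {"counter": 1}})
    broadcast_secs = time.perf_counter() - start

    print(f"targeted: {targeted_secs:.3f}s, broadcast: {broadcast_secs:.3f}s")

Running the same script against builds with and without the new filtering logic would give the overhead we want to quantify.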

2nd workload: with orphaned documents
The first workload only evaluates the cost of checking the ownership of a document. The goal of this second workload is to also evaluate the cost of skipping a document. TBH I am still thinking about how to measure this. One idea is to create a sharded collection containing only orphaned documents (created by direct writes to the shard!). Then all writes will be filtered out, so if we create the same number of documents in both workloads (without and with orphaned documents), the difference in write times will be the overhead of just skipping. Open to other ideas. A possible setup is sketched below.
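
A rough sketch of how such a collection could be prepared, assuming a two-shard cluster where all chunks of the collection have been moved to one shard (hosts, names and ranges below are hypothetical):

    # Sketch of the second workload: a collection containing only orphaned
    # documents, created by writing directly to a shard that owns no chunks.
    from pymongo import MongoClient

    mongos = MongoClient("mongodb://localhost:27017")   # mongos (assumed)
    mongos.admin.command("enableSharding", "perf")
    mongos.admin.command("shardCollection", "perf.only_orphans", key={"x": 1})
    # Assume every chunk of perf.only_orphans is then moved to shard "rs1",
    # so shard "rs0" owns no range of the collection.

    # Inserting directly into the mongod of shard "rs0" (bypassing mongos)
    # creates documents that rs0 does not own, i.e. orphaned documents.
    shard0 = MongoClient("mongodb://localhost:27018")   # rs0 primary (assumed)
    shard0["perf"]["only_orphans"].insert_many([{"x": i} for i in range(100_000)])

    # A broadcast multi-update through mongos must now examine and skip every
    # orphaned document on rs0; that skipping cost is what this workload isolates.
    mongos["perf"]["only_orphans"].update_many({}, {"$set": {"touched": True}})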



 Comments   
Comment by Antonio Fuschetto [ 29/Dec/21 ]

Introduction

To measure the performance degradation introduced by the new mechanism to filter out write operations on orphaned documents (i.e. SERVER-59832), a dedicated test has been implemented, starting from the existing CRUD workloads test. Unfortunately, the existing test is too generic and therefore unsuitable for the purpose of this task, where we want to stress the operations that trigger the new filtering logic [1].

The new test measures the execution time of the following use cases using sharded collections of different cardinalities and document sizes:

  • Update all documents using an empty query (i.e. {})
  • Update all documents using a non-empty query
  • Delete all documents using an empty query (i.e. {})
  • Delete all documents using a non-empty query

Update and delete operations were executed with both empty and non-empty queries to evaluate the performance penalties in scenarios where we assumed the evaluation of the query had a non-negligible cost in the total execution time.
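
As a rough illustration, the four use cases map to operations of the following shape (pymongo is used only for readability; the field name, the non-empty predicate and the repopulation step are placeholders, not the actual test code):

    # Illustrative shape of the four measured operations. Assumes perf.workload
    # was pre-populated with documents having a numeric "value" field.
    import time
    from pymongo import MongoClient

    coll = MongoClient("mongodb://localhost:27017")["perf"]["workload"]

    def timed(op, *args):
        start = time.perf_counter()
        op(*args)
        return time.perf_counter() - start

    results = {
        "update, empty query":     timed(coll.update_many, {}, {"$set": {"flag": 1}}),
        "update, non-empty query": timed(coll.update_many, {"value": {"$gte": 0}}, {"$set": {"flag": 2}}),
        "delete, empty query":     timed(coll.delete_many, {}),
    }
    # Repopulate before the last case, since the previous delete emptied the
    # collection (in the real test each case runs on its own collection).
    coll.insert_many([{"value": i} for i in range(10_000)])
    results["delete, non-empty query"] = timed(coll.delete_many, {"value": {"$gte": 0}})

    for case, secs in results.items():
        print(f"{case}: {secs:.3f}s")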

These operations were executed 5 times on 5 different collections (of the same type), averaging the results and using the standard deviation to discard possibly distorted samples (e.g. caused by system processes running on the dedicated test machine).
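
A small sketch of the averaging and sample-rejection step (the criterion of discarding samples farther than 1.5 standard deviations from the median is an assumption; the exact threshold used in the test may differ):

    # Average repeated timings, discarding samples that deviate too much from
    # the median (the 1.5-sigma threshold is an illustrative assumption).
    from statistics import mean, median, stdev

    def robust_average(samples: list[float], k: float = 1.5) -> float:
        if len(samples) < 2:
            return mean(samples)
        med, sigma = median(samples), stdev(samples)
        kept = [s for s in samples if abs(s - med) <= k * sigma] or samples
        return mean(kept)

    # The 9.00 s sample (e.g. a run distorted by a background process) is
    # discarded; the remaining four samples are averaged.
    print(robust_average([1.02, 0.98, 1.01, 9.00, 1.00]))  # -> 1.0025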

The experiment was repeated using collections with 100, 1K, 10K, 100K and 1M documents, and with document sizes of 128B, 512B, 1KB, 2KB and 1MB.
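
For reference, populating the test collections could look roughly like this (the padding field, the batching, and the assumption that every combination of cardinality and document size was generated are illustrative, not taken from the actual test):

    # Generate collections with the cardinalities and document sizes listed above.
    from pymongo import MongoClient

    CARDINALITIES = [100, 1_000, 10_000, 100_000, 1_000_000]
    DOC_SIZES = [128, 512, 1_024, 2_048, 1_048_576]  # bytes

    db = MongoClient("mongodb://localhost:27017")["perf"]

    def populate(coll, n_docs, doc_size):
        coll.drop()
        padding = "x" * max(doc_size - 64, 1)  # ~64 B of per-document overhead assumed
        batch = []
        for i in range(n_docs):
            batch.append({"_id": i, "value": i, "padding": padding})
            if len(batch) == 1_000:
                coll.insert_many(batch)
                batch = []
        if batch:
            coll.insert_many(batch)

    for n in CARDINALITIES:
        for size in DOC_SIZES:
            # Whether the original test ran the full cross product of
            # cardinalities and sizes is not stated; this loop is an assumption.
            populate(db[f"coll_{n}_{size}"], n, size)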

Results

The results show that the current filtering logic on orphaned documents introduces an execution-time penalty of about 5-6% for update operations and 7-8% for delete operations. This value does not change significantly when varying the number of documents in the collection or their size.

Use case                       Performance degradation
Update with empty query        +6.0%
Update with non-empty query    +5.2%
Delete with empty query        +7.9%
Delete with non-empty query    +6.9%

Further experiments also showed that having a huge number of chunks (e.g. 100K) affects performance, but since the Sharding team is actively working to avoid this type of scenario (i.e. PM-2321), the analysis did not focus on it.

Detailed information on different test cases and results is available in SERVER-59832 - Performance tests.

Conclusion

The current implementation for filtering out write operations on orphaned documents (i.e. SERVER-59832) introduces an overhead of roughly 8% (rounding up) on update and delete operations.

In order to minimize the computational cost of this logic, several areas for improvement have been identified. Dedicated tasks will be created accordingly.

 


[1] The CRUD workloads test uses the same collection to measure operation throughput. It runs different types of operations (e.g. delete and insert) to preserve the state of the collection for subsequent test cases, which makes the measurement of the execution time of each single type of operation cumbersome and imprecise for our purposes.
