[SERVER-38211] Optimise duplicate check algorithm for timestamp safe unique index Created: 20/Nov/18 Updated: 09/Aug/22 Resolved: 03/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Sulabh Mahajan | Assignee: | Sulabh Mahajan |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | storage-engines | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
| Issue Links: |
| Sprint: | Storage Engines 2019-05-20, Storage Engines 2019-06-03 | ||||
| Participants: | |||||
| Case: | (copied to CRM) | ||||
| Linked BF Score: | 0 | ||||
| Story Points: | 8 | ||||
| Description |
|
The timestamp-safe unique index format introduced in 4.2 incorporates a duplicate key check/prevention algorithm. The older unique index format prevented duplicate keys in a different way; the new algorithm was adopted to make the check timestamp safe. Linkbench (LOAD_NODE_BULK) shows a performance penalty due to this check. This ticket tracks the effort to optimise the algorithm and recover the lost performance. |
| Comments |
| Comment by Sulabh Mahajan [ 03/Jun/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
After discussing with the team and as recommended by brian.lane, I am marking this ticket as won't fix. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sulabh Mahajan [ 03/Jun/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I tested the following: instead of 1. insert, 2. remove, 3. search the prefix, I reconfigured the WT cursor with overwrite=true and changed the steps to 1. update, 2. remove, 3. search the prefix. This keeps the cursor positioned after the first step and hence avoids searching for the key again before the removal. At this point, I think there are no quick improvements we can make to recover the performance. As Alex said, we would need a major re-design, and that change is not worth the potential gain. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
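The saving being measured can be sketched with a toy cursor model (hypothetical `ToyCursor`; this is not the WT_CURSOR API, the real change reconfigures the WiredTiger cursor with overwrite=true). The point is that an overwrite-style update leaves the cursor positioned, so the subsequent remove needs no fresh key search:

```python
class ToyCursor:
    """Toy cursor over a dict, counting key searches (the expensive step)."""

    def __init__(self, store, overwrite=False):
        self.store = store
        self.overwrite = overwrite
        self.pos = None       # key the cursor is currently positioned on
        self.searches = 0

    def _seek(self, key):
        # Only charge a search when the cursor is not already positioned here.
        if self.pos != key:
            self.searches += 1
            self.pos = key

    def insert(self, key, value):
        self._seek(key)
        if not self.overwrite and key in self.store:
            raise KeyError("duplicate key")
        self.store[key] = value
        if not self.overwrite:
            self.pos = None   # a plain insert does not keep the position

    def remove(self, key):
        self._seek(key)
        del self.store[key]
        self.pos = None
```

Under this model, insert-then-remove costs two searches, while update-then-remove with overwrite semantics costs one, which is the saving the reconfiguration was meant to capture.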
| Comment by Brian Lane [ 03/Jun/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Agreed, sulabh.mahajan let me know how things progress. Cheers! -Brian | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Gorrod [ 03/Jun/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks for the details sulabh.mahajan. We did have a design that utilized the WiredTiger reserve operation at some stage, but realized that it wouldn't be enough: reserve operations protect against concurrent transactions, but not for the lifetime of a snapshot transaction. That is due to a decision about how we handle the WT_UPDATE structures associated with reserve operations after a transaction commits or aborts. We could revisit the semantics of reserve in WiredTiger to make it suitable for the behavior desired here, which would mean an explicit remove is no longer required. However, I don't feel the potential performance gain justifies the engineering effort required to design and implement that change at the moment. brian.lane, we could use your help in making that decision. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
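A toy model of the limitation described (hypothetical `ToyTable`; the names are illustrative, not the WiredTiger API): a reservation conflicts with concurrent transactions while its owner is active, but committing leaves no update behind, so it cannot protect a key for the lifetime of another transaction's snapshot.

```python
class ToyTable:
    """Toy model of reserve semantics: a reservation blocks concurrent
    writers, but leaves no trace once the reserving txn commits."""

    def __init__(self):
        self.reservations = {}   # key -> owning txn id (active txns only)
        self.data = {}           # committed key/value pairs

    def reserve(self, txn, key):
        owner = self.reservations.get(key)
        if owner is not None and owner != txn:
            # A concurrent active transaction holds the key.
            raise RuntimeError("WT_ROLLBACK: write conflict")
        self.reservations[key] = txn

    def commit(self, txn):
        # Reservations leave no committed update behind, so nothing
        # remains to protect the key after commit.
        self.reservations = {k: o for k, o in self.reservations.items()
                             if o != txn}
```

Once the reserving transaction commits, a later transaction (even one still reading an older snapshot) finds neither a conflict nor a committed record, which is why reserve alone could not replace the insert/remove pair here.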
| Comment by Sulabh Mahajan [ 03/Jun/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The duplicate check algorithm that we put in for the new unique indexes consists of three stages: insert a prefix key, remove the prefix key, and check for an existing duplicate record with the same prefix as the prefix key. The three stages of the check introduce the following amounts of regression with high-batch-size (250+ docs per batch) bulk inserts:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
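The three stages can be sketched against a toy ordered store (a Python stand-in for the WiredTiger index table; `ToyIndex` and `checked_insert` are hypothetical names, and the real implementation additionally relies on transactional write conflicts on the prefix key for timestamp safety):

```python
import bisect

class ToyIndex:
    """Toy ordered index. Keys are tuples; a bare prefix key (prefix,)
    sorts immediately before any full key (prefix, record_id)."""

    def __init__(self):
        self.keys = []  # kept sorted

    def insert(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            raise KeyError("duplicate key")
        self.keys.insert(i, key)

    def remove(self, key):
        i = bisect.bisect_left(self.keys, key)
        assert i < len(self.keys) and self.keys[i] == key
        del self.keys[i]

    def has_prefix(self, prefix):
        # Search near the bare prefix key; any entry at or after it
        # sharing the prefix is an existing duplicate.
        i = bisect.bisect_left(self.keys, (prefix,))
        return i < len(self.keys) and self.keys[i][0] == prefix

def checked_insert(index, prefix, record_id):
    index.insert((prefix,))        # stage 1: insert the prefix key
    index.remove((prefix,))        # stage 2: remove the prefix key
    if index.has_prefix(prefix):   # stage 3: search for a duplicate
        raise KeyError("duplicate key on prefix %r" % (prefix,))
    index.insert((prefix, record_id))
```

Each stage costs a separate key search in the underlying table, which is where the per-stage regression measured above comes from.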
| Comment by Sulabh Mahajan [ 30/May/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
To conclude, the regression is mostly limited to:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sulabh Mahajan [ 30/May/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I spent considerable time on this ticket today. I studied linkbench and constructed a workload that would be a bit similar while still being simple and relevant to the problem at hand. Attached workload:
The whole test is run with FCV 4.0 and then FCV 4.2 to compare the old and new unique indexes. Here is the sample output:
Here are some results in an easier-to-read form:
From what I have read, MongoDB's driver for linkbench uses a 1000-document batch size for bulk inserts, and performs those inserts into the node collection through a single thread. I further tested batch inserts with various batch sizes. The regression for batch inserts was 15-20% across the board, and I did not see a strong correlation with the batch size:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
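The batched load used in such a test can be sketched with a simple batching helper (hypothetical; the attached workload presumably drives the server's bulk-insert path directly):

```python
def batches(docs, batch_size):
    """Yield insert_many-sized slices of docs, mirroring a batched
    bulk load of the node collection."""
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]

# Usage with a driver (not run here), e.g. pymongo:
#   for batch in batches(docs, 1000):
#       collection.insert_many(batch, ordered=True)
```

With a 1000-document load and a 250-document batch size this produces four full batches, matching the "250+ docs per batch" configurations tested above.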
| Comment by Sulabh Mahajan [ 29/May/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I am comparing the new index with the duplicate check against the new index without the duplicate check. From the sys-perf runs, I established earlier that disabling the duplicate check brings the performance of the new index back to the levels of the old index format.
I attached the tests to the ticket. With FCV 4.0 : 5316 ms I am studying the linkbench workload now; it is probably different, so I might get more input from there. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Gorrod [ 29/May/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks for the update sulabh.mahajan. I'm surprised that the difference appears when there are a lot of different unique indexes; I think that bears further investigation. Could you clarify what you are comparing here? Is it a newer versus an older version of MongoDB? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sulabh Mahajan [ 29/May/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Status Update:
I do not understand the linkbench load phase yet, but I think the regression is most likely a function of the number of unique indexes a document insert has to update. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sulabh Mahajan [ 07/May/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
alexander.gorrod, I can look at it this sprint. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Gorrod [ 07/May/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
sulabh.mahajan do you have the capacity to pick this up and get it done in the sprint that just started? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Gorrod [ 07/May/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
GA - I've updated the fix version | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brian Lane [ 25/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
FYI - This issue came up in the monthly performance review today. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Gorrod [ 24/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've taken the fix version out - we should revisit the priority before the 4.2 release. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sulabh Mahajan [ 29/Nov/18 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Summarising from the BF ticket, in the test that I ran:
|