[SERVER-38211] Optimise duplicate check algorithm for timestamp safe unique index Created: 20/Nov/18  Updated: 09/Aug/22  Resolved: 03/Jun/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Sulabh Mahajan Assignee: Sulabh Mahajan
Resolution: Won't Fix Votes: 0
Labels: storage-engines
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: batch_ins.py, linkbench_simu.py, uniq_idx.py, uniq_idx.sh
Issue Links:
Depends
Sprint: Storage Engines 2019-05-20, Storage Engines 2019-06-03
Linked BF Score: 0
Story Points: 8

 Description   

The timestamp-safe unique index format for 4.2 incorporates a new duplicate key check/prevention algorithm. The older unique index format prevented duplicate keys in a different way; the new algorithm was introduced to make the check timestamp safe.

Linkbench (LOAD_NODE_BULK) shows a performance penalty due to this check. This ticket tracks the effort to optimise the algorithm and recover the lost performance.



 Comments   
Comment by Sulabh Mahajan [ 03/Jun/19 ]

After discussing with the team, and as recommended by brian.lane, I am marking this ticket as Won't Fix.

Comment by Sulabh Mahajan [ 03/Jun/19 ]

I tested the following:

Instead of 1. insert, 2. remove, 3. search the prefix, I reconfigured the WT cursor with overwrite=true and changed the steps to 1. update, 2. remove, 3. search the prefix. This keeps the cursor positioned after the first step and hence avoids searching for the key again for the removal.
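
A minimal sketch of the experiment using WiredTiger's Python API (table name, key, and setup are illustrative; MongoDB's index layer drives the cursor differently):

import os
from wiredtiger import wiredtiger_open, WiredTigerError

os.makedirs("wt_home", exist_ok=True)
conn = wiredtiger_open("wt_home", "create")
session = conn.open_session()
session.create("table:uniq_idx", "key_format=S,value_format=S")

# With overwrite=true, update() inserts the key if it is absent, and a
# successful update() leaves the cursor positioned on the record.
cursor = session.open_cursor("table:uniq_idx", None, "overwrite=true")

session.begin_transaction()
cursor.set_key("prefix_key")
cursor.set_value("")
cursor.update()               # 1. update: writes the prefix key, cursor stays positioned
cursor.remove()               # 2. remove: cursor is already positioned, no second search
cursor.set_key("prefix_key")
try:
    cursor.search_near()      # 3. search near the prefix for an existing duplicate
except WiredTigerError:
    pass                      # nothing at or near the prefix: no duplicate
session.commit_transaction()
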
Though I saw a small improvement in the tests I created for this ticket, I did not see any improvements in sys-perf / linkbench.

At this point, I don't think there are any quick improvements we can make to the performance. As Alex said, we would need a major re-design, and that change is not worth the potential gain.

Comment by Brian Lane [ 03/Jun/19 ]

Agreed, sulabh.mahajan let me know how things progress.

Cheers!

-Brian

Comment by Alexander Gorrod [ 03/Jun/19 ]

Thanks for the details sulabh.mahajan.

We did have a design that utilized the WiredTiger reserve operation at some stage, but realized that it wouldn't be enough: reserve operations protect against concurrent transactions, but not for the lifetime of a snapshot transaction. That is due to a decision about how we handle the WT_UPDATE structures associated with reserve operations after a transaction commits or aborts.

We could revisit the semantics of reserve in WiredTiger to make it suitable for the behavior desired here, which would mean an explicit remove would no longer be required.
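
For reference, a sketch of today's reserve semantics via WiredTiger's Python API (table and keys are illustrative; note that reserve requires the record to already exist):

import os
from wiredtiger import wiredtiger_open, WiredTigerError

os.makedirs("wt_home", exist_ok=True)
conn = wiredtiger_open("wt_home", "create")
s1, s2 = conn.open_session(), conn.open_session()
s1.create("table:idx", "key_format=S,value_format=S")

c1 = s1.open_cursor("table:idx")
c1.set_key("k")
c1.set_value("v")
c1.insert()                   # reserve needs an existing record

s1.begin_transaction()
c1.set_key("k")
c1.reserve()                  # update-locks the key without writing a new value

s2.begin_transaction()
c2 = s2.open_cursor("table:idx")
c2.set_key("k")
c2.set_value("v2")
try:
    c2.update()               # concurrent write conflicts with the reservation
except WiredTigerError as e:
    print("conflict:", e)     # WT_ROLLBACK while s1's transaction is open
s2.rollback_transaction()

s1.commit_transaction()       # after commit the reservation's WT_UPDATE is
                              # resolved away, so it no longer guards the key
                              # for the lifetime of another snapshot transaction
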

I don't feel as though the potential performance gain justifies the engineering effort required to design and implement that change at the moment. brian.lane we could use your help in making that decision.

Comment by Sulabh Mahajan [ 03/Jun/19 ]

The duplicate check algorithm that we put in for the new unique indexes consists of 3 stages: insert a prefix key, remove the prefix key, and check for an existing duplicate record with the same prefix as the prefix key. With high batch size (250+ docs per batch) bulk inserts, the 3 stages introduce the following regressions:

Stage                   Regression
Insert prefix key       3%
Remove prefix key       13%
Check existing record   8%
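
As a sketch against WiredTiger's Python API (table and key are illustrative; the real check lives in MongoDB's index layer and operates on KeyStrings):

import os
import wiredtiger

os.makedirs("wt_home", exist_ok=True)
conn = wiredtiger.wiredtiger_open("wt_home", "create")
session = conn.open_session()
session.create("table:uniq_idx", "key_format=S,value_format=S")
cursor = session.open_cursor("table:uniq_idx")

prefix_key = "user@example.com"    # hypothetical unique key, without record id suffix

session.begin_transaction()
cursor.set_key(prefix_key)
cursor.set_value("")
cursor.insert()                    # stage 1: insert the prefix key      (~3%)
cursor.set_key(prefix_key)
cursor.remove()                    # stage 2: remove the prefix key      (~13%)
cursor.set_key(prefix_key)
try:
    # stage 3: look for an existing index entry carrying the same prefix (~8%)
    if cursor.search_near() >= 0 and cursor.get_key().startswith(prefix_key):
        raise RuntimeError("duplicate key")
except wiredtiger.WiredTigerError:
    pass                           # nothing at or near the prefix: no duplicate
session.commit_transaction()
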
Comment by Sulabh Mahajan [ 30/May/19 ]

To conclude, the regression is mostly limited to:

  • Batch insert (most likely affecting linkbench)
  • Collections with several unique indexes
Comment by Sulabh Mahajan [ 30/May/19 ]

I spent considerable time on this ticket today. I studied linkbench and constructed a workload that is roughly similar while remaining simple and relevant to the problem at hand.

Attached workload:

  • Creates 3 tables, each with a different set of compound unique indexes
  • Then creates N (21) threads to do the following:
      • 1 thread inserts into one of the tables. The inserts are done through the bulk load API: X (5) batches with Y (5000) documents per batch
      • N - 1 (20) threads upsert into the remaining two tables. Again this is done through the bulk load API: N*X batches (25) with Y (5000) documents per batch
  • Computes the total time taken and the average time taken to write a single batch
  • The test runs multiple times and averages the values to smooth out fluctuations

The whole test is run with FCV 4.0 and then FCV 4.2 to compare between the old and new unique indexes.
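
A condensed sketch of what the workload does, assuming pymongo (collection names, index shapes, and batch counts follow the description above; the attached linkbench_simu.py may differ in detail):

import threading, time
from pymongo import MongoClient, InsertOne, UpdateOne

client = MongoClient()
client.admin.command("setFeatureCompatibilityVersion", "4.0")   # then re-run with "4.2"
db = client.test

# One compound unique index per table; the real script uses a different
# set of compound unique indexes on each table.
for coll in (db.t1, db.t2, db.t3):
    coll.create_index([("a", 1), ("b", 1)], unique=True)

def insert_worker(coll, batches=5, batch_size=5000):
    for b in range(batches):
        ops = [InsertOne({"a": b * batch_size + i, "b": i}) for i in range(batch_size)]
        start = time.time()
        coll.bulk_write(ops, ordered=False)
        print("insert batch ms:", int((time.time() - start) * 1000))

def upsert_worker(coll, batches=25, batch_size=5000):
    # Concurrent upserts may conflict on the unique index; error handling elided.
    for b in range(batches):
        ops = [UpdateOne({"a": i, "b": i}, {"$set": {"c": b}}, upsert=True)
               for i in range(batch_size)]
        coll.bulk_write(ops, ordered=False)

threads = [threading.Thread(target=insert_worker, args=(db.t1,))]
threads += [threading.Thread(target=upsert_worker, args=(db.t2 if i % 2 else db.t3,))
            for i in range(20)]
for t in threads: t.start()
for t in threads: t.join()
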

Here is the sample output:

------------- old uniq idx --------------
Running load...
Round  0  load time:  34180
Round  0  Res: [160, 1092, 1369, 1098, 1362, 1094, 1360, 1078, 1376, 1055, 1349, 1117, 1341, 1105, 1377, 1103, 1380, 1074, 1331, 1070, 1345]
Round  0  Avg ins time: 160
Round  0  Avg upd time: 1223
Round  1  load time:  34824
Round  1  Res: [166, 1077, 1356, 1134, 1375, 1069, 1369, 1071, 1342, 1095, 1362, 1119, 1410, 1082, 1329, 1097, 1368, 1102, 1374, 1099, 1405]
Round  1  Avg ins time: 166
Round  1  Avg upd time: 1231
Round  2  load time:  33944
Round  2  Res: [160, 1118, 1350, 1095, 1327, 1078, 1358, 1038, 1354, 1087, 1321, 1114, 1361, 1067, 1369, 1086, 1336, 1071, 1358, 1095, 1367]
Round  2  Avg ins time: 160
Round  2  Avg upd time: 1217
Avg for test:
   Loading time:  34316
   ins time:  162
   upd time:  1223
------------- new uniq idx --------------
Running load...
Round  0  load time:  36486
Round  0  Res: [218, 1172, 1453, 1209, 1458, 1190, 1456, 1143, 1448, 1201, 1447, 1150, 1451, 1165, 1431, 1197, 1474, 1196, 1430, 1162, 1437]
Round  0  Avg ins time: 218
Round  0  Avg upd time: 1313
Round  1  load time:  36711
Round  1  Res: [210, 1199, 1460, 1198, 1456, 1185, 1471, 1186, 1480, 1178, 1481, 1170, 1488, 1220, 1433, 1183, 1467, 1146, 1457, 1180, 1443]
Round  1  Avg ins time: 210
Round  1  Avg upd time: 1324
Round  2  load time:  36693
Round  2  Res: [208, 1193, 1484, 1149, 1464, 1189, 1482, 1163, 1462, 1216, 1449, 1225, 1473, 1172, 1470, 1159, 1479, 1177, 1475, 1135, 1461]
Round  2  Avg ins time: 208
Round  2  Avg upd time: 1323
Avg for test:
   Loading time:  36630
   ins time:  212
   upd time:  1320

Here are some results in a form easier to understand:

Avg time per batch (ms)   5000-doc insert   5000-doc upsert   1000-doc insert   1000-doc upsert
FCV 4.0 (old uniq idx)    162               1223              35                240
FCV 4.2 (new uniq idx)    212               1320              41                260
Regression (%)            31                8                 17                8

From what I have read, MongoDB's driver for linkbench uses a 1000 doc batch size for bulk inserts, and does those inserts for the node collection through a single thread.

I further tested batch inserts with various batch sizes. The regression for batch insert was 15-20% across the board, and I didn't see a strong correlation with batch size:

------------- old uniq idx ---------------
"batch-size : runtime" :  [[1, 0], [251, 6], [501, 9], [751, 15], [1001, 19], [1251, 25], [1501, 28], [1751, 33], [2001, 37], [2251, 42], [2501, 46], [2751, 49], [3001, 52], [3251, 57], [3501, 60], [3751, 65], [4001, 71], [4251, 75], [4501, 79], [4751, 84], [5001, 86]]
------------- new uniq idx --------------
"batch-size : runtime" :  [[1, 1], [251, 6], [501, 13], [751, 19], [1001, 25], [1251, 30], [1501, 34], [1751, 40], [2001, 46], [2251, 51], [2501, 55], [2751, 59], [3001, 65], [3251, 71], [3501, 75], [3751, 81], [4001, 85], [4251, 90], [4501, 91], [4751, 103], [5001, 104]]
-----------------------------------------
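
A hypothetical sketch of the batch-size sweep, assuming pymongo (the attached batch_ins.py may differ; the number of batches per size is a guess to make runtimes measurable):

import time
from pymongo import MongoClient, InsertOne

client = MongoClient()
coll = client.test.batch_ins
coll.create_index([("a", 1)], unique=True)

results = []
for batch_size in range(1, 5002, 250):      # 1, 251, ..., 5001
    coll.delete_many({})
    batches = [[InsertOne({"a": r * batch_size + i}) for i in range(batch_size)]
               for r in range(50)]          # 50 batches per size: illustrative count
    start = time.time()
    for ops in batches:
        coll.bulk_write(ops, ordered=False)
    results.append([batch_size, int(time.time() - start)])

print('"batch-size : runtime" : ', results)
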

Comment by Sulabh Mahajan [ 29/May/19 ]

"Could you clarify what you are comparing here? Is it a newer and older version of MongoDB?"

I am comparing the new index with the duplicate check vs. the new index without the duplicate check. From the sys-perf runs, I established earlier that disabling the duplicate check brings the performance of the new index back to the old index levels.

I can check my results with several old unique indexes to be sure.
I checked, and I confirm the results hold between the old and new unique indexes: with a large number of unique indexes, insert performance drops significantly.

I have attached the tests to the ticket.
I am measuring the time required for 10k inserts into 5 collections, each with 15 unique indexes, driven by 5 parallel threads. I run the test twice, once with FCV 4.0 (old unique index) and once with FCV 4.2 (new unique index).

With FCV 4.0 : 5316 ms
With FCV 4.2 : 8751 ms
Regression : 65%
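
A condensed sketch of that test, assuming pymongo (the attached uniq_idx.py may differ in detail; field and collection names are illustrative):

import threading, time
from pymongo import MongoClient, InsertOne

client = MongoClient()
client.admin.command("setFeatureCompatibilityVersion", "4.0")   # then re-run with "4.2"
db = client.test

colls = []
for c in range(5):
    coll = db[f"coll{c}"]
    for j in range(15):                     # 15 single-field unique indexes per collection
        coll.create_index([(f"f{j}", 1)], unique=True)
    colls.append(coll)

def load(coll, n=10000):
    # Each document gets distinct values in every indexed field, so no duplicates.
    ops = [InsertOne({f"f{j}": i for j in range(15)}) for i in range(n)]
    coll.bulk_write(ops, ordered=False)

start = time.time()
threads = [threading.Thread(target=load, args=(c,)) for c in colls]
for t in threads: t.start()
for t in threads: t.join()
print("elapsed ms:", int((time.time() - start) * 1000))
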

I am studying the linkbench workload now; it is probably different, so I might get more input from there.

Comment by Alexander Gorrod [ 29/May/19 ]

Thanks for the update, sulabh.mahajan. I'm surprised that the difference appears when there are a lot of different unique indexes. I think that bears further investigation. Could you clarify what you are comparing here? Is it a newer and older version of MongoDB?

Comment by Sulabh Mahajan [ 29/May/19 ]

Status Update:

  • Created a workload that inserts N (10000) documents into 5 collections through 5 parallel threads. The collections have unique indexes on them.
  • I could not see any regression until I increased the number of unique indexes per collection: the more unique indexes on a collection, the larger the regression.
  • E.g. 5 unique indexes per collection gives roughly 10% regression; with 15 unique indexes the regression increases to 40-55%.

I do not understand the linkbench load phase yet, but I think the regression is most likely a function of the number of unique indexes each document insert has to maintain.

Comment by Sulabh Mahajan [ 07/May/19 ]

alexander.gorrod, I can look at it this sprint.

Comment by Alexander Gorrod [ 07/May/19 ]

sulabh.mahajan do you have the capacity to pick this up and get it done in the sprint that just started?

Comment by Alexander Gorrod [ 07/May/19 ]

GA - I've updated the fix version

Comment by Brian Lane [ 25/Jan/19 ]

FYI - This issue came up in the monthly performance review today.

Comment by Alexander Gorrod [ 24/Jan/19 ]

I've taken the fix version out - we should revisit the priority before the 4.2 release.

Comment by Sulabh Mahajan [ 29/Nov/18 ]

Summarising from the BF ticket:

In the test that I ran:

  • Almost all of the approx. 55% regression comes from the duplicate check algorithm in the new index.
  • The keystring matching check for supporting mixed-format indexes causes less than 5% of the regression.