[SERVER-56274] TTL deletes are much slower on descending indexes than ascending indexes Created: 22/Apr/21  Updated: 29/Oct/23  Resolved: 03/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Backlog - Storage Engines Team
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2021-06-01 at 1.56.28 PM.png     PNG File Screen Shot 2021-06-01 at 1.56.35 PM.png     File ttl_delete_repro.js    
Issue Links:
Depends
depends on SERVER-55750 Enable Feature flag for PM-2227 Closed
depends on SERVER-65528 Use range bounded cursors for restori... Closed
Related
related to SERVER-57221 Inconsistent delete performance on co... Closed
related to SERVER-56509 Wrap unique index insertion _keyExist... Closed
is related to SERVER-69659 Test performance of restoring index c... Open
Assigned Teams:
Storage Engines
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Execution Team 2021-06-28, Execution Team 2021-07-12, Execution Team 2021-08-23, Execution Team 2021-09-06
Participants:
Case:

 Description   

For unknown reasons, TTL deletes that use a descending index (e.g. {lastUse: -1}) take at least an order of magnitude more time than deletes on an equivalent, ascending index (e.g.{lastUse: 1}).

Investigate and attempt to fix this problem.



 Comments   
Comment by Louis Williams [ 03/Oct/22 ]

This performance problem has gone away since the work combined from SERVER-65528 and SERVER-55750.

Comment by Louis Williams [ 01/Dec/21 ]

Hi jenny.ren@mail.utoronto.ca, your analysis seems correct to me. A descending index on time requires the TTL monitor to traverse in reverse order, which means that restoring the cursor after each delete traverses in forward order, having to skip over deleted records. Thanks for the investigation!

Comment by Jen X [ 01/Dec/21 ]

 

Hi, I'd like to share some findings on the root cause.

The following snippet of code is called when repositioning the cursor (in "__wt_btcur_search_near"):

//        
        while ((ret = __wt_btcur_next(cbt, false)) != WT_NOTFOUND) {
            WT_ERR(ret);
            if (btree->type == BTREE_ROW)
                WT_ERR(__wt_compare(session, btree->collator, &cursor->key, &state.key, &exact));
            else
                exact = cbt->recno < state.recno ? -1 : cbt->recno == state.recno ? 0 : 1;
            if (exact >= 0)
                goto done;
        }        
        while ((ret = __wt_btcur_prev(cbt, false)) != WT_NOTFOUND) {
            WT_ERR(ret);
            if (btree->type == BTREE_ROW)
                WT_ERR(__wt_compare(session, btree->collator, &cursor->key, &state.key, &exact));
            else
                exact = cbt->recno < state.recno ? -1 : cbt->recno == state.recno ? 0 : 1;
            if (exact <= 0)
                goto done;
        }

As you can see, the search is hard-coded to search one direction (next) first, and when nothing is found, it searches the reverse direction (prev).

For the ascending index, the cursor moves in the same direction as "__wt_btcur_next", so it finds the next valid record right away and there is no need to call "__wt_btcur_prev".  For descending index, the direction the cursor moves is reversed, but the code still calls "__wt_btcur_next" first only to find deleted records. Then it calls "__wt_btcur_prev" to successfully reposition the cursor, but extra work is wasted.

If you change the code to execute "__wt_btcur_prev" first, the ascending index delete becomes the slow one and descending index delete becomes the fast one.

 

Comment by Henrik Edin [ 15/Sep/21 ]

Assigning to Storage Engines per Louis's comment.

Comment by Louis Williams [ 03/Jun/21 ]

The common theme with this and SERVER-57221 is that cursors are spending a lot of time repositioning their cursors on recently-deleted keys. My guess is that after a delete, the repositioned cursor is traversing all tombstones in the table. As more keys are deleted, more tombstones need to be traversed, making further deletes slower and slower. This does not fully explain the behavior regarding multi-planning in SERVER-57221.

I'm not sure why this only happens on descending indexes. Descending TTL indexes delete keys from the end of an index and traverse in reverse order. Maybe there is some deficiency that traversing tombstones at the end of a table and not the front? But that doesn't explain the behavior in SERVER-57221, which also occurs on forward cursors.

Since Storage Engines is going to scope a project per this comment, I'm going to assign this to their backlog.

Comment by Louis Williams [ 01/Jun/21 ]

I was able to reproduce this regression in the attached test (ttl_delete_repro.js). (Note: I started mongod using --replSet rs --setParameter ttlMonitorSleepSecs=1) The output is as follows:

inserted 10000
starting TTL with ascending index
TTL with ascending index: 1625ms
inserted 10000
starting TTL with descending index
TTL with descending index: 9311ms

This appears to have the same symptoms and unexplained behavior as SERVER-57221. Notably, spending a lot of time in restoreState() and __wt_txn_read_upd_list_internal.

Ascending:

Descending:

Comment by Bruce Lucas (Inactive) [ 22/Apr/21 ]

In my testing I saw 1) a regression in both ascending and descending delete rates from 4.4 to 4.2, but 2) a much larger regression in descending than ascending.

              4.4       4.2
descending    6.5k /s   24k /s
ascending     86k  /s   100 k/s

Generated at Thu Feb 08 05:38:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.