[SERVER-4575] two somewhat redundant procedures for updating btree cursors when a btree bucket is deallocated; invalid debug warning Created: 28/Dec/11  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: Index Maintenance, MMAPv1
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Aaron Staple Assignee: Backlog - Storage Execution Team
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-3803 btree cursor check location / key at ... Closed
Assigned Teams:
Storage Execution
Participants:

 Description   

It looks like we currently have two methods for updating btree cursors when a btree bucket is deallocated.

Method 1: If a ClientCursor with a non-null refLoc has been configured for the BtreeCursor, bucket deallocation triggers a callback to the cursor that invalidates its keyOfs property, causing the cursor to relocate its position when checkLocation() is called.
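
A minimal sketch of the callback style used by Method 1, for illustration only. The type CallbackCursor, the free function informAboutToDeleteBucket(), the use of -1 as the invalidated keyOfs value, and the "registered" flag (standing in for a ClientCursor with a non-null refLoc) are all simplifying assumptions and not the server's actual classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a btree cursor; not the real BtreeCursor.
struct CallbackCursor {
    int bucket;       // id of the bucket the cursor is positioned in
    int keyOfs;       // offset within the bucket; -1 is assumed to mean "invalidated"
    bool registered;  // models a ClientCursor with a non-null refLoc
};

// Visits every cursor and invalidates those registered on the bucket that is
// about to be deleted. Returns how many cursors were notified.
int informAboutToDeleteBucket(std::vector<CallbackCursor>& cursors, int bucket) {
    int notified = 0;
    for (auto& c : cursors) {
        if (c.registered && c.bucket == bucket) {
            c.keyOfs = -1;
            ++notified;
        }
    }
    return notified;
}

// A cursor with an invalidated keyOfs must relocate on its next location check.
bool needsRelocate(const CallbackCursor& c) {
    return c.keyOfs < 0;
}
```

In this sketch an unregistered cursor on the deleted bucket is never notified, which is the gap Method 2 currently covers. The loop over all cursors is also the cost that makes this approach comparatively heavy.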

Method 2: When a bucket is deallocated, its n value is set to -1, causing the cursor to relocate its position when checkLocation() is called. If the bucket is reused for the same btree before checkLocation() gets called, checkLocation() will still do the right thing. For Method 2 to work correctly, the data at the bucket's disk loc must be either an allocated bucket within the correct btree or the bit representation of a deallocated bucket. In particular:

  • The deallocated bucket's disk loc must not point to a non-bucket (the middle of a bucket or certain types of unallocated regions in the extent).
  • The extent containing the old bucket must not be unmapped or reused for another index or collection.
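
A minimal sketch of the sentinel style used by Method 2, under stated assumptions. SentinelBucket, SentinelCursor, deallocate() and positionStillValid() are hypothetical names, and the saved-key comparison is an assumed simplification of what checkLocation() does. The sketch only works because the slot the cursor points at always holds either a live bucket or a deallocated one, which is exactly the requirement listed above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a btree bucket.
struct SentinelBucket {
    int n = 0;                      // number of keys; -1 marks a deallocated bucket
    std::vector<std::string> keys;
};

// Marks the bucket as deallocated in place, leaving a recognizable sentinel.
void deallocate(SentinelBucket& b) {
    b.keys.clear();
    b.n = -1;
}

// Hypothetical cursor that remembers the key it was positioned on.
struct SentinelCursor {
    int bucket;            // index of the bucket in the store
    int keyOfs;            // offset of the key within that bucket
    std::string savedKey;  // key the cursor last saw at keyOfs
};

// Returns true if the saved position can be trusted, false if the cursor
// must relocate by searching the tree for savedKey.
bool positionStillValid(const std::vector<SentinelBucket>& store,
                        const SentinelCursor& c) {
    const SentinelBucket& b = store[c.bucket];
    if (b.n < 0) return false;                           // bucket was deallocated
    if (c.keyOfs < 0 || c.keyOfs >= b.n) return false;   // offset out of range
    return b.keys[c.keyOfs] == c.savedKey;               // reused bucket: key differs
}
```

If the slot held arbitrary bytes instead (a non-bucket, or memory reused by another index), neither the n check nor the key comparison would be meaningful, which is why the two conditions above matter.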

Method 1 is employed when yielding and in some, but not all, update and remove operations. The bucket-resetting portion of Method 2 is employed in all cases, but the consequences of this behavior only matter when Method 1 is not employed.

I think it's possible that Method 2 was designed with the idea that an update or delete operation would not cause a btree bucket allocation, but that is not a correct assumption. Both update and delete can cause a btree bucket allocation.

I would recommend the following todos:

  • Figure out if the assumptions relating to bucket deallocation for Method 2 are valid in our implementation and will remain valid in the future.
  • Decide if we want to stick with just Method 1 or just Method 2 rather than continue to use both. Method 1 is a bit more heavyweight because it requires iterating through all cursors when a bucket is deleted. Method 2 has some potentially fragile dependencies on the handling of deallocated buckets and the record storage engine.
  • If we decide to go with just one method, improve it if necessary to fill the gaps currently handled by the other method. (If we pick Method 1 we would need to utilize it for all updates and deletes.)
  • If necessary, remove any improper debugging asserts or logs. For example, the current:

                DEV tlog() << "debug warning: no cursors found in informAboutToDeleteBucket()" << endl;

    is incorrect because it assumes Method 1 is the only bucket deletion handling method in use.



 Comments   
Comment by Aaron Staple [ 30/Dec/11 ]

Just wanted to add an additional note about how a delete operation can trigger allocation or even reuse of a bucket, since it's not obvious. A delete operation may cause removal of an index key, causing two adjacent buckets to merge (and deallocating one of them). This will in turn remove a key from the parent of the merged bucket, which may cause the parent to rebalance with its neighbor. This rebalance may cause a key replacement in the parent of those buckets, and the new key may be too large to fit inside the bucket, causing a split and necessitating a bucket allocation. So a single delete can cause both a bucket allocation and deallocation (and in simpler cases just an allocation or just a deallocation).
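
The last step of that chain (a key replacement that no longer fits, forcing a split) can be sketched as follows. This is a toy model under loud assumptions: SizedBucket, SizedTree, the byte-based capacity and replaceKey() are hypothetical, the merge and rebalance steps are not modeled, and the split simply moves the upper half of the keys to a new bucket instead of promoting a middle key to the parent as a real btree would.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical bucket whose fullness is measured in key bytes.
struct SizedBucket {
    std::vector<std::string> keys;

    std::size_t bytesUsed() const {
        std::size_t total = 0;
        for (const auto& k : keys) total += k.size();
        return total;
    }
};

// Hypothetical tree that only tracks its buckets and how many it allocated.
struct SizedTree {
    std::size_t capacity = 0;          // assumed max key bytes per bucket
    std::vector<SizedBucket> buckets;
    int allocations = 0;               // buckets allocated by splits

    // Replaces the key at position pos in bucket idx. If the bucket then
    // exceeds capacity, the upper half of its keys moves to a new bucket.
    void replaceKey(std::size_t idx, std::size_t pos, const std::string& newKey) {
        buckets[idx].keys[pos] = newKey;
        if (buckets[idx].bytesUsed() <= capacity) return;

        SizedBucket right;
        const std::size_t mid = buckets[idx].keys.size() / 2;
        right.keys.assign(buckets[idx].keys.begin() + mid, buckets[idx].keys.end());
        buckets[idx].keys.resize(mid);
        buckets.push_back(right);      // the allocation triggered by the delete path
        ++allocations;
    }
};
```

The point of the sketch is only that a replacement key larger than the one it replaces can overflow a bucket, so an operation that started as a delete ends with an allocation.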
