[SERVER-21920] Use enhanced WiredTiger next_random cursors for oplog stones Created: 16/Dec/15  Updated: 25/Jun/21  Resolved: 16/Dec/15

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 3.2.1, 3.3.0

Type: Improvement Priority: Major - P3
Reporter: Alexander Gorrod Assignee: Max Hirschhorn
Resolution: Done Votes: 0
Labels: code-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-55821 remove next_random_sample_size=1000 c... Closed
is related to WT-2262 Random sampling is skewed by tree shape Closed
Backwards Compatibility: Fully Compatible
Backport Completed:
Sprint: QuInt E (01/11/16)
Participants:

 Description   

The basic implementation of WiredTiger random cursors produces skewed samples when the tree is unbalanced.

There is an enhanced next_random implementation that returns a pseudo-random set of keys distributed somewhat evenly across the file. To use the new functionality, it's necessary to tell WiredTiger approximately how many samples are going to be taken.



 Comments   
Comment by Githook User [ 16/Dec/15 ]

Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>

Message: SERVER-21920 Use next_random_sample_size when sampling the oplog.

This allows the random cursor to account for skew in the WiredTiger
B-tree when taking samples for oplog stones.

(cherry picked from commit 4463e0366bac5874e4c527b88f25045d544850a5)
Branch: v3.2
https://github.com/mongodb/mongo/commit/27692afcd08165ca8087bff7dd3fae754acfcb84

Comment by Githook User [ 16/Dec/15 ]

Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>

Message: SERVER-21920 Use next_random_sample_size when sampling the oplog.

This allows the random cursor to account for skew in the WiredTiger
B-tree when taking samples for oplog stones.
Branch: master
https://github.com/mongodb/mongo/commit/4463e0366bac5874e4c527b88f25045d544850a5

Comment by Alexander Gorrod [ 16/Dec/15 ]

It's not critical that the specified sample count be accurate. If it's too high, WiredTiger may not sample from the entire file; if it's too low, WiredTiger will probably do more work than necessary to generate the random samples.

Comment by Alexander Gorrod [ 16/Dec/15 ]

The current code in wiredtiger_record_store.cpp uses:

next_random=true

It needs to use:

next_random=true,next_random_sample_size=XXX

Ideally XXX will match the number of samples you anticipate taking. The WiredTiger API documentation does a better job of explaining than I could:
http://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#afb5b4a69c2c5cafe411b2b04fdc1c75d
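As a rough illustration of the change described in this comment, the cursor configuration string could be assembled as below. This is a minimal sketch, not the code from the actual commit: the helper name makeRandomCursorConfig and its expectedSamples parameter are hypothetical, and the only WiredTiger-specific parts are the next_random and next_random_sample_size configuration keys quoted above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not taken from the MongoDB source): builds the
// configuration string for a WiredTiger random cursor.
//
// expectedSamples is the approximate number of samples the caller plans to
// take. When it is zero (unknown), only next_random=true is emitted, which
// corresponds to the original, unenhanced behavior.
std::string makeRandomCursorConfig(std::size_t expectedSamples) {
    std::string config = "next_random=true";
    if (expectedSamples > 0) {
        // Tell WiredTiger roughly how many samples will be taken so that it
        // can spread them across the file.
        config += ",next_random_sample_size=" + std::to_string(expectedSamples);
    }
    return config;
}
```

The resulting string would then be passed as the config argument of WT_SESSION::open_cursor when opening the cursor on the oplog table. For example, makeRandomCursorConfig(1000) yields next_random=true,next_random_sample_size=1000.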
