[SERVER-22069] Better seeding of WT's random number generator Created: 05/Jan/16  Updated: 06/Apr/23  Resolved: 09/May/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Charlie Swanson Assignee: Backlog - Storage Execution Team
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2016-01-12 at 2.54.52 PM.png    
Issue Links:
Backports
Duplicate
duplicates WT-3604 Review non-seeded random number usage Backlog
Related
Assigned Teams:
Storage Execution
Backport Requested:
v3.4, v3.2
Participants:

 Description   

Currently each WiredTiger session has its own random number generator, and they use the same seed. This is not ideal, and makes it very difficult to describe the guarantees of random operations such as aggregation's $sample (which can use a random cursor).

I'm not sure what the best way is to fix it, but I think we should do something about it.



 Comments   
Comment by Keith Bostic (Inactive) [ 27/Jan/16 ]

charlie.swanson: this change isn't yet in master (or even develop). I'll try to let you know when it goes into master, unfortunately, when that happens is pretty unscheduled.

I used an already existing PRNG state, after moving it some number of steps along, as each new session's initial state.

There's some disagreement and discussion in the linked branch, if you're interested.

Comment by Charlie Swanson [ 27/Jan/16 ]

Great! Thanks keith.bostic.

Is this change in master yet? If not, can you let me know when it is so I can try it out?

Out of curiosity, what strategy did you end up using to pick the initial state?

Comment by Keith Bostic (Inactive) [ 21/Jan/16 ]

charlie.swanson, I've pushed a branch that changes WiredTiger session initialization so sessions don't start with the same initial PRNG state. In my testing it cleans up your test case, giving me about 95% unique returns.

Comment by Keith Bostic (Inactive) [ 19/Jan/16 ]

There is no code to ensure a uniform distribution; all I meant by "returns uniform results across the space" was that our testing indicates it's a fair/balanced way to select random small numbers (between 0 and 20, for example), since that's WiredTiger's primary use of random numbers.

Comment by Dissatisfied Former User [ 19/Jan/16 ]

Howdy!

Indeed, [1,1,1] is just as "random" and "likely" as [1,0,4] and [1,2,3], etc. (Seeing a pattern in a true random selection is just our brain tricking us via the Monte Carlo fallacy.)

If you have code to ensure a uniform distribution, i.e. by checking and rejecting new candidate values if they have been seen recently, that introduces a statistical bias towards even distribution which is not random. I.e. having generated [1, 3, 5] there may be a zero percent chance (worst case ) of seeing a 5 repeated as the next selection making the generation more predictable.

Randomness of PRNG results is all about the next result being as unpredictable as possible, thus avoiding limiting rules like that. The quote "…WiredTiger PRNG doesn't cycle and returns uniform results across the space" would weakly imply that such limiting rules are in place, thus my concern beyond the existing seeding ones otherwise described by this ticket.

Comment by Charlie Swanson [ 19/Jan/16 ]

Hi Alice,

I'm confused by the following assertion:

Uniform distribution is also un-random. 1,1,1,1,1,1,1,1,1 is just as likely a true random sequence as any other, despite apparent bias.

I think I know what you mean by "1,1,1,1,1,1,1,1,1 is just as likely a true random sequence as any other", and I think I'd agree. e.g. if we use a uniform distribution to select 3 numbers from 0 to 10, with replacement, then [1,1,1] is just as likely as [1,0,4].

I don't understand the assertion that "Uniform distribution is also un-random". Do you mean that a sequence generated by a uniform distribution can appear not to be random?

Comment by Dissatisfied Former User [ 19/Jan/16 ]

> We've done quite a bit of testing to confirm the WiredTiger PRNG doesn't cycle and returns uniform results across the space…

As I noted on IRC, I find this a touch worrying. Uniform distribution is also un-random. 1,1,1,1,1,1,1,1,1 is just as likely a true random sequence as any other, despite apparent bias.

SERVER-22225 is related and may offer a possible direction for resolution to this ticket.

Comment by Charlie Swanson [ 13/Jan/16 ]

I'll add a follow up to help clarify why there are 8 WT cursors, and why we needed to use multiple threads.

After consulting with the Storage Integrations team, it looks like we use a pool of WT sessions, and only ever create a new one if all session are in use. So we need to use multiple threads in order to get multiple sessions. Each session will use the same random number generator, with the same initial seed, and this leads to the problem described above.

Comment by Charlie Swanson [ 12/Jan/16 ]

Hi keith.bostic, of course.

I made a fake concurrency test to demonstrate the problem. The full test is at the bottom of this comment, but the summary is that we start 10 threads, each doing the following pipeline, and print which _id they found (out of 1000):

db.coll.aggregate([{$sample: {size: 1}}])

This will establish a random cursor in WT, and do a random walk down the tree to find a pseudo-random document. Attached is the distribution of _ids returned (hand generated, it's not easy to see this from the test output). You can see, many of them are returned up to 8 times.

I believe this is because under concurrent load, we will open a new WT session, which will be seeded the same as the other ones, so we end up with 8 WT sessions open, each of which will return the same results.

Does that help explain why this might be undesirable?

Full test:

'use strict';
 
/**
 * agg_sample.js
 *
 * Based off of agg_base.js, just issues a bunch of $sample pipelines, and outputs their results.
 */
var $config = (function() {
 
    var data = {
        numDocs: 1000,
    };
 
    var states = {
        query: function query(db, collName) {
            var res = db[collName].aggregate([{$sample: {size: 1}}]).toArray();
            jsTestLog(tojson(res));
        }
    };
 
    var transitions = {
        query: { query: 1 }
    };
 
    function setup(db, collName, cluster) {
        // load example data
        var bulk = db[collName].initializeUnorderedBulkOp();
        for (var i = 0; i < this.numDocs; ++i) {
            bulk.insert({_id: i});
        }
        var res = bulk.execute();
        assertWhenOwnColl.writeOK(res);
        assertWhenOwnColl.eq(this.numDocs, res.nInserted);
        assertWhenOwnColl.eq(this.numDocs, db[collName].find().itcount());
    }
 
    function teardown(db, collName, cluster) {
        assertWhenOwnColl(db[collName].drop());
    }
 
    return {
        threadCount: 10,
        iterations: 10,
        states: states,
        startState: 'query',
        transitions: transitions,
        data: data,
        setup: setup,
        teardown: teardown
    };
})();

Comment by Keith Bostic (Inactive) [ 12/Jan/16 ]

charlie.swanson, can you talk more about the problem you've seen?

We've done quite a bit of testing to confirm the WiredTiger PRNG doesn't cycle and returns uniform results across the space – it would be good to get a test case that demonstrates the problem.

In general, as you noted in HELP-1638, the WiredTiger PRNG uses known good seeds; the problem with seeding the PRNG in some per-session way is that seeding a PRNG can make them behave worse, not better.

Generated at Thu Feb 08 03:59:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.