[SERVER-22069] Better seeding of WT's random number generator Created: 05/Jan/16 Updated: 06/Apr/23 Resolved: 09/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Charlie Swanson | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Assigned Teams: |
Storage Execution
|
||||||||||||||||
| Backport Requested: |
v3.4, v3.2
|
||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
Currently each WiredTiger session has its own random number generator, and they use the same seed. This is not ideal, and makes it very difficult to describe the guarantees of random operations such as aggregation's $sample (which can use a random cursor). I'm not sure what the best way is to fix it, but I think we should do something about it. |
| Comments |
| Comment by Keith Bostic (Inactive) [ 27/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
charlie.swanson: this change isn't yet in master (or even develop). I'll try to let you know when it goes into master, unfortunately, when that happens is pretty unscheduled. I used an already existing PRNG state, after moving it some number of steps along, as each new session's initial state. There's some disagreement and discussion in the linked branch, if you're interested. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 27/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Great! Thanks keith.bostic. Is this change in master yet? If not, can you let me know when it is so I can try it out? Out of curiosity, what strategy did you end up using to pick the initial state? | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Keith Bostic (Inactive) [ 21/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
charlie.swanson, I've pushed a branch that changes WiredTiger session initialization so sessions don't start with the same initial PRNG state. In my testing it cleans up your test case, giving me about 95% unique returns. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Keith Bostic (Inactive) [ 19/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
There is no code to ensure a uniform distribution; all I meant by "returns uniform results across the space" was that our testing indicates it's a fair/balanced way to select random small numbers (between 0 and 20, for example), since that's WiredTiger's primary use of random numbers. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Dissatisfied Former User [ 19/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Howdy! Indeed, [1,1,1] is just as "random" and "likely" as [1,0,4] and [1,2,3], etc. (Seeing a pattern in a true random selection is just our brain tricking us via the Monte Carlo fallacy.) If you have code to ensure a uniform distribution, i.e. by checking and rejecting new candidate values if they have been seen recently, that introduces a statistical bias towards even distribution which is not random. I.e. having generated [1, 3, 5] there may be a zero percent chance (worst case Randomness of PRNG results is all about the next result being as unpredictable as possible, thus avoiding limiting rules like that. The quote "…WiredTiger PRNG doesn't cycle and returns uniform results across the space" would weakly imply that such limiting rules are in place, thus my concern beyond the existing seeding ones otherwise described by this ticket. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 19/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Alice, I'm confused by the following assertion:
I think I know what you mean by "1,1,1,1,1,1,1,1,1 is just as likely a true random sequence as any other", and I think I'd agree. e.g. if we use a uniform distribution to select 3 numbers from 0 to 10, with replacement, then [1,1,1] is just as likely as [1,0,4]. I don't understand the assertion that "Uniform distribution is also un-random". Do you mean that a sequence generated by a uniform distribution can appear not to be random? | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Dissatisfied Former User [ 19/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
> We've done quite a bit of testing to confirm the WiredTiger PRNG doesn't cycle and returns uniform results across the space… As I noted on IRC, I find this a touch worrying. Uniform distribution is also un-random. 1,1,1,1,1,1,1,1,1 is just as likely a true random sequence as any other, despite apparent bias. SERVER-22225 is related and may offer a possible direction for resolution to this ticket. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 13/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I'll add a follow up to help clarify why there are 8 WT cursors, and why we needed to use multiple threads. After consulting with the Storage Integrations team, it looks like we use a pool of WT sessions, and only ever create a new one if all session are in use. So we need to use multiple threads in order to get multiple sessions. Each session will use the same random number generator, with the same initial seed, and this leads to the problem described above. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 12/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi keith.bostic, of course. I made a fake concurrency test to demonstrate the problem. The full test is at the bottom of this comment, but the summary is that we start 10 threads, each doing the following pipeline, and print which _id they found (out of 1000):
This will establish a random cursor in WT, and do a random walk down the tree to find a pseudo-random document. Attached is the distribution of _ids returned (hand generated, it's not easy to see this from the test output). You can see, many of them are returned up to 8 times. I believe this is because under concurrent load, we will open a new WT session, which will be seeded the same as the other ones, so we end up with 8 WT sessions open, each of which will return the same results. Does that help explain why this might be undesirable? Full test:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Keith Bostic (Inactive) [ 12/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
charlie.swanson, can you talk more about the problem you've seen? We've done quite a bit of testing to confirm the WiredTiger PRNG doesn't cycle and returns uniform results across the space – it would be good to get a test case that demonstrates the problem. In general, as you noted in HELP-1638, the WiredTiger PRNG uses known good seeds; the problem with seeding the PRNG in some per-session way is that seeding a PRNG can make them behave worse, not better. |