[SERVER-4577] output of padding factor is not deterministic, uses globals
Created: 29/Dec/11  Updated: 23/Feb/17  Resolved: 23/Feb/17

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Antoine Girbal
Assignee: Unassigned
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

The test padding.js fails for me quite often when run as part of the parallel tests.
Running it as a single test seems to pass fairly reliably.
The failure is at line 61:
assert(p.stats().paddingFactor > ps + 0.05);

See below for the padding numbers printed during the tests.
Basically, the padding numbers come close but narrowly miss the target.
Is there a chance that padding is affected by other things happening globally, on the db, or on another collection?
Or maybe the test runs slower in parallel and doesn't hit the threshold; the room for error seems small.

Good run:
1
1.1000000000000036
1.1980000000000075
1.3010000000000113
1.400000000000015
1.4980000000000189
1.6010000000000226
1.7000000000000264
1.7980000000000302
1.901000000000034
1.9960000000000377
1.9620000000000415
1.9290000000000451
1.8960000000000488
1.8620000000000525
1.8290000000000561
1.7960000000000598
1.7620000000000635
1.7290000000000671
1.6960000000000708
1.6620000000000745
1.6930000000000782
1.728000000000082
1.7580000000000857
1.7570000000000894
1.760000000000093
1.7580000000000968
1.7570000000001005
1.7600000000001041
1.758000000000108
1.7570000000001116
1.7510000000001154
1.738000000000119
1.7290000000001227
1.7150000000001264
1.70600000000013
1.7210000000001338
1.7560000000001375
1.7860000000001413
1.813000000000145

Bad run:
1
1.105000000000003
1.2040000000000068
1.3060000000000107
1.4020000000000148
1.4940000000000193
1.5930000000000235
1.6920000000000273
1.7900000000000311
1.8940000000000348
1.9920000000000386
1.9590000000000423
1.925000000000046
1.8920000000000496
1.8580000000000534
1.826000000000057
1.7920000000000607
1.761000000000064
1.7270000000000678
1.6950000000000713
1.661000000000075
1.6930000000000787
1.7330000000000818
1.7660000000000853
1.7670000000000887
1.7680000000000922
1.7710000000000958
1.774000000000099
1.7750000000001025
1.780000000000106
1.7790000000001096
1.7750000000001127
1.7660000000001164
1.7570000000001196
1.7450000000001236
1.7470000000001256
1.7660000000001292
1.797000000000133
1.813000000000137
1.8160000000001406



 Comments   
Comment by auto [ 02/Feb/12 ]

Author: agirbal (antoine@10gen.com)

Message: SERVER-4577: removing padding.js from parallel tests until bug is fixed
Branch: master
https://github.com/mongodb/mongo/commit/4c0962e586bff51433afe658c26a94ee30c4f78c

Comment by Aaron Staple [ 17/Jan/12 ]

I haven't investigated any further since my last comment, but the more lenient test is now failing some of the time:

<http://buildbot.mongodb.org/builders/Linux%2064-bit%20v8/builds/2918/steps/test_5/logs/stdio>
<http://buildbot.mongodb.org/builders/Linux%2064-bit%20v8/builds/2917/steps/test_5/logs/stdio>

Comment by Aaron Staple [ 29/Dec/11 ]

I took a look through some data for 6 or so old runs of this test, and I think this data supports my theory above. (Unfortunately the test doesn't print both numbers that are actually checked, but I think it prints enough info to get an accurate feel for what's going on.)

In the core suite, the results were nearly identical for all runs.

In the parallel basic suite there was more variability. For the numbers I could see, the mean difference at the last part of the test was .0523 and the standard deviation was .00468. The parallel basicPlus suite probably has higher variance though I didn't look at the numbers.

So I think lowering the threshold for this test is fine.

Comment by auto [ 29/Dec/11 ]

Author: agirbal (antoine@10gen.com)

Message: SERVER-4577: make test lineant
Branch: master
https://github.com/mongodb/mongo/commit/d97a9442b4a0ceb98ea26a981bf49d05974c77b8

Comment by Antoine Girbal [ 29/Dec/11 ]

tweaking padding.js to pass more reliably

Comment by Antoine Girbal [ 29/Dec/11 ]

from Aaron:

Not sure about what's happening but here's a guess. It looks like
NamespaceDetails::paddingFits() and
NamespaceDetails::paddingTooSmall() use globals via MONGO_SOMETIMES to
throttle the frequency with which the functions actually take action
with respect to padding. If the padding test is run in isolation, its
behavior is pretty deterministic. The variation in behavior of
paddingFits and paddingTooSmall should only depend on the initial
values of those global variables (from other tests that ran before
padding.js). However, if other operations are running interleaved
with the padding.js operations there will be some additional variation
in when paddingFits and paddingTooSmall perform their prescribed
functions. And this variation may be enough to push the test past the
potentially tight threshold where it passes.

       /* called to indicate that an update fit in place.
          fits also called on an insert -- idea there is that if you had some mix and then went to
          pure inserts it would adapt and PF would trend to 1.0.  note update calls insert on a move
          so there is a double count there that must be adjusted for below.

          todo: greater sophistication could be helpful and added later.  for example the absolute
                size of documents might be considered -- in some cases smaller ones are more likely
                to grow than larger ones in the same collection? (not always)
       */
       void paddingFits() {
           MONGO_SOMETIMES(sometimes, 4) { // do this on a sampled basis to journal less
               double x = paddingFactor - 0.001;
               if ( x >= 1.0 ) {
                   *getDur().writing(&paddingFactor) = x;
                   //getDur().setNoJournal(&paddingFactor, &x, sizeof(x));
               }
           }
       }
       void paddingTooSmall() {
           MONGO_SOMETIMES(sometimes, 4) { // do this on a sampled basis to journal less
               /* the more indexes we have, the higher the cost of a move.  so we take that into
                  account herein.  note on a move that insert() calls paddingFits(), thus
                  here for example with no inserts and nIndexes = 1 we have
                  .001*4-.001 or a 3:1 ratio to non moves -> 75% nonmoves.  insert heavy
                  can pushes this down considerably. further tweaking will be a good idea but
                  this should be an adequate starting point.
               */
               double N = min(nIndexes,7) + 3;
               double x = paddingFactor + (0.001 * N);
               if ( x <= 2.0 ) {
                   *getDur().writing(&paddingFactor) = x;
                   //getDur().setNoJournal(&paddingFactor, &x, sizeof(x));
               }
           }
       }
