data_generator: Completely identical values are generated in correlated columns


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      When two correlated columns are generated the same way, their values are not merely correlated; they are completely identical.

      To reproduce:

      Use this spec:

      @dataclasses.dataclass
      class C:
          f1: Specification(source=correlation(lambda: global_faker().pyint(max_value=10), "a"))
          f2: Specification(source=correlation(lambda: global_faker().pyint(max_value=10), "a"))
      

      and run:

      python src/mongo/db/query/benchmark/data_generator/driver.py --uri "mongodb://localhost:20000/" --db test --size 100 --seed 1 --serial-inserts --drop --analyze specs.test_correlations C
      

      and you will get:

      Enterprise test> db.C.find();
      [
        { _id: ObjectId('690db5c61189c0dc6de2a7ab'), f1: 5, f2: 5 },
        { _id: ObjectId('690db5c61189c0dc6de2a7ac'), f1: 1, f2: 1 },
        { _id: ObjectId('690db5c61189c0dc6de2a7ad'), f1: 3, f2: 3 },
        { _id: ObjectId('690db5c61189c0dc6de2a7ae'), f1: 0, f2: 0 },
        { _id: ObjectId('690db5c61189c0dc6de2a7af'), f1: 10, f2: 10 },
        { _id: ObjectId('690db5c61189c0dc6de2a7b0'), f1: 7, f2: 7 },
        { _id: ObjectId('690db5c61189c0dc6de2a7b1'), f1: 0, f2: 0 },
        { _id: ObjectId('690db5c61189c0dc6de2a7b2'), f1: 5, f2: 5 },
        { _id: ObjectId('690db5c61189c0dc6de2a7b3'), f1: 9, f2: 9 },
        { _id: ObjectId('690db5c61189c0dc6de2a7b4'), f1: 7, f2: 7 },
        { _id: ObjectId('690db5c61189c0dc6de2a7b5'), f1: 5, f2: 5 },
        { _id: ObjectId('690db5c61189c0dc6de2a7b6'), f1: 9, f2: 9 },
        { _id: ObjectId('690db5c61189c0dc6de2a7b7'), f1: 6, f2: 6 },
        { _id: ObjectId('690db5c61189c0dc6de2a7b8'), f1: 7, f2: 7 },
        { _id: ObjectId('690db5c61189c0dc6de2a7b9'), f1: 0, f2: 0 },
        { _id: ObjectId('690db5c61189c0dc6de2a7ba'), f1: 9, f2: 9 },
        { _id: ObjectId('690db5c61189c0dc6de2a7bb'), f1: 3, f2: 3 },
        { _id: ObjectId('690db5c61189c0dc6de2a7bc'), f1: 1, f2: 1 },
        { _id: ObjectId('690db5c61189c0dc6de2a7bd'), f1: 3, f2: 3 },
        { _id: ObjectId('690db5c61189c0dc6de2a7be'), f1: 5, f2: 5 }
      ]
      

      As you can see, f1 and f2 hold identical values in every document.
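
The observation above can be checked mechanically. The following is a minimal sketch, not part of the data_generator: the helper name `identical_fraction` is hypothetical, and the (f1, f2) pairs are copied by hand from the `db.C.find()` output shown above. For columns that are correlated but not identical, one would expect a fraction below 1.0.

```python
# (f1, f2) pairs copied from the 20 documents in the db.C.find() output.
rows = [
    (5, 5), (1, 1), (3, 3), (0, 0), (10, 10),
    (7, 7), (0, 0), (5, 5), (9, 9), (7, 7),
    (5, 5), (9, 9), (6, 6), (7, 7), (0, 0),
    (9, 9), (3, 3), (1, 1), (3, 3), (5, 5),
]


def identical_fraction(pairs):
    """Return the fraction of rows whose two fields hold exactly the same value.

    Hypothetical helper for illustration only; it is not part of the
    data_generator code base.
    """
    if not pairs:
        raise ValueError("need at least one row")
    return sum(1 for a, b in pairs if a == b) / len(pairs)


if __name__ == "__main__":
    # Every sampled document has f1 == f2, so this prints 1.0.
    print(identical_fraction(rows))
```

A check of this kind could serve as a regression test once the generator is changed, for example by asserting that the fraction stays below 1.0 for two columns that share a correlation key.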

      timour.katchaounov@mongodb.com, can you please opine on how important this is with respect to CBR? We should definitely strive to fix this before any join ordering testing.

            Assignee:
            Unassigned
            Reporter:
            Philip Stoev
            Votes:
            0
            Watchers:
            3
