Fault in pre-image sampling arithmetic


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage Execution
    • Fully Compatible
    • ALL
    • Execution Team 2024-11-25
    • 200
    • None
    • 3

      The code enforces the following invariant:
      randomSamplesPerMarker <= static_cast<uint64_t>(estimatedRecordsPerMarker)

      randomSamplesPerMarker is a constant set to 10, whereas estimatedRecordsPerMarker is computed as follows:

        double avgRecordSize = double(dataSize) / double(numRecords);
        double estimatedRecordsPerMarker = std::ceil(minBytesPerMarker / avgRecordSize); 
      

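      However, the invariant does not hold when the average record size is large relative to minBytesPerMarker, for example when the collection contains a single very large record.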

      For example, suppose numRecords is reported as 1 and dataSize is reported as 16777328 bytes. With minBytesPerMarker at its default of 33554432 bytes (32 MiB):

      • avgRecordSize = 16777328
      • estimatedRecordsPerMarker = 2
      • randomSamplesPerMarker = 10

      This violates the invariant randomSamplesPerMarker <= estimatedRecordsPerMarker, since 10 > 2.
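
The arithmetic above can be reproduced with a minimal C++ sketch. The function names (`estimateRecordsPerMarker`, `invariantHolds`, `clampedSamplesPerMarker`) and the constant name `kRandomSamplesPerMarker` are hypothetical and chosen for illustration only. The clamp at the end is one possible guard, shown as an assumption; this ticket does not state how the issue was actually resolved.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Value of randomSamplesPerMarker as stated in the ticket.
constexpr uint64_t kRandomSamplesPerMarker = 10;

// Mirrors the two-line computation quoted in the ticket.
inline double estimateRecordsPerMarker(int64_t dataSize,
                                       int64_t numRecords,
                                       int64_t minBytesPerMarker) {
    double avgRecordSize = static_cast<double>(dataSize) / static_cast<double>(numRecords);
    return std::ceil(static_cast<double>(minBytesPerMarker) / avgRecordSize);
}

// Returns true when randomSamplesPerMarker <= estimatedRecordsPerMarker.
inline bool invariantHolds(double estimatedRecordsPerMarker) {
    return kRandomSamplesPerMarker <= static_cast<uint64_t>(estimatedRecordsPerMarker);
}

// Hypothetical guard (an assumption, not the committed fix): never draw
// more samples per marker than the estimated number of records per marker.
inline uint64_t clampedSamplesPerMarker(double estimatedRecordsPerMarker) {
    return std::min(kRandomSamplesPerMarker,
                    static_cast<uint64_t>(estimatedRecordsPerMarker));
}
```

Calling `estimateRecordsPerMarker(16777328, 1, 33554432)` yields 2, so `invariantHolds` returns false for the reported case, while the clamped sample count under this sketch would be 2.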

            Assignee:
            Haley Connelly
            Reporter:
            Haley Connelly
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: