Spark Connector / SPARK-218

MongoSamplePartitioner is slow to count documents when explicitly non-nullable fields are defined

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor - P4
    • Fix Version/s: 2.4.1, 2.3.3, 2.2.7, 2.1.6
    • Affects Version/s: 2.3.0
    • Component/s: Partitioners
    • Labels:
      None

      Because fields that are explicitly non-nullable are added to the filters (for pruning), the MongoSamplePartitioner has to perform a full scan to count the filtered rows.
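The effect described above can be sketched in plain Python. This is an illustration only: the `Field` type and the `non_nullable_match` helper are hypothetical, and the `$exists`/`$ne` filter shape is an assumption about how a not-null check is commonly written as a MongoDB query, not a copy of the connector's code.

```python
from typing import NamedTuple, Optional


class Field(NamedTuple):
    """Hypothetical stand-in for a Spark schema field."""
    name: str
    nullable: bool


def non_nullable_match(schema) -> Optional[dict]:
    """Build a $match stage requiring every non-nullable field to exist and be non-null.

    Assumption: a not-null check is written as {"$exists": True, "$ne": None}.
    """
    conditions = {
        f.name: {"$exists": True, "$ne": None}
        for f in schema
        if not f.nullable
    }
    # With no non-nullable fields there is nothing extra to filter on.
    return {"$match": conditions} if conditions else None


schema = [Field("_id", True), Field("name", False), Field("age", True)]
stage = non_nullable_match(schema)
print(stage)  # {'$match': {'name': {'$exists': True, '$ne': None}}}
```

Once a filter like this is present, the number of matching documents generally cannot be read from collection metadata and has to be computed by examining documents, which is consistent with the slowdown reported here.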

      In some cases this is very slow, especially on large data sets. I noticed that mongo-connector 2.0.0 uses an approximate count instead of performing an exact count, so the problem does not exist there.

      Is it possible to add a warning, or to document this somewhere, to prevent people from misusing the non-nullable schema qualifier?
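As a rough sketch of what such a warning could look like on the user side, the snippet below flags non-nullable fields before a schema is handed to a reader. Everything here is hypothetical: `SchemaField` and `warn_on_non_nullable` are illustrative names, not part of the connector or of Spark.

```python
import warnings
from typing import List, NamedTuple


class SchemaField(NamedTuple):
    """Hypothetical stand-in for a Spark schema field."""
    name: str
    nullable: bool


def warn_on_non_nullable(schema) -> List[str]:
    """Warn if any field is declared non-nullable and return the names of those fields."""
    names = [f.name for f in schema if not f.nullable]
    if names:
        warnings.warn(
            "Non-nullable fields add filters that may force a full scan "
            "when partitions are counted: " + ", ".join(names),
            UserWarning,
            stacklevel=2,
        )
    return names


flagged = warn_on_non_nullable(
    [SchemaField("_id", True), SchemaField("name", False)]
)
print(flagged)  # ['name']
```

If the non-null guarantee is not actually needed, declaring the field as nullable avoids the extra filter altogether.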

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            onesuper Cheng Yichao
            Votes:
            1
            Watchers:
            4

              Created:
              Updated:
              Resolved: