  1. Spark Connector
  2. SPARK-410

col.isNotNull() does not work for fields with null values

    • Type: Bug
    • Resolution: Fixed
    • Priority: Unknown
    • Fix Version/s: 10.2.1
    • Affects Version/s: 10.1.0, 10.1.1, 10.2.0
    • Component/s: API, Schema
    • Labels:
    • Not Needed

      How to reproduce:

      1. Create sample documents
        db.getSiblingDB("demo").getCollection("bugReproduction").insertMany([
            {
                "first_name": "Sample",
                "last_name": "Null User Id",
                "user_id": null
            },
            {
                "first_name": "Sample",
                "last_name": "Has User Id",
                "user_id": "12345"
            },
            {
                "first_name": "Sample",
                "last_name": "Unset User Id"
            }
        ]); 
      2. Attempt to filter the documents based on the `user_id` field
        // Kotlin; spark is a SparkSession object
        val df = spark
          .read()
          .format("mongodb")
          .option("spark.mongodb.connection.uri", "mongodb://localhost:27017")
          .option("spark.mongodb.database", "demo")
          .option("spark.mongodb.collection", "bugReproduction")
          .load()

        // Keep only rows whose user_id is non-null, then print them
        df.where(df.col("user_id").isNotNull()).show()

      Expected output: a single row containing the "Sample Has User Id" document

      Actual output: both "Sample Null User Id" and "Sample Has User Id" documents are included.

      In Spark Connector 10.0.5, this example works as expected.
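
      The difference between the expected and actual output can be modelled in plain JavaScript, without MongoDB or Spark. This is only a sketch of the semantics: the report does not state the cause of the bug, and the helper names `fieldExists` and `fieldIsNotNull` are hypothetical. It shows that a check for field presence alone returns the same two documents as the actual output, while a check for presence and a non-null value returns the single expected document.

```javascript
// Plain JavaScript model of the three sample documents (no MongoDB needed).
const docs = [
  { first_name: "Sample", last_name: "Null User Id", user_id: null },
  { first_name: "Sample", last_name: "Has User Id", user_id: "12345" },
  { first_name: "Sample", last_name: "Unset User Id" },
];

// Hypothetical helper: true when the field is present, even if its value is null.
const fieldExists = (doc, field) =>
  Object.prototype.hasOwnProperty.call(doc, field);

// Hypothetical helper: true only when the field is present and its value is
// not null, which is what isNotNull() is expected to mean.
const fieldIsNotNull = (doc, field) =>
  doc[field] !== undefined && doc[field] !== null;

const lastNames = (rows) => rows.map((d) => d.last_name);

const existsOnly = lastNames(docs.filter((d) => fieldExists(d, "user_id")));
const notNull = lastNames(docs.filter((d) => fieldIsNotNull(d, "user_id")));

console.log(existsOnly); // [ 'Null User Id', 'Has User Id' ]  (matches the actual output)
console.log(notNull);    // [ 'Has User Id' ]                  (matches the expected output)
```

      Whether the connector actually applied an existence-only check in the affected versions is an assumption here; the model only illustrates why the two outputs differ.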

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Nathan Strong (nstrong@securecodewarrior.com)