  1. Spark Connector
  2. SPARK-348

Can't read array field with null value from MongoDB

    • Type: Bug
    • Resolution: Fixed
    • Priority: Unknown
    • Fix Version/s: 10.0.2
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      What did I use

      • Databricks Runtime Version 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
        • org.mongodb.spark:mongo-spark-connector:10.0.1
      • MongoDB 5.0

      What did I do

      I tried to load the following documents from MongoDB into Databricks:

      [{
        "_id": {
          "$oid": "6289e26430540f2e5db55f3c"
        },
        "username": "tmp_user_3",
        "attributes": []
      },{
        "_id": {
          "$oid": "6289e26430540f2e5db55f3f"
        },
        "username": "tmp_user_4",
        "attributes": null
      },{
        "_id": {
          "$oid": "6289e26430540f2e5db55f3d"
        },
        "username": "tmp_user_2",
        "attributes": [
          {
            "key": "c",
            "value": 3
          }
        ]
      },{
        "_id": {
          "$oid": "6289e26430540f2e5db55f3e"
        },
        "username": "tmp_user_1",
        "attributes": [
          {
            "key": "a",
            "value": 1
          },
          {
            "key": "b",
            "value": 2
          }
        ]
      }]
      
      (
        spark
        .read
        .format("mongodb")
        .option("database", database)
        .option("collection", collection)
        .option("connection.uri", connection_uri)
        .load()
        .display()
      )
      

      What did I get

      The data can't be read from MongoDB. The job fails with:

      org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 79.0 failed 1 times, most recent failure: Lost task 0.0 in stage 79.0 (TID 304) (ip-10-172-164-192.us-west-2.compute.internal executor driver): com.mongodb.spark.sql.connector.exceptions.DataException: Invalid field: 'attributes'. The dataType 'array' is invalid for 'BsonNull'.

      What do I expect

      The DataFrame is displayed.
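
A possible workaround, sketched under assumptions and not tested against connector 10.0.1: replace the null `attributes` value with an empty array on the server side before the documents reach Spark. The sketch below only builds the aggregation pipeline as JSON and emulates its effect in plain Python on documents shaped like the ones above. The helper names `null_to_empty_array_pipeline` and `apply_locally` are hypothetical, and passing the result through the connector's `aggregation.pipeline` read option is an assumption, not something this report confirms.

```python
import json


def null_to_empty_array_pipeline(field):
    """Build a one-stage pipeline that replaces a null or missing field with []."""
    # $ifNull returns the first non-null expression, so a null or missing
    # value falls through to the empty-array replacement.
    return [{"$addFields": {field: {"$ifNull": [f"${field}", []]}}}]


def apply_locally(docs, field):
    """Pure-Python emulation of the stage above, for illustration only."""
    return [
        {**doc, field: doc[field] if doc.get(field) is not None else []}
        for doc in docs
    ]


# JSON string form, as a string-valued read option would need it (assumption).
pipeline_json = json.dumps(null_to_empty_array_pipeline("attributes"))

# Trimmed copies of three of the documents from the report.
docs = [
    {"username": "tmp_user_3", "attributes": []},
    {"username": "tmp_user_4", "attributes": None},
    {"username": "tmp_user_2", "attributes": [{"key": "c", "value": 3}]},
]
cleaned = apply_locally(docs, "attributes")
```

In the read shown earlier, the string might be passed as `.option("aggregation.pipeline", pipeline_json)` before `.load()`. Note that this turns null into an empty array, so the two cases can no longer be told apart downstream.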

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Kit Yam Tse (me@kytse.com)
            Votes: 0
            Watchers: 2
