  Spark Connector / SPARK-351

Array field with null value is not written to MongoDB

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Unknown
    • Fix Version/s: 10.1.0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      What did I use

      • Databricks Runtime Version 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
        • org.mongodb.spark:mongo-spark-connector:10.0.1
      • MongoDB 5.0

      What did I do

      from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType
      
      data = [
        ["tmp_user_1", [{"key": "a", "value": 1}, {"key": "b", "value": 2}]],
        ["tmp_user_2", [{"key": "c", "value": 3}]],
        ["tmp_user_3", []],
        ["tmp_user_4", None],
      ] 
      schema = StructType([
        StructField("username", StringType()),
        StructField("attributes", ArrayType(StructType([
          StructField("key", StringType()),
          StructField("value", IntegerType()),
        ]))),
      ])
      
      df = spark.createDataFrame(data, schema)
      
      print(df.schema)
      df.display()
      
      (
        df
        .write
        .format("mongodb")
        .option("database", database)
        .option("collection", collection)
        .option("connection.uri", connection_uri)
        .mode("overwrite")
        .save()
      )
      

      What did I see

      When I queried the collection, the attributes field for tmp_user_4 was missing from its document:

      [{
        "_id": {
          "$oid": "628d00ccfce1094fa5465686"
        },
        "username": "tmp_user_4"
      },{
        "_id": {
          "$oid": "628d00ccfce1094fa5465689"
        },
        "username": "tmp_user_1",
        "attributes": [
          {
            "key": "a",
            "value": 1
          },
          {
            "key": "b",
            "value": 2
          }
        ]
      },{
        "_id": {
          "$oid": "628d00ccfce1094fa5465687"
        },
        "username": "tmp_user_2",
        "attributes": [
          {
            "key": "c",
            "value": 3
          }
        ]
      },{
        "_id": {
          "$oid": "628d00ccfce1094fa5465688"
        },
        "username": "tmp_user_3",
        "attributes": []
      }]
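
The difference between a field that is absent and a field that holds an explicit null can be checked mechanically. The following is a minimal sketch in plain Python, assuming the query result has been loaded as a list of dicts (with `_id` omitted here for brevity). `field_state` is a hypothetical helper written for this illustration, not part of any library.

```python
def field_state(doc, field):
    """Classify a field as 'missing', 'null', or 'present' in a document."""
    if field not in doc:
        return "missing"   # key is absent from the document
    if doc[field] is None:
        return "null"      # key exists and holds an explicit null
    return "present"       # key exists with a non-null value (may be empty)


# Documents abbreviated from the query result in this report (_id omitted).
docs = [
    {"username": "tmp_user_4"},
    {"username": "tmp_user_1",
     "attributes": [{"key": "a", "value": 1}, {"key": "b", "value": 2}]},
    {"username": "tmp_user_2", "attributes": [{"key": "c", "value": 3}]},
    {"username": "tmp_user_3", "attributes": []},
]

states = {d["username"]: field_state(d, "attributes") for d in docs}
for username, state in states.items():
    print(username, state)
```

With the data from this report, only tmp_user_4 is classified as missing; the empty array for tmp_user_3 still counts as present.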
      

      What do I expect

      The attributes field for tmp_user_4 should be present in the collection with a null value.
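
To make the observed and expected behaviors concrete, here is a toy model in plain Python. It is only an illustration of the two outcomes, not the connector's actual implementation; `row_to_document` and its `ignore_nulls` parameter are hypothetical names introduced for this sketch.

```python
def row_to_document(row, ignore_nulls):
    """Convert a row (a dict) to a document, optionally dropping null fields."""
    if ignore_nulls:
        # Fields whose value is None are omitted from the document.
        return {k: v for k, v in row.items() if v is not None}
    # Every field is kept, so None becomes an explicit null.
    return dict(row)


row = {"username": "tmp_user_4", "attributes": None}

observed = row_to_document(row, ignore_nulls=True)   # what the report saw
expected = row_to_document(row, ignore_nulls=False)  # what the report expects

print(observed)
print(expected)
```

In this model, the observed document has no `attributes` key at all, while the expected document keeps the key with a null value.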

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Kit Yam Tse (me@kytse.com)
