- Type: Improvement
- Resolution: Fixed
- Priority: Unknown
- Affects Version/s: None
- Component/s: None
What did I use
- Databricks Runtime Version 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
- org.mongodb.spark:mongo-spark-connector:10.0.1
- MongoDB 5.0
What did I do
from pyspark.sql.types import *

data = [
    ["tmp_user_1", [{"key": "a", "value": 1}, {"key": "b", "value": 2}]],
    ["tmp_user_2", [{"key": "c", "value": 3}]],
    ["tmp_user_3", []],
    ["tmp_user_4", None],
]
schema = StructType([
    StructField("username", StringType()),
    StructField("attributes", ArrayType(StructType([
        StructField("key", StringType()),
        StructField("value", IntegerType()),
    ]))),
])
df = spark.createDataFrame(data, schema)
print(df.schema)
df.display()

(
    df
    .write
    .format("mongodb")
    .option("database", database)
    .option("collection", collection)
    .option("connection.uri", connection_uri)
    .mode("overwrite")
    .save()
)
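The behavior can be modeled in plain Python (this is only an illustration of what the write appears to do, not the connector's actual code): when a row is converted to a BSON document, fields whose value is None seem to be skipped entirely rather than written as BSON null, while an empty array is kept.

```python
def row_to_document(row: dict) -> dict:
    """Illustrative model: build a document, dropping None-valued fields.

    This mirrors the observed output, where tmp_user_4's null attributes
    field is absent but tmp_user_3's empty array survives.
    """
    return {field: value for field, value in row.items() if value is not None}

rows = [
    {"username": "tmp_user_1",
     "attributes": [{"key": "a", "value": 1}, {"key": "b", "value": 2}]},
    {"username": "tmp_user_3", "attributes": []},
    {"username": "tmp_user_4", "attributes": None},
]
docs = [row_to_document(r) for r in rows]
# docs[1] keeps "attributes": [] ; docs[2] has no "attributes" key at all
```

The distinction matters because in MongoDB a missing field and a field set to null are different states (queryable via the $exists operator), so dropping nulls silently changes the document shape.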
What did I see
When I queried the collection, I found that the attributes field for tmp_user_4 was missing:
[
  { "_id": { "$oid": "628d00ccfce1094fa5465686" }, "username": "tmp_user_4" },
  { "_id": { "$oid": "628d00ccfce1094fa5465689" }, "username": "tmp_user_1", "attributes": [ { "key": "a", "value": 1 }, { "key": "b", "value": 2 } ] },
  { "_id": { "$oid": "628d00ccfce1094fa5465687" }, "username": "tmp_user_2", "attributes": [ { "key": "c", "value": 3 } ] },
  { "_id": { "$oid": "628d00ccfce1094fa5465688" }, "username": "tmp_user_3", "attributes": [] }
]
What do I expect
The attributes field for tmp_user_4 should be present in the collection with an explicit null value.
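Until the write behavior preserves nulls, one possible workaround (an assumption on my part, not something from this ticket) is to backfill the missing fields after the write with a server-side update. In pymongo that would be roughly `coll.update_many({"attributes": {"$exists": False}}, {"$set": {"attributes": None}})`. The sketch below models that backfill on plain dicts so it runs without a MongoDB server:

```python
def backfill_missing_field(docs: list, field: str) -> list:
    """Model of update_many({field: {"$exists": False}}, {"$set": {field: None}}):
    add an explicit null to every document that lacks the field,
    leaving documents that already have it (even as an empty array) untouched."""
    for doc in docs:
        if field not in doc:        # the {"$exists": False} filter
            doc[field] = None       # the {"$set": {field: None}} update
    return docs

# Documents shaped like the query result above: tmp_user_4 lacks the field.
collection_docs = [
    {"username": "tmp_user_4"},
    {"username": "tmp_user_3", "attributes": []},
]
backfill_missing_field(collection_docs, "attributes")
```

After the backfill, tmp_user_4 carries "attributes": null while tmp_user_3's empty array is unchanged, which matches the expected shape described above.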