- Type: Improvement
- Resolution: Fixed
- Priority: Unknown
- Affects Version/s: None
- Component/s: None
What did I use
- Databricks Runtime Version 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
- org.mongodb.spark:mongo-spark-connector:10.0.1
- MongoDB 5.0
What did I do
from pyspark.sql.types import *

data = [
    ["tmp_user_1", [{"key": "a", "value": 1}, {"key": "b", "value": 2}]],
    ["tmp_user_2", [{"key": "c", "value": 3}]],
    ["tmp_user_3", []],
    ["tmp_user_4", None],
]

schema = StructType([
    StructField("username", StringType()),
    StructField("attributes", ArrayType(StructType([
        StructField("key", StringType()),
        StructField("value", IntegerType()),
    ]))),
])

df = spark.createDataFrame(data, schema)
print(df.schema)
df.display()

(
    df
    .write
    .format("mongodb")
    .option("database", database)
    .option("collection", collection)
    .option("connection.uri", connection_uri)
    .mode("overwrite")
    .save()
)
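As a stopgap (my own workaround, not a documented connector feature), rows whose attributes value is None can be normalized to an empty list before building the DataFrame. Since tmp_user_3 shows that [] round-trips through the connector, this keeps the field present in every written document, assuming an empty array is acceptable downstream where null was intended:

```python
# Hedged workaround: replace a null attributes list with an empty one
# in the driver-side data before calling spark.createDataFrame, so the
# connector always has a value to serialize for the field.
data = [
    ["tmp_user_1", [{"key": "a", "value": 1}, {"key": "b", "value": 2}]],
    ["tmp_user_2", [{"key": "c", "value": 3}]],
    ["tmp_user_3", []],
    ["tmp_user_4", None],
]

normalized = [
    [username, attrs if attrs is not None else []]
    for username, attrs in data
]
```

Note this loses the distinction between "no attributes recorded" (null) and "an empty set of attributes" ([]), which is exactly the semantic gap this report is about.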
What did I see
When I queried the collection, I found that the attributes field for tmp_user_4 is missing entirely:
[{
  "_id": { "$oid": "628d00ccfce1094fa5465686" },
  "username": "tmp_user_4"
}, {
  "_id": { "$oid": "628d00ccfce1094fa5465689" },
  "username": "tmp_user_1",
  "attributes": [
    { "key": "a", "value": 1 },
    { "key": "b", "value": 2 }
  ]
}, {
  "_id": { "$oid": "628d00ccfce1094fa5465687" },
  "username": "tmp_user_2",
  "attributes": [
    { "key": "c", "value": 3 }
  ]
}, {
  "_id": { "$oid": "628d00ccfce1094fa5465688" },
  "username": "tmp_user_3",
  "attributes": []
}]
What do I expect
The attributes field for tmp_user_4 should be present in the collection with an explicit null value.
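The distinction can be illustrated without Spark or MongoDB at all: in the document the connector actually wrote, the key is absent, whereas the expectation is a key that exists and holds null (the two dicts below are my own sketch of the observed and expected documents):

```python
# Plain-dict illustration of the reported behavior vs. the expectation.
observed = {"username": "tmp_user_4"}                      # field dropped by the connector
expected = {"username": "tmp_user_4", "attributes": None}  # field present, explicitly null

# A query filtering on the existence of `attributes` (e.g. MongoDB's
# {"attributes": {"$exists": true}}) would miss `observed` but match
# `expected`, which is why the difference matters in practice.
print("attributes" in observed)   # False
print("attributes" in expected)   # True
```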