- Type: Bug
- Resolution: Done
- Priority: Critical - P2
- Affects Version/s: 2.0.0
- Component/s: API
Selecting parts of documents in a collection, working on and updating those parts, and then storing everything back to MongoDB does not work, because all fields that were removed by the select statement are deleted from the database.
Assume a collection "coll" where each document has two fields "f1" and "f2" plus the "_id", and the collection is loaded into a dataframe using the connector:
df = sparkSession.read().format("com.mongodb.spark.sql").load();
Then only "_id" and "f1" are selected from this dataframe, for performance reasons:
df = df.select("_id", "f1");
Afterwards, some algorithm creates a new column "f3":
df = df.withColumn("f3", ...);
Writing the result back now deletes all "f2" fields from the documents:
df.write().format("com.mongodb.spark.sql").options(options).mode(SaveMode.Append).save();
Thus, adding the new field to the database this way is not possible; instead, data loss is likely, even though the SaveMode should only append.
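For reference, here is the full sequence as one self-contained sketch. It assumes a SparkSession whose spark.mongodb.input.uri and spark.mongodb.output.uri point at the "coll" collection, and it uses lit(0) as a hypothetical stand-in for the real algorithm that computes "f3":

import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class MongoProjectionWriteRepro {
    public static void main(String[] args) {
        // Assumes spark.mongodb.input.uri / spark.mongodb.output.uri are configured for the "coll" collection.
        SparkSession sparkSession = SparkSession.builder()
                .appName("MongoProjectionWriteRepro")
                .getOrCreate();

        // Load the collection; each document contains _id, f1 and f2.
        Dataset<Row> df = sparkSession.read().format("com.mongodb.spark.sql").load();

        // Drop f2 for performance, then derive a new column f3
        // (lit(0) is a placeholder for the real computation).
        Dataset<Row> projected = df.select("_id", "f1").withColumn("f3", lit(0));

        // Writing back replaces the documents by _id, so f2 disappears
        // even though SaveMode.Append was requested.
        projected.write().format("com.mongodb.spark.sql").mode(SaveMode.Append).save();

        sparkSession.stop();
    }
}

The expectation is that SaveMode.Append merges the new "f3" into the existing documents rather than replacing them and dropping "f2".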