Spark Connector / SPARK-99

Fetch-select-update-write with DataFrames with SaveMode.Append and _id replaces the whole document

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Affects Version/s: 2.0.0
    • Component/s: API
    • Labels: None

      Selecting only some fields of the documents in a collection, modifying them, and then storing everything back to MongoDB does not work as expected: all fields that were dropped by the select statement are deleted from the database:

      Assume a collection "coll" where each document has two fields, "f1" and "f2", plus the "_id", and that it was loaded into a dataframe using the connector:

      Dataset<Row> df = sparkSession.read().format("com.mongodb.spark.sql").load();
      

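      The session is assumed to be already configured for the connector; one plausible setup, with an illustrative connection URI that is not part of this report, looks like this:

      SparkSession sparkSession = SparkSession.builder()
              .config("spark.mongodb.input.uri", "mongodb://localhost/test.coll")    // assumed URI
              .config("spark.mongodb.output.uri", "mongodb://localhost/test.coll")   // assumed URI
              .getOrCreate();
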
      Then, only "f1" is selected from this dataframe, for performance reasons:

      df = df.select("_id", "f1");
      

      Afterwards, some algorithm adds a new column "f3":

      df = df.withColumn("f3", ...);
      

      Writing the dataframe back will now delete all "f2" fields from the documents:

      df.write().format("com.mongodb.spark.sql").options(options).mode(SaveMode.Append).save();
      

      Thus, adding the new field to the database this way is not possible and data loss is likely instead, even though the SaveMode is Append and should not remove anything.
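
      Until the connector supports partial updates, one workaround is to keep every field in the dataframe before writing, so that the whole-document replacement has nothing to drop. The sketch below is an assumption rather than part of this report: it reuses the sparkSession and options from the snippets above, uses a made-up value for "f3", and gives up the performance benefit of the narrow select. (Later connector releases also document a replaceDocument write option intended to update only the supplied fields.)

      // Workaround sketch (assumption, not part of this report): keep all fields in
      // the dataframe so replacing whole documents by "_id" cannot drop "f2".
      // Uses org.apache.spark.sql.functions; the "f3" value is a placeholder.
      Dataset<Row> full = sparkSession.read()
              .format("com.mongodb.spark.sql")
              .load();                                       // _id, f1 and f2

      Dataset<Row> withF3 = full.withColumn("f3", functions.lit("computed"));

      withF3.write()
              .format("com.mongodb.spark.sql")
              .options(options)                              // same write options as above
              .mode(SaveMode.Append)
              .save();                                       // f2 stays in the documents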

            Assignee: Unassigned
            Reporter: Steffen Herbold (sherbold)
            Votes: 0
            Watchers: 2
