Spark Connector / SPARK-99

Fetch-select-update-write with DataFrames using SaveMode.Append and _id replaces the whole document

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Affects Version/s: 2.0.0
    • Component/s: API

      Selecting parts of documents from a collection, modifying or extending those parts, and storing everything back to MongoDB does not work: all fields that were dropped by the select statement are deleted from the database.

      Assume a collection "coll" where each document has two fields "f1" and "f2" plus the "_id", and which is loaded into a DataFrame using the connector:

      Dataset<Row> df = sparkSession.read().format("com.mongodb.spark.sql").load();
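
      Here, sparkSession is assumed to be configured along these lines (a sketch for reproduction only; the master, URIs, database, and collection names are placeholders, not taken from the original setup):

      // Placeholder configuration: point both input and output at the same test collection.
      SparkSession sparkSession = SparkSession.builder()
              .master("local[*]")
              .appName("spark-99-repro")
              .config("spark.mongodb.input.uri", "mongodb://localhost/test.coll")
              .config("spark.mongodb.output.uri", "mongodb://localhost/test.coll")
              .getOrCreate();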
      

      Then only "_id" and "f1" are selected from this DataFrame, for performance reasons:

      df = df.select("_id", "f1");
      

      Afterwards, some algorithm creates a new column "f3":

      df = df.withColumn("f3", ...);
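
      (The concrete expression behind the "..." does not matter for the bug; for illustration only, "f3" could be a constant:)

      // Hypothetical stand-in for the "..." above; any derived column shows the same behaviour.
      df = df.withColumn("f3", org.apache.spark.sql.functions.lit(42));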
      

      Writing the DataFrame back will now delete all "f2" fields from the stored documents:

      df.write().format("com.mongodb.spark.sql").options(options).mode(SaveMode.Append).save();
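
      Here, "options" is assumed to carry the output target; a minimal sketch (both keys and values are assumptions for illustration, not taken from the report; the output URI set on the session configuration could equally be used):

      // Assumed contents of the "options" map used above.
      Map<String, String> options = new HashMap<String, String>();
      options.put("uri", "mongodb://localhost/test");
      options.put("collection", "coll");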
      

      Thus, adding the new field to the database this way is not possible; instead, data loss is likely, even though SaveMode.Append should only append.
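
      For comparison, a minimal sketch with the plain MongoDB Java driver of the two semantics (the collection handle, id, and values are placeholders): the observed behaviour corresponds to a full document replace, while a $set-style update would leave "f2" untouched.

      // Replace: the stored document becomes exactly {_id, f1, f3}; "f2" is lost.
      coll.replaceOne(Filters.eq("_id", id),
              new Document("f1", f1Value).append("f3", f3Value));

      // $set update: only "f3" is added; "f1" and "f2" stay as they are.
      coll.updateOne(Filters.eq("_id", id), Updates.set("f3", f3Value));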

            Assignee:
            Unassigned
            Reporter:
            Steffen Herbold (sherbold)
            Votes:
            0
            Watchers:
            2
