  Spark Connector / SPARK-384

Saving a dataframe that contains WrappedArray with nulls throws a DataException

    • Type: Bug
    • Resolution: Fixed
    • Priority: Unknown
    • Fix Version/s: 10.1.0
    • Affects Version/s: None
    • Component/s: None

      A side effect of the fix for this ticket is that fields with `null` values are now stored in MongoDB. Top-level fields with `null` values and lists / maps containing `null` values therefore behave the same way.

      The benefit is that data can now fully round-trip through the connector.

      Note: In 10.0.x, null fields were automatically excluded, and null values in lists / maps threw an error.
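
The difference between the two behaviors described in the note can be sketched in plain Python. This is only an illustrative model, not the connector's code: the functions `convert_10_0` and `convert_10_1`, the local `DataException` class, and the `note` field are hypothetical names introduced here. Structs are modeled as dicts and arrays as lists, maps are left out for brevity, and it is an assumption that nested structs are treated like top-level ones.

```python
# Illustrative model only: NOT the connector's implementation.
# Structs are modeled as dicts and arrays as lists; maps are left out for brevity.


class DataException(Exception):
    """Hypothetical stand-in for the connector's DataException."""


def convert_10_0(value):
    """Model the 10.0.x behavior described in the note above."""
    if isinstance(value, dict):
        # Null fields were silently excluded from the stored document.
        # Assumption: nested structs are treated like top-level ones.
        return {k: convert_10_0(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # A null element inside a list raised an error.
        if any(item is None for item in value):
            raise DataException("Value can not be null")
        return [convert_10_0(item) for item in value]
    return value


def convert_10_1(value):
    """Model the 10.1.0 behavior: nulls are kept wherever they appear."""
    if isinstance(value, dict):
        return {k: convert_10_1(v) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_10_1(item) for item in value]
    return value


# First row from this ticket (the _id is shown as a plain string).
row = {
    "_id": "63aacee6e3fe179cabd46a30",
    "EnergyMarket": {"balancingRequirements": [None]},
}

# Hypothetical row with a null top-level field, named `note` for illustration.
row_with_null_field = {"_id": "example", "note": None}
```

Under this model, `convert_10_1(row)` returns the row unchanged, `convert_10_0(row_with_null_field)` drops the `note` field, and `convert_10_0(row)` raises the stand-in `DataException`, which mirrors the round-trip gap the fix closes.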


      When I try to save a dataframe that contains a WrappedArray with a null value inside it to MongoDB, the connector throws a com.mongodb.spark.sql.connector.exceptions.DataException. The full error message is:

       

      Cannot cast [[WrappedArray(null)],63aacee6e3fe179cabd46a30] into a BsonValue. StructType(StructField(EnergyMarket,StructType(StructField(balancingRequirements,ArrayType(StringType,true),true)),true),StructField(_id,StringType,true)) has no matching BsonValue. Error: Cannot cast [WrappedArray(null)] into a BsonValue. StructType(StructField(balancingRequirements,ArrayType(StringType,true),true)) has no matching BsonValue. Error: Cannot cast WrappedArray(null) into a BsonValue. ArrayType(StringType,true) has no matching BsonValue. Error: Cannot cast null into a BsonValue. StringType has no matching BsonValue. Error: Value can not be null
      

       

      The schema of the dataframe is:

       

       |-- EnergyMarket: struct (nullable = true)
       |    |-- balancingRequirements: array (nullable = true)
       |    |    |-- element: string (containsNull = true)
       |-- _id: string (nullable = true)
      

       

      A sample of the actual dataframe's data:

       

      +------------+--------------------+                                             
      |EnergyMarket|                 _id|
      +------------+--------------------+
      |    {[null]}|63aacee6e3fe179ca...|
      |    {[test]}|63aacee6e3fe179ca...|
      +------------+--------------------+
      

       

      The same data in JSON format:

       

      [
        {
          "_id": {
            "$oid": "63aacee6e3fe179cabd46a30"
          },
          "EnergyMarket": {
            "balancingRequirements": [
              null
            ]
          }
        },
        {
          "_id": {
            "$oid": "63aacee6e3fe179cabd46a31"
          },
          "EnergyMarket": {
            "balancingRequirements": [
              "test"
            ]
          }
        }
      ]
      

       

      Finally, I am using version 10.0.5 of the Spark Connector. This problem did not exist in the previous version I used (3.0.1).

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            georgios.bikas@gmail.com George Bikas