Stream Reader Does Not Work with Spark/Databricks Connect - Mongo Spark Connector Version 11


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Unknown
    • Affects Version/s: 11.0.0
    • Component/s: Reads, Stream
    • Java Drivers

      Version 11 of the Mongo Spark Connector breaks when reading from a stream with Spark Connect; it appears to be a serialization issue. I've tested this with PySpark 4.0 and 4.1 plus Databricks Connect against DBR 17.2, DBR 17.3, and DBR 18.0. Here's an example of the error:

      pyspark.errors.exceptions.connect.SparkException: Job aborted due to stage failure: Task 5 in stage 122.0 failed 4 times, most recent failure: Lost task 5.3 in stage 122.0 (TID 480) (172.21.12.249 executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.generic.DefaultSerializationProxy to field  

      (full stack trace attached)
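
      For reference, a minimal sketch of the kind of streaming read that hits this. The connection URI, database, collection, schema, and checkpoint path are placeholders I've filled in, not values from the original setup:

      # Minimal repro sketch, run via Databricks Connect with Mongo Spark
      # Connector 11.0.0 installed on the cluster. All names below are placeholders.
      from databricks.connect import DatabricksSession
      from pyspark.sql.types import StructType, StructField, StringType

      spark = DatabricksSession.builder.getOrCreate()

      # Streaming reads need an explicit schema; this one is illustrative only
      read_schema = StructType([StructField("_id", StringType(), True)])

      stream_df = (
          spark.readStream.format("mongodb")
          .option("spark.mongodb.connection.uri", "mongodb+srv://<user>:<password>@<cluster>/")
          .option("spark.mongodb.database", "test_db")      # placeholder database
          .option("spark.mongodb.collection", "test_coll")  # placeholder collection
          .schema(read_schema)
          .load()
      )

      # Starting any query over the stream fails on the executors with the
      # ClassCastException quoted above
      query = (
          stream_df.writeStream.format("console")
          .option("checkpointLocation", "/tmp/mongo-stream-checkpoint")  # hypothetical path
          .start()
      )
      query.awaitTermination()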

            Assignee: Unassigned
            Reporter: David Belais
            Votes: 0
            Watchers: 2
