Spark Connector / SPARK-292

scala.MatchError when trying to load a DataFrame in PySpark

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.0.1
    • Component/s: Schema
    • Labels: None
    • Environment:
      Spark 3.0.1, Scala 2.12.12, Mongo-Spark Connector: 3.0.1

    • Description:
      The following exception is thrown when I try to load a DataFrame from MongoDB using PySpark:

      Spark 3.0.2, Scala 2.12.12, Mongo-Spark Connector : 3.0.1

      # Read the collection into a DataFrame via the MongoDB Spark connector
      mongoDF = spark.read.format("mongo") \
          .option("uri", mongoURI) \
          .load()
      

      I am able to read some of the collections in the DB properly, but for one collection the following exception is thrown:

       

      An error occurred while calling o49.load.
      : scala.MatchError: ConflictType (of class com.mongodb.spark.sql.types.ConflictType$)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.externalDataTypeFor(RowEncoder.scala:215)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.externalDataTypeForInput(RowEncoder.scala:212)
       at org.apache.spark.sql.catalyst.expressions.objects.ValidateExternalType.<init>(objects.scala:1699)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$serializerFor$3(RowEncoder.scala:175)
       at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
       at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
       at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
       at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
       at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
       at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.serializerFor(RowEncoder.scala:171)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$serializerFor$3(RowEncoder.scala:176)
       at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
       at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
       at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
       at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
       at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
       at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.serializerFor(RowEncoder.scala:171)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$serializerFor$2(RowEncoder.scala:136)
       at org.apache.spark.sql.catalyst.expressions.objects.MapObjects$.apply(objects.scala:689)
       at org.apache.spark.sql.catalyst.SerializerBuildHelper$.createSerializerForMapObjects(SerializerBuildHelper.scala:155)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.serializerFor(RowEncoder.scala:135)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.serializerFor(RowEncoder.scala:156)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$serializerFor$3(RowEncoder.scala:176)
       at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
       at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
       at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
       at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
       at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
       at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.serializerFor(RowEncoder.scala:171)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$serializerFor$3(RowEncoder.scala:176)
       at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
       at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
       at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
       at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
       at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
       at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.serializerFor(RowEncoder.scala:171)
       at org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:66)
       at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:92)
       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
       at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89)
       at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:428)
       at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
       at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
       at scala.Option.getOrElse(Option.scala:189)
       at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
       at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:221)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
       at py4j.Gateway.invoke(Gateway.java:282)
       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
       at py4j.commands.CallCommand.execute(CallCommand.java:79)
       at py4j.GatewayConnection.run(GatewayConnection.java:238)
       at java.lang.Thread.run(Thread.java:748)
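
      The ConflictType in the stack trace is the connector's internal placeholder type (com.mongodb.spark.sql.types.ConflictType), which suggests that documents in the failing collection carry incompatible types for the same field and that the inferred schema leaks this placeholder into Spark's RowEncoder. A minimal sketch of a possible workaround, assuming an explicit schema is acceptable for this collection, is to pass one to the reader so sampling-based schema inference is skipped; the field names and types below are placeholders, not taken from the actual collection:

      from pyspark.sql.types import StructType, StructField, StringType

      # Hypothetical schema: replace these fields with the collection's real ones,
      # using a permissive type (e.g. StringType) for the field whose documents disagree.
      explicit_schema = StructType([
          StructField("_id", StringType(), True),
          StructField("status", StringType(), True),
      ])

      mongoDF = spark.read.format("mongo") \
          .option("uri", mongoURI) \
          .schema(explicit_schema) \
          .load()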
      

            Assignee:
            Ross Lawley (ross@mongodb.com)
            Reporter:
            Gurunandan UG (gurunandan.ug@gmail.com)
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: