  Spark Connector / SPARK-403

Error on storing large strings of only numbers

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Unknown
    • Fix Version/s: None
    • Affects Version/s: 10.1.0, 10.1.1
    • Component/s: None
    • Labels: None
      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?

      Hello

      When the `convertJson` write option is set to true, writing string-typed data that consists only of digits fails once the string is too long to be parsed as a number.

      Here is a minimal reproducible example:

      from pyspark.sql import SparkSession
      from pyspark.sql.types import StructType, StructField, StringType

      schema = StructType([StructField("msisdn", StringType())])

      spark = SparkSession.builder.master("local[*]").appName("pySpark") \
          .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:10.1.0") \
          .config("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1:27017/") \
          .config("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1:27017/") \
          .config("spark.mongodb.write.database", "projectdb") \
          .config("spark.mongodb.write.collection", "cc") \
          .config("spark.mongodb.write.convertJson", True) \
          .getOrCreate()

      # The first string fits in a 64-bit integer; the second (20 digits) does not.
      data = [["0140800121751"], ["12345678901234567890"]]
      df = spark.createDataFrame(data=data, schema=schema)

      df.printSchema()
      df.show(3, False)
      df.write.format("mongodb").mode("append").save()

       

      Here is the output and the resulting error:

       

      root
       |-- msisdn: string (nullable = true)
      
      
      +--------------------+                                                          
      |msisdn              |
      +--------------------+
      |0140800121751       |
      |12345678901234567890|
      +--------------------+
      
      Caused by: com.mongodb.spark.sql.connector.exceptions.DataException: Cannot cast [12345678901234567890] into a BsonValue. StructType(StructField(msisdn,StringType,true)) has no matching BsonValue. Error: Cannot cast 12345678901234567890 into a BsonValue. StringType has no matching BsonValue. Error: For input string: "12345678901234567890"
      	at com.mongodb.spark.sql.connector.schema.RowToBsonDocumentConverter.toBsonValue(RowToBsonDocumentConverter.java:191)
      	at com.mongodb.spark.sql.connector.schema.RowToBsonDocumentConverter.fromRow(RowToBsonDocumentConverter.java:106)
      	at com.mongodb.spark.sql.connector.schema.RowToBsonDocumentConverter.fromRow(RowToBsonDocumentConverter.java:92) 
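
      The innermost message, `For input string: "12345678901234567890"`, is the standard Java NumberFormatException text, which suggests the converter is trying to parse the digit-only string as a signed 64-bit long: the failing value is 20 digits, one more than the 19 digits of Long.MAX_VALUE, while the first value fits. A quick boundary check (plain Python; the long-parsing root cause is my reading of the stack trace, not confirmed):

      # Java's long is a signed 64-bit integer; digit strings whose numeric
      # value exceeds Long.MAX_VALUE cannot be parsed into one.
      JAVA_LONG_MAX = 2**63 - 1  # 9223372036854775807, 19 digits

      print(int("0140800121751") <= JAVA_LONG_MAX)         # True:  fits, write succeeds
      print(int("12345678901234567890") <= JAVA_LONG_MAX)  # False: overflows, write fails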

       

      I'm using Scala 2.12, Spark 3.4.0, and OpenJDK 11. The error does not occur for the first value, only for the longer second one. Note that the error only occurs when `convertJson` is set to true; otherwise the write runs fine.
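
      A possible workaround in the meantime: besides setting `convertJson` to false, recent 10.x versions of the connector document an `objectOrArrayOnly` value for `convertJson`, which converts only JSON objects and arrays and leaves digit-only strings as strings. A sketch, assuming that option value is available in your connector version:

      spark = SparkSession.builder.master("local[*]").appName("pySpark") \
          .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:10.1.0") \
          .config("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1:27017/") \
          .config("spark.mongodb.write.database", "projectdb") \
          .config("spark.mongodb.write.collection", "cc") \
          .config("spark.mongodb.write.convertJson", "objectOrArrayOnly") \
          .getOrCreate()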

       

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Dhruv S (dvsingla.28@gmail.com)
            Votes: 0
            Watchers: 4

              Created:
              Updated:
              Resolved: