Type: Bug
Resolution: Duplicate
Priority: Unknown
Affects Version/s: 10.1.0, 10.1.1
Component/s: None
Hello,

When the write option `convertJson` is set to `true`, writing string data that consists only of digits fails when the string is long. Here is a minimal reproducible example:
```python
from pyspark.sql import SparkSession
from pyspark.sql.types import *

schema = StructType([StructField("msisdn", StringType())])

spark = SparkSession.builder.master("local[*]").appName("pySpark") \
    .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.12:10.1.0') \
    .config('spark.mongodb.read.connection.uri', 'mongodb://127.0.0.1:27017/') \
    .config('spark.mongodb.write.connection.uri', 'mongodb://127.0.0.1:27017/') \
    .config('spark.mongodb.write.database', 'projectdb') \
    .config('spark.mongodb.write.collection', 'cc') \
    .config('spark.mongodb.write.convertJson', True) \
    .getOrCreate()

data = [["0140800121751"], ["12345678901234567890"]]
df = spark.createDataFrame(data=data, schema=schema)
df.printSchema()
df.show(3, False)
df.write.format("mongodb").mode("append").save()
```
Here is the output with the error:

```
root
 |-- msisdn: string (nullable = true)

+--------------------+
|msisdn              |
+--------------------+
|0140800121751       |
|12345678901234567890|
+--------------------+

Caused by: com.mongodb.spark.sql.connector.exceptions.DataException: Cannot cast [12345678901234567890] into a BsonValue. StructType(StructField(msisdn,StringType,true)) has no matching BsonValue. Error: Cannot cast 12345678901234567890 into a BsonValue. StringType has no matching BsonValue. Error: For input string: "12345678901234567890"
	at com.mongodb.spark.sql.connector.schema.RowToBsonDocumentConverter.toBsonValue(RowToBsonDocumentConverter.java:191)
	at com.mongodb.spark.sql.connector.schema.RowToBsonDocumentConverter.fromRow(RowToBsonDocumentConverter.java:106)
	at com.mongodb.spark.sql.connector.schema.RowToBsonDocumentConverter.fromRow(RowToBsonDocumentConverter.java:92)
```
I'm using Scala 2.12, Spark 3.4.0 and OpenJDK 11. The error does not occur for the first value, only for the longer second value. Note that the error only occurs when `convertJson` is set to true; otherwise the write runs fine.
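A likely explanation (an assumption based on the `For input string:` message, which is what Java's `Long.parseLong` throws, not confirmed against the connector source): with `convertJson` enabled, the connector tries to interpret a digit-only string as a JSON number, and `"12345678901234567890"` (20 digits) exceeds `Long.MAX_VALUE` (9223372036854775807), so the numeric parse fails instead of falling back to a plain string. A quick check of the two sample values:

```python
# Java's Long.MAX_VALUE: upper bound of a signed 64-bit integer
LONG_MAX = 2**63 - 1  # 9223372036854775807

ok = "0140800121751"          # 13 digits, fits in a 64-bit long -> write succeeds
bad = "12345678901234567890"  # 20 digits, exceeds Long.MAX_VALUE -> numeric parse fails

print(int(ok) <= LONG_MAX)    # True
print(int(bad) <= LONG_MAX)   # False
```

This is consistent with the first row being written fine while the second one raises the `DataException`.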