  Spark Connector / SPARK-386

Reading dates from MongoDB

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Fix Version/s: 10.3.0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      Overview

      We're having two related issues with the mongo-spark-connector's handling of dates:
      1. Writing and then reading an object with a java.sql.Date field results in org.apache.spark.sql.AnalysisException: Cannot up cast <field name> from "TIMESTAMP" to "DATE". I would consider this a bug, because data that was just written cannot be read back.
      2. Reading documents with dates (meaning MongoDB dates, which carry a date, a time and a time zone) while the Java 8 time API is enabled in Spark fails with java.sql.Timestamp is not a valid external type for schema of date. I request that the mongo-spark-connector support this feature flag.

      See the attached files for complete code samples that reproduce these issues; a minimal sketch of the first issue follows below.
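
      The following is only a sketch of the first issue, not the attached code: the Event case class, connection URI, and database/collection names are illustrative, assuming mongo-spark-connector 10.x.

        import java.sql.Date
        import org.apache.spark.sql.SparkSession

        case class Event(name: String, day: Date)

        object DateRoundTrip {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder().master("local[*]").getOrCreate()
            import spark.implicits._

            // Write a java.sql.Date field; the connector stores it as a BSON date.
            Seq(Event("release", Date.valueOf("2023-01-15"))).toDS()
              .write.format("mongodb")
              .option("connection.uri", "mongodb://localhost")
              .option("database", "test").option("collection", "events")
              .mode("append").save()

            // Reading the same data back as Event fails during analysis with:
            //   org.apache.spark.sql.AnalysisException: Cannot up cast day from "TIMESTAMP" to "DATE"
            spark.read.format("mongodb")
              .option("connection.uri", "mongodb://localhost")
              .option("database", "test").option("collection", "events")
              .load().as[Event].show()
          }
        }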

      Details

      In SPARK-340, a change was introduced that converts MongoDB dates to java.sql.Timestamp when converting Mongo documents to Spark SQL rows. Spark is then unable to encode these values as dates.
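
      For illustration, the conversion amounts to something like this (a conceptual sketch, not the connector's actual code):

        import java.sql.Timestamp
        import org.bson.BsonDateTime

        // A BSON date is a number of milliseconds since the Unix epoch (UTC).
        // Since SPARK-340 the connector surfaces it as java.sql.Timestamp, which
        // Spark infers as TIMESTAMP and then refuses to up-cast to DATE.
        def bsonDateToRowValue(d: BsonDateTime): Timestamp = new Timestamp(d.getValue)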

      Our team would prefer to ditch the java.sql date and time API altogether, and Spark has made efforts to support this: internally it uses the "modern" java.time API (some background), and with the configuration parameter spark.sql.datetime.java8API.enabled it also enables serialization and deserialization of java.time.* fields. A sketch of the resulting failure follows below.
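
      A minimal sketch of the second issue, again with an assumed URI, database/collection names, and an illustrative schema:

        import org.apache.spark.sql.SparkSession

        object Java8TimeRead {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder()
              .master("local[*]")
              // With this flag Spark expects java.time.LocalDate for DATE and
              // java.time.Instant for TIMESTAMP as external types.
              .config("spark.sql.datetime.java8API.enabled", "true")
              .getOrCreate()

            // The connector still produces java.sql.Timestamp values, so this fails at runtime with:
            //   java.sql.Timestamp is not a valid external type for schema of date
            spark.read.format("mongodb")
              .option("connection.uri", "mongodb://localhost")
              .option("database", "test").option("collection", "events")
              .schema("name STRING, day DATE")
              .load()
              .show()
          }
        }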

      The attached sbt project contains two runnable programs to reproduce the described issues.

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            hannes.bibel@gmail.com Hannes Bibel
            Votes:
            0
            Watchers:
            5
