  Spark Connector / SPARK-245

Allow user to define explicit schema with StructType

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Critical - P2
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      As mentioned in the connector documentation, to give a DataFrame an explicit schema and avoid inference we have to provide a case class. In some use cases (mine included) that is not feasible; I would rather infer the schema once with a high sample rate, cache it, and reuse the cached schema on later runs (sketched below).
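
      A minimal sketch of that caching idea, assuming a local JSON file as the cache; the path, sample size, and app name are illustrative, not part of the connector:

      import java.nio.file.{Files, Paths}
      import com.mongodb.spark.MongoSpark
      import com.mongodb.spark.config.ReadConfig
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.types.{DataType, StructType}

      val session = SparkSession.builder().appName("schema-cache").getOrCreate()
      // "sampleSize" raises the number of documents sampled during inference.
      val readConfig = ReadConfig(Map(
        "uri" -> "mongodb://localhost/test.coll",
        "sampleSize" -> "100000"))

      val cachePath = Paths.get("/tmp/mongo-schema.json")
      val schema: StructType =
        if (Files.exists(cachePath)) {
          // Later runs: reuse the cached schema instead of sampling again.
          DataType.fromJson(new String(Files.readAllBytes(cachePath), "UTF-8"))
            .asInstanceOf[StructType]
        } else {
          // First run: infer with the high sample size, then cache it as JSON.
          val inferred = MongoSpark.builder().sparkSession(session)
            .readConfig(readConfig).build().toDF().schema
          Files.write(cachePath, inferred.json.getBytes("UTF-8"))
          inferred
        }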

      Today, to apply that schema I have to write the following in my source code, which is not clear enough and should really be hidden away from the caller:

      builder().sparkSession(session).readConfig(readConfig).build().toDF(schema)
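
      For reference, a self-contained version of that call could look like the sketch below; the connection URI and field names are placeholders:

      import com.mongodb.spark.MongoSpark
      import com.mongodb.spark.config.ReadConfig
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

      val session = SparkSession.builder().appName("explicit-schema").getOrCreate()
      val readConfig = ReadConfig(Map("uri" -> "mongodb://localhost/test.coll"))

      // An explicit StructType in place of a case class; fields are examples only.
      val schema = StructType(Seq(
        StructField("_id", StringType, nullable = true),
        StructField("count", IntegerType, nullable = true)))

      val df = MongoSpark.builder()
        .sparkSession(session)
        .readConfig(readConfig)
        .build()
        .toDF(schema)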

      I would like it to be as simple as possible, something like:

      session.loadFromMongoDB(readConfig, schema)
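
      One possible shape for that helper, sketched as an implicit extension of SparkSession that just wraps the existing builder; the object and class names here are hypothetical:

      import com.mongodb.spark.MongoSpark
      import com.mongodb.spark.config.ReadConfig
      import org.apache.spark.sql.{DataFrame, SparkSession}
      import org.apache.spark.sql.types.StructType

      object MongoSchemaSyntax {
        // Hypothetical: adds a schema-taking loadFromMongoDB to SparkSession.
        implicit class SchemaAwareSession(val session: SparkSession) extends AnyVal {
          def loadFromMongoDB(readConfig: ReadConfig, schema: StructType): DataFrame =
            MongoSpark.builder()
              .sparkSession(session)
              .readConfig(readConfig)
              .build()
              .toDF(schema)
        }
      }

      With import MongoSchemaSyntax._ in scope, the call above becomes a one-liner at every call site.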

       If you think this is a good approach, I can create a pull request for it.

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Seyed Mohammad Moein Hosseini Manesh (moein7tl)
            Votes: 0
            Watchers: 2
