Spark Connector / SPARK-220

MongoSpark Connector Not honoured

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 2.4.0, 2.3.2, 2.2.6, 2.1.5
    • Affects Version/s: 2.3.1
    • Component/s: Configuration
    • Labels:
      None
    • Environment:
      Mongo 3.6, Spark 2.3.1

    • Description:

      Loading a DataFrame with an externally configured MongoClientFactory still uses the DefaultMongoClientFactory for parts of the operation, e.g.

      // Builder configured with an externally supplied MongoClientFactory
      val ms = MongoSpark.builder()
        .sparkSession(sparkSession)
        .connector(MongoConnector(ExternalMongoClientFactory))
        .readConfig(SomeReadConfigIncludingTheURI)
        .build()

      This seems to use ExternalMongoClientFactory for schema inference (as expected), but on an actual load with .toDF() the internals seem to set up a new connector with new MongoConnector(DefaultMongoClientFactory(options)) (not expected). The DefaultMongoClientFactory does not seem to support all of the MongoClient options.
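
      For concreteness, the mismatch shows up in the call sequence below (a sketch restating the observation above; the internal call named in the comments is paraphrased from this report, not verified against the connector source):

      // Schema inference goes through ExternalMongoClientFactory, as expected.
      // The actual load, however, appears to rebuild the connector internally,
      // roughly as MongoConnector(DefaultMongoClientFactory(options)), so the
      // external factory is bypassed for the read itself.
      val df = ms.toDF()
      df.show()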

      This becomes particularly apparent when using an ExternalMongoClientFactory that creates MongoClients with interface-specific settings, such as a socketFactory that sets up TLS against a PKI/CA shared between mongo and Spark, as sketched below.
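
      As an illustration, such a factory might look like the following sketch. ExternalMongoClientFactory is the name used above; the host name and the SSLContext wiring are assumptions for this example, not code from this report:

      import javax.net.ssl.SSLContext
      import com.mongodb.{MongoClient, MongoClientOptions}
      import com.mongodb.spark.MongoClientFactory

      object ExternalMongoClientFactory extends MongoClientFactory {
        override def create(): MongoClient = {
          // In practice this SSLContext would be built from the shared PKI/CA
          // key material; getDefault is a stand-in for brevity.
          val sslContext = SSLContext.getDefault
          val options = MongoClientOptions.builder()
            .sslEnabled(true)
            .sslContext(sslContext)
            .build()
          new MongoClient("mongo-host.example.com", options) // placeholder host
        }
      }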

      The expected behaviour is that all of the MongoClient options can also be used via MongoSpark.

            Assignee:
            Ross Lawley (ross@mongodb.com)
            Reporter:
            Fredrik Ahlqvist
            Votes:
            0
            Watchers:
            2
