Spark Connector / SPARK-287

Spark Connector 3.0.0 Java API for Dataset save doesn't recognize host address "host.docker.internal"


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Done
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Configuration
    • Labels:
      None
    • Environment:
      The Spark executor node runs in a Docker container on a separate bridge network, and the connector tries to connect to MongoDB at the host address "host.docker.internal".

      Description

      Calling the MongoSpark API

      ```java
      MongoSpark.write(resultsDataset).option("collection", "maycollection").option("replaceDocument", "false").mode(SaveMode.Overwrite).save();
      ```

      to save a Dataset/DataFrame to MongoDB in Java with OpenJDK 11, I got an error from the Spark connector referring to the address "host.docker.internal".

      Since the Spark workers/executors run in a Docker container and MongoDB runs in a separate Docker container, using the hostname "host.docker.internal", which resolves to the real IP address of the Docker host, is a common approach on macOS or Windows. There may be an issue in how this API validates the MongoDB host address.
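      One way to narrow this down is to check whether the address string itself is acceptable to a standard parser. The sketch below is a diagnostic assumption, not the connector's own validation: it passes illustrative single-host connection strings (the database name `test` and the documentation-range IP `192.0.2.10` are placeholders) to `java.net.URI` and prints the host component. If "host.docker.internal" parses cleanly here, the rejection is more likely in connector-specific handling than in the hostname's syntax.

      ```java
      import java.net.URI;
      import java.net.URISyntaxException;

      // Diagnostic sketch (hypothetical class name): does a generic URI parser accept
      // "host.docker.internal" as the host of a single-host mongodb:// string?
      final class MongoUriHostCheck {

          // Returns the host component, or null if the string cannot be parsed
          // or its authority cannot be read as host[:port].
          static String hostOf(String connectionString) {
              try {
                  return new URI(connectionString).getHost();
              } catch (URISyntaxException e) {
                  return null;
              }
          }

          public static void main(String[] args) {
              String[] candidates = {
                  "mongodb://host.docker.internal:27017/test",
                  "mongodb://192.0.2.10:27017/test" // placeholder standing in for the Docker host IP
              };
              for (String s : candidates) {
                  System.out.println(s + " -> host=" + hostOf(s));
              }
          }
      }
      ```

      MongoDB connection strings may list several hosts (`mongodb://h1,h2/...`), which `java.net.URI` does not model, so this only covers the single-host case used in this report.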

      Strangely, this only happens when I save a Dataset. When I save an RDD to MongoDB at the same address "host.docker.internal" using a WriteConfig and MongoSpark.save(RDD<?>, ...):

      ```java
      // genWriteConfig(...) is a helper in the reporter's own code that builds the WriteConfig
      WriteConfig writeConfig = genWriteConfig(jsc, collectionSb);
      MongoSpark.save(resultsRDD, writeConfig);
      ```
      The error described above does not occur, so saving an RDD and saving a Dataset seem to behave inconsistently. Could the reason be that the RDD and Dataset save() functions read their configuration differently and apply different checks to the MongoDB address?

      In my last test, I replaced "host.docker.internal" with the Docker host's IP address as the MongoDB host and again saved a Dataset in Java with OpenJDK 11:

      ```java
      MongoSpark.write(resultsDataset).option("collection", "maycollection").option("replaceDocument", "false").mode(SaveMode.Overwrite).save();
      ```

      The save() function works fine.
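      The manual workaround from the last test can be automated by resolving the hostname once and placing the literal IP in the connection string. This is a minimal sketch under assumptions: the class and method names, port `27017` and database `test` are illustrative, and it assumes the address resolved where this code runs is also reachable from every executor. How the resulting string is handed to the connector is left out because it depends on the job's configuration.

      ```java
      import java.net.Inet6Address;
      import java.net.InetAddress;
      import java.net.UnknownHostException;

      // Hypothetical helper: resolve the Docker host name up front and build a
      // connection string that contains only the literal IP address.
      final class ResolvedMongoUri {

          static String withResolvedHost(String host, int port, String database) throws UnknownHostException {
              InetAddress address = InetAddress.getByName(host);
              String literal = address.getHostAddress();
              // IPv6 literals must be bracketed so the ":port" suffix stays unambiguous.
              if (address instanceof Inet6Address) {
                  literal = "[" + literal + "]";
              }
              return "mongodb://" + literal + ":" + port + "/" + database;
          }

          public static void main(String[] args) throws UnknownHostException {
              // Inside a Docker Desktop container, "host.docker.internal" should resolve to the host's IP.
              String host = args.length > 0 ? args[0] : "host.docker.internal";
              System.out.println(withResolvedHost(host, 27017, "test"));
          }
      }
      ```

      Resolving once fixes the IP for the lifetime of the job, so if the Docker host's address changes the job has to be restarted.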

      I hope this issue can be resolved, since using the hostname "host.docker.internal" is a very common way to reach the Docker host from inside a Docker container; the Unix alternative would be to use the gateway IP.
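      For the gateway-IP alternative, a Linux container can read its default gateway from `/proc/net/route`; on a Docker bridge network that gateway is normally the host side of the bridge. This sketch assumes a Linux container with `/proc` mounted and IPv4 routing, and reaching MongoDB there still requires its port to be published on the host. The class and method names are hypothetical.

      ```java
      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.util.Optional;

      // Sketch: find the default IPv4 gateway in the text of /proc/net/route.
      final class DefaultGateway {

          // Columns: Iface Destination Gateway Flags ...; addresses are 32-bit hex
          // values in little-endian byte order (e.g. 010011AC = 172.17.0.1).
          static Optional<String> fromRouteTable(String routeTable) {
              String[] lines = routeTable.split("\n");
              for (int i = 1; i < lines.length; i++) { // line 0 is the header
                  String[] fields = lines[i].trim().split("\\s+");
                  if (fields.length < 3 || !fields[1].equals("00000000")) {
                      continue; // not the default route
                  }
                  long gw = Long.parseLong(fields[2], 16);
                  return Optional.of((gw & 0xFF) + "." + ((gw >> 8) & 0xFF) + "."
                          + ((gw >> 16) & 0xFF) + "." + ((gw >> 24) & 0xFF));
              }
              return Optional.empty();
          }

          public static void main(String[] args) throws IOException {
              String table = Files.readString(Path.of("/proc/net/route")); // Java 11+
              System.out.println(fromRouteTable(table).orElse("no default route found"));
          }
      }
      ```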



            People

            Assignee:
            ross.lawley Ross Lawley
            Reporter:
            wang@pms.ifi.lmu.de Yingding Wang
            Votes:
            0
            Watchers:
            2

              Dates

              Created:
              Updated:
              Resolved: