Documentation / DOCS-8770

[Spark] Add additional Python API examples


Details

    • Type: Improvement
    • Resolution: Declined
    • Priority: Major - P3
    • Component/s: Spark Connector

    Description

      For https://docs.mongodb.com/spark-connector/python-api/

      It would help to add additional Python examples such as the following:

      • To read data from any database or collection, use a DataFrame and specify the database and/or collection in an option that overrides the default spark.mongodb.input.uri:

        dframe = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource").option("spark.mongodb.input.uri", "mongodb://host:port/database.collection").load()
        dframe.printSchema()
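
      For illustration, the connection string passed to that option can be parameterized. The `make_mongo_uri` helper below is hypothetical (not part of the connector); it is a minimal sketch of how the `mongodb://host:port/database.collection` string might be assembled:

        ```python
        # Hypothetical helper (not part of the Spark connector) that builds the
        # URI string passed to the "spark.mongodb.input.uri" option above.
        def make_mongo_uri(host, port, database, collection):
            return "mongodb://{0}:{1}/{2}.{3}".format(host, port, database, collection)

        uri = make_mongo_uri("localhost", 27017, "test", "coll")
        # uri == "mongodb://localhost:27017/test.coll"
        ```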
        

      • Similarly, save data to any database or collection by using a DataFrame and overriding the default spark.mongodb.output.uri:

        dframe = sqlContext.createDataFrame(rdd)
        dframe.write.format("com.mongodb.spark.sql.DefaultSource").option("spark.mongodb.output.uri", "mongodb://host:port/database.collection").save()
        

      • An aggregation pipeline, supplied as an extended JSON string, can be specified as an option when reading data:

        pipeline = "[{'$match': {'fieldA': 1}}]"
        df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource").option("pipeline", pipeline).load()
        
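      Because `.option()` values are strings, a pipeline built as ordinary Python dicts can be serialized with `json.dumps` before being passed. This sketch shows only the serialization step; the Spark and connector calls are omitted:

        ```python
        import json

        # Build the pipeline as Python dicts, then serialize it to the JSON
        # string form expected by the "pipeline" option.
        pipeline = [{'$match': {'fieldA': 1}}]
        pipeline_json = json.dumps(pipeline)
        # pipeline_json == '[{"$match": {"fieldA": 1}}]'
        # It would then be passed as .option("pipeline", pipeline_json)
        ```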

      People

        Assignee: Unassigned
        Reporter: Roger McCoy (alan.mccoy, Inactive)

            Dates

              Resolved:
              2 years, 37 weeks, 1 day ago