Documentation / DOCS-8770

[Spark] Add additional Python API examples

    • Type: Improvement
    • Resolution: Declined
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: Spark Connector
    • Labels: None
    • Environment:
      Spark Connector

      For https://docs.mongodb.com/spark-connector/python-api/

      It would help to add additional Python examples such as the following (a combined, runnable sketch appears after the list):

      • To read data from any database or collection, use a DataFrame and specify the database and/or collection in an option that overrides the default spark.mongodb.input.uri:
        dframe = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
            .option("spark.mongodb.input.uri", "mongodb://host:port/database.collection") \
            .load()
        dframe.printSchema()
        
      • Similarly, save data to any database or collection by using a DataFrame and overriding the default spark.mongodb.output.uri:
        dframe = sqlContext.createDataFrame(rdd)  # rdd: an existing RDD (e.g. of Row objects)
        dframe.write.format("com.mongodb.spark.sql.DefaultSource") \
            .option("spark.mongodb.output.uri", "mongodb://host:port/database.collection") \
            .save()
        
      • An aggregation pipeline can be specified as an option when reading data. Because .option() values must be strings, pass the pipeline as a JSON string rather than a Python list:
        pipeline = "[{'$match': {'fieldA': 1}}]"
        df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
            .option("pipeline", pipeline) \
            .load()
        
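      As a combined illustration, the following is a minimal, self-contained sketch tying the three examples above together. It is a sketch under assumptions, not verified documentation content: it assumes a reachable mongod at mongodb://host:port and a job submitted with the connector on the classpath (for example, spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0). The host, port, database.collection names, application name, sample data, and connector version are all placeholders, not values from this ticket.

        import json

        from pyspark import SparkConf, SparkContext
        from pyspark.sql import Row, SQLContext

        # Default input/output URIs; individual reads and writes can
        # still override these via .option(), as in the examples above.
        conf = SparkConf() \
            .setAppName("python-api-examples") \
            .set("spark.mongodb.input.uri", "mongodb://host:port/database.collection") \
            .set("spark.mongodb.output.uri", "mongodb://host:port/database.collection")
        sc = SparkContext(conf=conf)
        sqlContext = SQLContext(sc)

        # Write: build a DataFrame from an RDD (placeholder sample data)
        # and save it, overriding the default output URI for this write.
        rdd = sc.parallelize([Row(fieldA=1), Row(fieldA=2)])
        dframe = sqlContext.createDataFrame(rdd)
        dframe.write.format("com.mongodb.spark.sql.DefaultSource") \
            .option("spark.mongodb.output.uri", "mongodb://host:port/database.collection") \
            .mode("append") \
            .save()

        # Read: load the collection back, overriding the default input URI.
        dframe = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
            .option("spark.mongodb.input.uri", "mongodb://host:port/database.collection") \
            .load()
        dframe.printSchema()

        # Read with an aggregation pipeline: .option() values must be
        # strings, so serialize the pipeline to JSON first.
        pipeline = json.dumps([{"$match": {"fieldA": 1}}])
        df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
            .option("pipeline", pipeline) \
            .load()
        df.show()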

            Assignee:
            Unassigned
            Reporter:
            Roger McCoy (Inactive)
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: 2 years, 47 weeks, 2 days ago