[DOCS-8770] [Spark] Add additional Python API examples Created: 06/Sep/16  Updated: 25/Aug/23  Resolved: 25/May/21

Status: Closed
Project: Documentation
Component/s: Spark Connector
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Roger McCoy (Inactive) Assignee: Unassigned
Resolution: Declined Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Spark Connector


Participants:
Days since reply: 2 years, 37 weeks, 1 day
Epic Link: DOCSP-6205

 Description   

For https://docs.mongodb.com/spark-connector/python-api/

It would help to add additional Python examples such as the following:

  • To read data from any database or collection, use a DataFrame and specify the database and/or collection in an option that overrides the default spark.mongodb.input.uri:

    dframe = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
        .option("spark.mongodb.input.uri", "mongodb://host:port/database.collection") \
        .load()
    dframe.printSchema()
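As a side note on the URI format above, the trailing path component names the target namespace as `database.collection`. A pure-Python sketch (standard library only, not connector code; the host, port, and names are placeholders) showing how that namespace splits apart:

```python
from urllib.parse import urlparse

# Sketch only: the path of a connector-style URI carries the target
# namespace as "database.collection" (host/port/names are placeholders).
uri = "mongodb://localhost:27017/mydb.mycoll"
namespace = urlparse(uri).path.lstrip("/")
database, _, collection = namespace.partition(".")
print(database, collection)  # mydb mycoll
```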
    

  • Similarly, save data to any database or collection by using a DataFrame and overriding the default spark.mongodb.output.uri:

    dframe = sqlContext.createDataFrame(rdd)
    dframe.write.format("com.mongodb.spark.sql.DefaultSource") \
        .option("spark.mongodb.output.uri", "mongodb://host:port/database.collection") \
        .save()
    

  • An aggregation pipeline can be specified as an option when reading data:

    pipeline = "[{'$match': {'fieldA': 1}}]"
    dframe = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
        .option("pipeline", pipeline) \
        .load()
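Note that `DataFrameReader.option` takes string values, so a pipeline assembled as Python dicts should be serialized before being passed in. A minimal standard-library sketch (the `fieldA` pipeline is the illustrative one from above):

```python
import json

# Sketch (standard library only): serialize a pipeline built as Python
# dicts into the JSON string the connector's "pipeline" option expects.
pipeline = [{"$match": {"fieldA": 1}}]
pipeline_json = json.dumps(pipeline)
print(pipeline_json)  # [{"$match": {"fieldA": 1}}]
```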
    



 Comments   
Comment by Anthony Sansone (Inactive) [ 25/May/21 ]

This ticket has been closed due to age and inactivity. Please file a new ticket with recent details if needed. Thank you.

Generated at Thu Feb 08 07:56:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.