Details
Type: Improvement
Resolution: Declined
Priority: Major - P3
Component: Spark Connector
Description
For https://docs.mongodb.com/spark-connector/python-api/
It would help to add additional Python examples such as the following:
- To read data from a database or collection other than the default, create a DataFrame and specify the database and/or collection in an option that overrides the default spark.mongodb.input.uri:
dframe = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource").option("spark.mongodb.input.uri", "mongodb://host:port/database.collection").load()
dframe.printSchema()
- Similarly, save data to any database or collection by using a DataFrame and overriding the default spark.mongodb.output.uri:
dframe = sqlContext.createDataFrame(rdd)
dframe.write.format("com.mongodb.spark.sql.DefaultSource").option("spark.mongodb.output.uri", "mongodb://host:port/database.collection").save()
- An aggregation pipeline can be supplied as an option when reading data; because DataFrameReader options accept only string values, the pipeline must be passed as its JSON representation rather than as a Python list:
pipeline = "[{'$match': {'fieldA': 1}}]"
df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource").option("pipeline", pipeline).load()
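Rather than hand-writing the JSON string, the pipeline could be built as ordinary Python data structures and serialized with the standard json module before being passed to the option. This is a sketch, not part of the connector's documented API; the variable names are illustrative:

```python
import json

# Build the aggregation pipeline as plain Python dicts and lists.
pipeline = [{"$match": {"fieldA": 1}}]

# Serialize to a JSON string, since DataFrameReader options
# only accept string values.
pipeline_json = json.dumps(pipeline)

print(pipeline_json)  # [{"$match": {"fieldA": 1}}]
```

The resulting string would then be supplied as the option value, e.g. .option("pipeline", pipeline_json), keeping the pipeline definition readable and avoiding quoting mistakes in hand-built strings.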