- Type: Improvement
- Resolution: Declined
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Spark Connector
- Labels: None
- Environment: Spark Connector
It would help to add additional Python examples to https://docs.mongodb.com/spark-connector/python-api/, such as the following:
- To read data from any database or collection, use a DataFrame and specify the database and/or collection in an option that overrides the default spark.mongodb.input.uri:

dframe = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("spark.mongodb.input.uri", "mongodb://host:port/database.collection") \
    .load()
dframe.printSchema()
- Similarly, save data to any database or collection by using a DataFrame and overriding the default spark.mongodb.output.uri:

dframe = sqlContext.createDataFrame(rdd)
dframe.write.format("com.mongodb.spark.sql.DefaultSource") \
    .option("spark.mongodb.output.uri", "mongodb://host:port/database.collection") \
    .save()
- An aggregation pipeline can be specified as an option when reading data. Note that Spark option values are strings, so the pipeline should be passed as a JSON string rather than a Python list:

pipeline = "[{'$match': {'fieldA': 1}}]"
df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("pipeline", pipeline) \
    .load()
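Since the pipeline option is passed as a string, the example could also show building it programmatically with json.dumps, which avoids hand-writing the JSON and stays valid as the pipeline grows. A minimal sketch (the Spark read call at the end is commented out because it assumes a live sqlContext and MongoDB deployment):

```python
import json

# Build the aggregation pipeline as ordinary Python data structures.
pipeline = [{"$match": {"fieldA": 1}}]

# Serialize to a JSON string, since Spark options are strings.
pipeline_json = json.dumps(pipeline)

# Hypothetical usage against a running Spark/MongoDB setup:
# df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
#     .option("pipeline", pipeline_json) \
#     .load()
```

This keeps the pipeline definition readable in Python while still satisfying the string-valued option interface.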