For https://docs.mongodb.com/spark-connector/python-api/, it would help to add additional Python examples such as the following:
- To read data from any database or collection, use a DataFrame and specify the database and/or collection in an option that overrides the default spark.mongodb.input.uri:

dframe = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("spark.mongodb.input.uri", "mongodb://host:port/database.collection") \
    .load()

dframe.printSchema()
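It could also help to show that, because the URI is a per-read option, a single session can load several collections and combine them; a minimal sketch, where the appdb.users and appdb.events namespaces and the userId field are hypothetical placeholders:

# Read two different collections in the same session by overriding
# the input URI for each DataFrame (namespaces are placeholders).
users = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("spark.mongodb.input.uri", "mongodb://host:port/appdb.users") \
    .load()
events = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("spark.mongodb.input.uri", "mongodb://host:port/appdb.events") \
    .load()

# The two DataFrames can then be combined in Spark, for example by
# joining on a hypothetical userId field.
users.join(events, users["_id"] == events["userId"]).show()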
- Similarly, save data to any database or collection by using a DataFrame and overriding the default spark.mongodb.output.uri:

# rdd is an existing RDD of documents (for example, an RDD of Rows).
dframe = sqlContext.createDataFrame(rdd)

dframe.write.format("com.mongodb.spark.sql.DefaultSource") \
    .option("spark.mongodb.output.uri", "mongodb://host:port/database.collection") \
    .save()
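Because save() defaults to SaveMode.ErrorIfExists, it may also be worth showing an explicit save mode for the common case where the target collection already exists; a minimal sketch using the same placeholder URI:

# Append to an existing collection instead of erroring when it exists.
dframe.write.format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append") \
    .option("spark.mongodb.output.uri", "mongodb://host:port/database.collection") \
    .save()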
- An aggregation pipeline can be specified as an option when reading data. Note that the pipeline is passed as a JSON string, which the connector parses, rather than as a Python list:

pipeline = "[{'$match': {'fieldA': 1}}]"

df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("pipeline", pipeline) \
    .load()
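A multi-stage pipeline works the same way, and pushing stages such as $match into the pipeline filters documents on the server before they are sent to Spark; a minimal sketch, where fieldA and fieldB are placeholder field names:

# Match and project on the MongoDB side before the data reaches Spark.
pipeline = "[{'$match': {'fieldA': 1}}, {'$project': {'fieldB': 1, '_id': 0}}]"

df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("pipeline", pipeline) \
    .load()
df.show()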