[DOCS-12325] [Spark] Add spark.jars.packages config to all Python config examples Created: 06/Jan/19  Updated: 29/Oct/23  Resolved: 23/Sep/21

Status: Closed
Project: Documentation
Component/s: Spark Connector
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Mohammed Hameed Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: sp-docs
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: python

Epic Link: DOCSP-6205

 Description   

On the Spark connector Python guide pages, the documentation describes how to create a Spark session. It reads:

from pyspark.sql import SparkSession

my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll") \
    .getOrCreate()

The snippet is missing one more config parameter: the MongoDB Spark Connector package.

The code should look like this:

from pyspark.sql import SparkSession

my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.4.0") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll") \
    .getOrCreate()
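
With the package configured this way, the session can be used directly. As a minimal sketch (assuming the connector 2.x data source name, com.mongodb.spark.sql.DefaultSource, which is not shown in the guide excerpt above), reading the configured collection into a DataFrame would look like:

# Sketch: read the collection named in spark.mongodb.input.uri above.
# Assumes the mongo-spark-connector 2.x line, where the fully qualified
# data source name is "com.mongodb.spark.sql.DefaultSource".
df = my_spark.read \
    .format("com.mongodb.spark.sql.DefaultSource") \
    .load()

df.printSchema()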



 Comments   
Comment by Nathan Leniz [ 23/Sep/21 ]

Closing this issue due to inactivity and a working resolution.

Comment by Mohammed Hameed [ 08/Jan/19 ]

Thank you for the response.

I tried the ./bin/pyspark command in the terminal and it works just fine. That actually led me to search further for how to include the package via the .config method, until I found it.

It was confusing that this wasn't included in the snippet, because it's not a straightforward kind of deal for me, especially since I am not a Java developer.

Comment by Jonathan DeStefano [ 07/Jan/19 ]

Thanks for filing a DOCS ticket. The reason "spark.jars.packages" is not included in the code snippet is that the earlier part of the guide passes the package as a parameter when launching pyspark:

 

./bin/pyspark --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/test.myCollection?readPreference=primaryPreferred" \
              --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/test.myCollection" \
              --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.0
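
In the shell started by that command, pyspark already provides a SparkSession named spark with those settings applied, so (as a sketch, again assuming the connector 2.x source name) the collection can be read without any further configuration:

# Inside the pyspark shell launched above, `spark` is predefined.
# Assumption: mongo-spark-connector 2.x data source name.
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df.show()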

We will leave this ticket open and update the Python code snippet as you suggested.
