[
https://jira.mongodb.org/browse/SPARK-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1562555#comment-1562555 ]
Ross Lawley commented on SPARK-119:
-----------------------------------
Yes, this is because server selection is done on a per-operation basis. Spark Tasks work by calling the {{compute}} method on the {{RDD}} with the partition information, and that returns the data they need to work on. The MongoDB Spark connector has a custom {{RDD}} implementation ({{MongoRDD}}) whose compute method queries MongoDB and returns the cursor data to Spark. Each Task selects a server based on the configured read preference.
So, in short, you can distribute reads across the members of your replica set by using a secondary read preference.
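As a rough sketch of what that looks like with the 2.x connector API (the URI, hosts, and collection names below are placeholders, not from this ticket), you can set the read preference through a {{ReadConfig}} override:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig

// Hypothetical replica-set URI; substitute your own hosts, database, and collection.
val conf = new SparkConf()
  .setAppName("read-from-secondaries")
  .set("spark.mongodb.input.uri",
    "mongodb://host1,host2,host3/test.coll?replicaSet=rs0")

val sc = new SparkContext(conf)

// Override the default read preference so each Task's query
// may be routed to a secondary member of the replica set.
val readConfig = ReadConfig(
  Map("readPreference.name" -> "secondaryPreferred"),
  Some(ReadConfig(sc)))

// MongoSpark.load returns a MongoRDD; its compute method issues
// one query per partition, each doing its own server selection.
val rdd = MongoSpark.load(sc, readConfig)
println(rdd.count())
```

Because server selection happens per operation, each partition's query can land on a different secondary, spreading the read load across the replica set.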
I hope that helps,
Ross
> Add the ability to read from multiple servers
> ---------------------------------------------
>
> Key: SPARK-119
> URL:
https://jira.mongodb.org/browse/SPARK-119
> Project: Spark Connector
> Issue Type: New Feature
> Components: Performance
> Affects Versions: 2.0.0
> Reporter: Georgios Andrianakis
> Assignee: Ross Lawley
> Priority: Minor - P4
>
> It would be awesome if the Mongo Spark Connector could automatically use all the members of a replica set in order to distribute the read load from various Spark Tasks to all the Mongo Servers.
----------------------
This message was sent from MongoDB's issue tracking system. To respond to this ticket, please login to
https://jira.mongodb.org using your JIRA or MMS credentials.