[
https://jira.mongodb.org/browse/SPARK-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1562555#comment-1562555 ]
Ross Lawley commented on SPARK-119:
-----------------------------------
Yes, this is because server selection is done on a per-operation basis. Spark Tasks work by calling the {{compute}} method on the {{RDD}} with the partition information, and that returns the data they need to work on. The MongoDB Spark connector has a custom {{RDD}} implementation ({{MongoRDD}}) whose compute method queries MongoDB and returns the cursor data to Spark. Each Task selects a server based on the configured read preference.
So, in short, you can distribute reads across the members of your replica set by using a secondary read preference.
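As a rough sketch of what that looks like with the 2.x connector API (the URI, hosts, and collection names below are placeholders, not from this ticket), you can set the read preference through a {{ReadConfig}} override:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig

// Hypothetical replica-set URI; substitute your own hosts, database, and collection.
val conf = new SparkConf()
  .setAppName("read-from-secondaries")
  .set("spark.mongodb.input.uri",
    "mongodb://host1,host2,host3/test.coll?replicaSet=rs0")

val sc = new SparkContext(conf)

// Override the default read preference so each Task's query
// may be routed to a secondary member of the replica set.
val readConfig = ReadConfig(
  Map("readPreference.name" -> "secondaryPreferred"),
  Some(ReadConfig(sc)))

// MongoSpark.load returns a MongoRDD; its compute method issues
// one query per partition, each doing its own server selection.
val rdd = MongoSpark.load(sc, readConfig)
println(rdd.count())
```

Because server selection happens per operation, each partition's query can land on a different secondary, spreading the read load across the replica set.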
I hope that helps,
Ross
> Add the ability to read from multiple servers
> ---------------------------------------------
>
> Key: SPARK-119
> URL:
https://jira.mongodb.org/browse/SPARK-119
> Project: Spark Connector
> Issue Type: New Feature
> Components: Performance
> Affects Versions: 2.0.0
> Reporter: Georgios Andrianakis
> Assignee: Ross Lawley
> Priority: Minor - P4
>
> It would be awesome if the Mongo Spark Connector could automatically use all the members of a replica set in order to distribute the read load from various Spark Tasks to all the Mongo Servers.
----------------------
This message was sent from MongoDB's issue tracking system. To respond to this ticket, please login to
https://jira.mongodb.org using your JIRA or MMS credentials.