[KAFKA-121] MongoDB Kafka source connector always has single task Created: 10/Jul/20  Updated: 27/Oct/23  Resolved: 10/Jul/20

Status: Closed
Project: Kafka Connector
Component/s: Source
Affects Version/s: 1.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Hamid Jawaid Assignee: Ross Lawley
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 10, 8GB RAM, AMD Ryzen 5 2500U, 64Bit, HDD


Issue Links:
Documented
Documentation Changes: Needed
Documentation Changes Summary:

If I set tasks.max=10 with a single topic (three partitions), I still see only one task, although failover to another worker works fine.
When and how the source connector spawns "tasks" is not mentioned in the documentation of the official MongoDB source connector. Is this the expected behavior? If so, the documentation should mention it.


 Description   

I am trying to set up the MongoDB Kafka connector as a source listening to the change stream of one collection on my Windows machine. I am running three worker nodes on my local Windows machine on ports 8083, 8084, and 8085.
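
For orientation, a minimal sketch of what one of the three distributed worker configurations could look like; the broker address, group id, and topic names below are illustrative assumptions, not values from the original report:

    # connect-distributed.properties for the worker on port 8083 (illustrative values)
    bootstrap.servers=localhost:9092
    # All three workers join the same Connect cluster by sharing group.id
    group.id=mongo-connect-cluster
    # The other two workers would use ports 8084 and 8085 here
    listeners=http://localhost:8083
    key.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter
    # Internal topics used by distributed mode
    config.storage.topic=connect-configs
    offset.storage.topic=connect-offsets
    status.storage.topic=connect-status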

I am able to create a connector with one task and receive change stream events successfully, and failover to another worker node also works fine. However, the number of tasks spawned is always one. I have set tasks.max=10 and used multiple threads to generate a high volume of change stream events, but even then the number of tasks remains one.
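
For context, the connector registration that reproduces this would look roughly like the following, posted to any worker's REST endpoint (POST /connectors); the connection URI, database, and collection names are placeholders, not values from the report:

    {
      "name": "mongo-source",
      "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",
        "database": "mydb",
        "collection": "mycollection",
        "tasks.max": "10"
      }
    }

Even with tasks.max set to 10 here, GET /connectors/mongo-source/status reports a single task.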

This effectively makes the producer side (the MongoDB source connector) non-scalable, while consumers can scale out across the multiple partitions of the same topic. Is this an issue on my end, or is the MongoDB Kafka source connector designed this way?



 Comments   
Comment by Hamid Jawaid [ 12/Jul/20 ]

Thanks Ross.

Yes, I can run multiple connectors, but then I would have to handle duplicate processing of change stream events, as both connectors would receive the same events. I can give each connector a different pipeline so that change stream events are not duplicated, as sketched below.
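
As a sketch of that idea, two connectors watching the same collection could be given complementary $match stages so that each change event is delivered to exactly one of them; the field name fullDocument.region and its values are hypothetical, purely to show the shape of the config:

    Connector A:  "pipeline": "[{\"$match\": {\"fullDocument.region\": \"EU\"}}]"
    Connector B:  "pipeline": "[{\"$match\": {\"fullDocument.region\": {\"$ne\": \"EU\"}}}]"

(Note that for update events the full document is only attached when change.stream.full.document is set to updateLookup, so a split like this is simplest to reason about for inserts.)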

Thanks for making it clear.

Documentation: https://docs.mongodb.com/kafka-connector/v1.1/kafka-source/

Can the documentation be updated to state that "tasks.max" has no influence on the number of generated tasks?

 

Comment by Ross Lawley [ 10/Jul/20 ]

The source connector will only ever produce a single task.

This is by design, as the source connector is backed by a change stream. Change streams internally use the same data as the replication engine and as such should scale as the database does.

There are no plans to allow multiple cursors. However, should you feel that this does not meet your requirements, you can configure multiple connectors, each with its own change stream cursor.
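
To illustrate that suggestion (the database and collection names are placeholders), two separately registered connectors, each scoped to its own collection, would each open an independent change stream cursor:

    {
      "name": "mongo-source-orders",
      "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",
        "database": "mydb",
        "collection": "orders"
      }
    }

    {
      "name": "mongo-source-customers",
      "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",
        "database": "mydb",
        "collection": "customers"
      }
    }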

Ross
