[KAFKA-121] MongoDB Kafka source connector always has single task Created: 10/Jul/20 Updated: 27/Oct/23 Resolved: 10/Jul/20
| Status: | Closed |
| Project: | Kafka Connector |
| Component/s: | Source |
| Affects Version/s: | 1.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Hamid Jawaid | Assignee: | Ross Lawley |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Windows 10, 8 GB RAM, AMD Ryzen 5 2500U, 64-bit, HDD |
| Documentation Changes: | Needed |
| Documentation Changes Summary: | If I set tasks.max=10 with a single topic (three partitions), I still see only one task, while failover to another worker works fine. |
| Description |
I am trying to set up the MongoDB Kafka connector as a source listening to the change stream of one collection on my Windows machine. I am running three worker nodes on localhost on ports 8083, 8084, and 8085. I can create the connector with one task and receive change stream events successfully, and failover to another worker node also works fine. But the number of tasks spawned is always one. I have set tasks.max=10 and used multiple threads to generate a high volume of change stream events, but even then the number of tasks remains one. This makes the producer side (the MongoDB source connector) not scalable at all, whereas consumers can scale across multiple partitions of the same topic. Is this an issue on my end, or is the MongoDB Kafka source connector designed this way?
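For reference, a minimal sketch of the kind of config described above, POSTed to one of the workers (e.g. http://localhost:8083/connectors); the connector name, connection URI, database, collection, and topic prefix are placeholders:

```json
{
  "name": "mongo-source-test",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "tasks.max": "10",
    "connection.uri": "mongodb://localhost:27017",
    "database": "mydb",
    "collection": "mycollection",
    "topic.prefix": "mongo"
  }
}
```

Even with tasks.max=10, the connector's status endpoint (GET /connectors/mongo-source-test/status) reports a single task; the comments below explain why.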
| Comments |
| Comment by Hamid Jawaid [ 12/Jul/20 ] |
Thanks Ross. Yes, I can have multiple connectors, but then I would have to manage duplicate processing of change stream events, as both connectors would receive the same events. I can give them different pipelines so that change stream events are not duplicated. Thanks for making it clear. Documentation: https://docs.mongodb.com/kafka-connector/v1.1/kafka-source/ Could the documentation be updated to note that "tasks.max" has no influence on the number of tasks generated?
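A sketch of the pipeline-based split mentioned above, assuming numeric _id values so that documentKey._id can be partitioned with a $mod query (the field choice and the two-way split are illustrative, not part of the original discussion):

```json
{
  "name": "mongo-source-even",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb://localhost:27017",
    "database": "mydb",
    "collection": "mycollection",
    "topic.prefix": "mongo",
    "pipeline": "[{\"$match\": {\"documentKey._id\": {\"$mod\": [2, 0]}}}]"
  }
}
```

A second connector, say mongo-source-odd, would be identical except that its pipeline matches {"$mod": [2, 1]}. Each connector then owns its own change stream cursor and the two $match filters select disjoint events, so nothing is processed twice; the trade-off is that resume progress and failover must now be watched per connector.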
| Comment by Ross Lawley [ 10/Jul/20 ] |
The source connector will only ever produce a single task. This is by design, as the source connector is backed by a change stream. Change streams internally use the same data as the replication engine and as such should scale as the database does. There are no plans to allow multiple cursors; however, if this does not meet your requirements, you can configure multiple connectors, each with its own change stream cursor. Ross
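To make the single-task behaviour concrete, here is a minimal, hypothetical sketch of the Kafka Connect pattern involved (not the connector's actual source): the framework creates one task per config returned by taskConfigs(int maxTasks), so a connector built around a single change stream cursor simply returns one config, and tasks.max becomes an upper bound that is never reached.

```java
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

// Hypothetical connector showing why tasks.max can have no effect:
// the framework creates one task per config returned by taskConfigs(),
// and a connector backed by a single change stream cursor returns one.
public class SingleTaskSourceConnector extends SourceConnector {
    private Map<String, String> settings;

    @Override
    public void start(Map<String, String> props) {
        settings = props;
    }

    @Override
    public Class<? extends Task> taskClass() {
        return NoOpTask.class;
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // maxTasks carries the tasks.max value, but it is only an upper
        // bound: returning a single config pins the connector to one task.
        return List.of(settings);
    }

    @Override
    public void stop() {}

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public String version() {
        return "0.0.1";
    }

    // Placeholder task so the sketch is self-contained; a real source task
    // would open the change stream cursor and emit records from poll().
    public static class NoOpTask extends SourceTask {
        @Override public String version() { return "0.0.1"; }
        @Override public void start(Map<String, String> props) {}
        @Override public List<SourceRecord> poll() throws InterruptedException { return null; }
        @Override public void stop() {}
    }
}
```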