[KAFKA-93] Connector issue with high load: Query failed with error code 136 and error message 'Error in $cursor stage :: caused by :: errmsg: "CollectionScan died due to failure to restore tailable cursor position"'

| Created: | 30/Mar/20 | Updated: | 10/Jun/20 | Resolved: | 07/Apr/20 |
| Status: | Closed |
| Project: | Kafka Connector |
| Component/s: | None |
| Affects Version/s: | 1.1 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Sabari Gandhi | Assignee: | Ross Lawley |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: | MongoDB: 3.6.17, Kafka Connector 1.1 |
| Case: | (copied to CRM) |
| Description |
Hi, I am using MongoDB 3.6.17 in my setup. Since MongoDB versions before 4.0 do not support change streams at the database level, I am registering multiple connector instances: 10 connector instances, each listening to one of 10 collections. I try to load around 5000 documents into the collections. After some time I start seeing failures like the one below; the connector fails and the related Kafka topic contains only partial messages. Also, when I check the connector status with curl -X GET http://localhost:8083/connectors/mongo-source-2/status, the result says it is running, but the connector is actually in a failed state (see the status-check sketch after the log line below).
[2020-03-30 17:00:15,513] WARN [Producer clientId=confluent.monitoring.interceptor.connector-producer-mongo-source-7-0] Error while fetching metadata with correlation id 3 : {_confluent-monitoring=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient) |
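A quick way to see the real failure, assuming a default Kafka Connect REST listener on localhost:8083 and jq installed: the top-level connector "state" can stay RUNNING while an individual task has FAILED, so the task entries (and their stack traces) are what to inspect.

```
# Minimal sketch: connector name and host are taken from the report above.
# The connector-level "state" can read RUNNING even when a task FAILED,
# so pull the per-task state and trace fields explicitly.
curl -s http://localhost:8083/connectors/mongo-source-2/status \
  | jq '{connector: .connector.state, tasks: [.tasks[] | {id, state, trace}]}'
```

A failed task can then be retried in place via the standard restart endpoint, e.g. curl -X POST http://localhost:8083/connectors/mongo-source-2/tasks/0/restart.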
| Comments |
| Comment by Sabari Gandhi [ 09/Apr/20 ] |
Hi Ross, thanks for the update. I have added myself to the watch list of the referenced ticket. |
| Comment by Ross Lawley [ 07/Apr/20 ] |
Thanks for the ticket. The issue here is that the connector has seen an offset but cannot restart the change stream at that resume token. Currently, the only workaround is to clear the offsets stored in Kafka (a hedged sketch follows this comment). Please watch the linked ticket for updates. Ross |
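A minimal sketch of that workaround for Connect in distributed mode, assuming the default offsets topic name connect-offsets (check offset.storage.topic in the worker config) and kcat/kafkacat available; the offset key shown in step 2 is illustrative, and the exact key must be copied from the output of step 1:

```
# 1. List the offsets-topic keys to find the one belonging to the
#    failed connector (keys look like ["connector-name", {partition map}]).
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic connect-offsets --from-beginning --property print.key=true

# 2. Publish a tombstone (null value) for that key; kcat's -Z flag turns
#    the empty value after the '#' separator into a true NULL.
#    The key below is a hypothetical example, not this connector's real key.
echo '["mongo-source-2",{"ns":"mydb.mycollection"}]#' | \
  kcat -b localhost:9092 -t connect-offsets -P -Z -K '#'
```

Once the tombstone is written, restarting the connector starts a fresh change stream; re-registering the connector under a new name has the same effect without touching the offsets topic.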
| Comment by Sabari Gandhi [ 30/Mar/20 ] |
I have attached the configuration I have for one of my connector instances. Please read my description as: I try to load around 50000 documents into the collections (I am not able to edit the description to fix the typo). Because of the error above the connector is not able to recover, and I see the below messages:
[2020-03-30 20:27:44,880] INFO Failed to resume change stream: resume of change stream was not possible, as the resume token was not found. {_data: BinData(0, "825E8254BF000008E9461E5F6964002C34B0005A10046533550540C240D080F0DE7DA09C792704"), _typeBits: BinData(0, "01")} 280 (com.mongodb.kafka.connect.source.MongoSourceTask)
But the curl command for the connector status still returns running. I am able to reproduce the scenario with 2 connector instances under high load. Please let me know if you need any more information. Thanks in advance. |
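For context, a hedged sketch of the kind of per-collection registration described in this ticket, using the 1.x MongoDB source connector property names; the database, collection, and connection URI below are placeholders. On MongoDB 3.6 both database and collection have to be set, since servers before 4.0 only support collection-level change streams.

```
# Sketch only: one connector instance per collection, registered via the
# Connect REST API. Replace the URI, database, and collection values.
curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mongo-source-2",
    "config": {
      "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
      "connection.uri": "mongodb://mongo1:27017/?replicaSet=rs0",
      "database": "mydb",
      "collection": "mycollection",
      "topic.prefix": "mongo"
    }
  }'
```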