[KAFKA-295] MongoDB source connector: Issue when running connector for a deployment level Created: 23/Feb/22  Updated: 27/Oct/23  Resolved: 23/Mar/22

Status: Closed
Project: Kafka Connector
Component/s: Source
Affects Version/s: 1.5.1
Fix Version/s: None

Type: Bug Priority: Unknown
Reporter: Sabari Gandhi Assignee: Robert Walters
Resolution: Gone away Votes: 1
Labels: external-user
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

 Description   

Setup:
MongoDB source connector v1.5.1
Sharded cluster v4.2

We are running the connector at the deployment level and listening to certain collections via a $match stage in the pipeline. The load we are seeing from these collections is very low.

 

"pipeline": "[ { $project: { \"updateDescription\":0 }}, { $match: { \"ns.coll\": { \"$in\": [\"coll1\" ,\"coll2\", \"coll3\", \"coll4\" ] } } } ]",
 

We are seeing the following change stream resume issue about once every 2 days, which matches the oplog window/limit we have:

An exception occurred when trying to get the next item from the Change Stream: Query failed with error code 286 and error message 'Error on remote shard server_name:27018 :: caused by :: Resume of change stream was not possible, as the resume point may no longer be in the oplog.' on server server_name:27017

We have experience running connectors targeted at specific high-volume collections and have not seen this issue there; this, however, is a connector we are running against a whole deployment.

P.S.: We also looked into the docs and are testing the connector with the heartbeat.interval.ms configuration, since we suspect the issue may be caused by infrequently updated namespaces (https://docs.mongodb.com/kafka-connector/current/troubleshooting/recover-from-invalid-resume-token/#std-label-kafka-troubleshoot-invalid-resume-token).

Are we missing any specific configuration on the connector/cluster? Thanks in advance.
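For reference, a heartbeat setup needs only the two source connector properties below (the 10-second interval is illustrative; heartbeat.topic.name can be omitted, in which case it defaults to __mongodb_heartbeats):

"heartbeat.interval.ms": "10000",
"heartbeat.topic.name": "__mongodb_heartbeats"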



 Comments   
Comment by PM Bot [ 23/Mar/22 ]

There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information.

Comment by Sabari Gandhi [ 08/Mar/22 ]

Thanks, @Robert, for the response. We have an oplog window of about 48 hours.

We have also been testing the connector's heartbeat feature to see whether it fixes the issue. Our understanding of it is as follows:

As mentioned earlier, the connector is configured at the deployment level (across databases) and we filter collections with a $match stage in the pipeline configuration:

 

("pipeline": "[ { $project: { "updateDescription":0 }}, { $match: { "ns.coll":{ "$in": [ "coll1", "coll2", "coll3" ] }} } ]",)

 

Even with heartbeats enabled in this setup, we are getting the same error. The documentation states: "The connector sends heartbeat messages when source records are not published in the specified interval." So if even one collection receives change events at regular intervals, heartbeats will never be sent, and once the oplog rolls over there will be no valid resume token for the collections with little activity. Can you please confirm whether this understanding is correct?

Also, is the heartbeat feature supported at the deployment level for the source connector? Most of the documentation describes heartbeats at the namespace level.

Comment by Robert Walters [ 28/Feb/22 ]

It sounds like the oplog is rolling over, so when the connector tries to resume the change stream the resume token's id is no longer found. You could try increasing the size of the oplog.
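For example, on MongoDB 4.2 the oplog can be resized online with a single admin command per replica set member (the 48000 MB size below is purely illustrative; in a sharded cluster this has to be done on every member of each shard's replica set):

db.adminCommand( { replSetResizeOplog: 1, size: 48000 } )  // size is in megabytes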
