- Type: Improvement
- Resolution: Duplicate
- Priority: Unknown
- Affects Version/s: 1.5.0
- Component/s: Source
(copied to CRM)
Hi Team,
At the moment, the Kafka Source Connector provides no easy way to understand how far behind the Connector is relative to the associated Change Stream of the Source Cluster.
Being aware of that lag, and being able to easily implement monitoring and alerting on it, would be really helpful for the following reasons:
- Although thresholds may differ by use case, lag issues negatively affect the associated data pipeline.
- Monitoring the lag would help prevent the Source Connector's resume token from falling off the oplog.
Today, a workaround is to implement a custom monitoring solution consisting of a consumer that does the following (a minimal sketch follows the list):
1. Consumes messages from the connect-offsets topic, filters for the required Connector based on the message key, decodes the resume token, and obtains the clusterTime from the decoded output.
2. Synchronously retrieves the lastCommittedOpTime from the Source MongoDB Deployment.
3. Compares (1) and (2) and alerts when it finds a significant delta.
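For illustration, here is a minimal sketch of such a monitoring consumer in Python. The connector name, broker address, Mongo URI, alert threshold, the exact layout of the offset payload, and the resume-token byte layout are all assumptions (the resume-token encoding in particular is a server implementation detail and may change between versions), so treat this as a starting point rather than a supported approach.

```python
# Hypothetical lag check for a MongoDB Kafka Source Connector.
# Names such as CONNECTOR_NAME, the broker address, and the offset payload
# layout are assumptions and may differ per deployment / connector version.
import json
import struct

from kafka import KafkaConsumer   # pip install kafka-python
from pymongo import MongoClient   # pip install pymongo

CONNECT_OFFSETS_TOPIC = "connect-offsets"
CONNECTOR_NAME = "mongo-source"            # hypothetical connector name
LAG_ALERT_THRESHOLD_SECS = 300


def resume_token_seconds(resume_token_data: str) -> int:
    """Extract the clusterTime seconds from a '_data' resume token hex string.

    Assumes the current KeyString layout: a leading 0x82 type byte followed
    by a big-endian 4-byte seconds field (a MongoDB implementation detail)."""
    raw = bytes.fromhex(resume_token_data)
    if raw[0] != 0x82:
        raise ValueError("unexpected resume token format")
    return struct.unpack(">I", raw[1:5])[0]


def latest_connector_resume_seconds() -> int:
    """Step 1: scan connect-offsets and keep the newest offset for our connector."""
    consumer = KafkaConsumer(
        CONNECT_OFFSETS_TOPIC,
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        enable_auto_commit=False,
        consumer_timeout_ms=10_000,
    )
    latest = None
    for msg in consumer:
        if msg.key is None or msg.value is None:
            continue
        # Offset keys look like ["<connector name>", {<source partition>}].
        key = json.loads(msg.key.decode())
        if key[0] != CONNECTOR_NAME:
            continue
        value = json.loads(msg.value.decode())
        # Assumption: the offset value stores the resume token JSON under "_id".
        token = json.loads(value["_id"])
        latest = resume_token_seconds(token["_data"])
    consumer.close()
    if latest is None:
        raise RuntimeError("no committed offset found for connector")
    return latest


def last_committed_seconds(mongo_uri: str) -> int:
    """Step 2: read lastCommittedOpTime from replSetGetStatus on the source."""
    client = MongoClient(mongo_uri)
    status = client.admin.command("replSetGetStatus")
    return status["optimes"]["lastCommittedOpTime"]["ts"].time


if __name__ == "__main__":
    connector_secs = latest_connector_resume_seconds()
    cluster_secs = last_committed_seconds("mongodb://localhost:27017/?replicaSet=rs0")
    lag = cluster_secs - connector_secs   # Step 3: compare and alert
    print(f"source connector lag: {lag} seconds")
    if lag > LAG_ALERT_THRESHOLD_SECS:
        print("ALERT: connector lag exceeds threshold")
```

For a sharded source Cluster the same comparison would have to be repeated per shard, which is where this workaround becomes considerably more involved.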
Although feasible, this solution requires development effort and becomes more complicated for a Cluster with more than one Shard. Some solutions that came up in discussion are:
- Add the lag as a metric that can be exported to common monitoring applications like Datadog.
- Incorporate lastCommittedOpTime into the message value for the connect-offsets topic so that the information can be more easily retrieved, making the custom monitoring solution easier to implement.
I'm open to any other suggestions that might help accomplish this.
Regards
Diego
- duplicates KAFKA-64 Expose monitoring metrics over JMX (Closed)
- is duplicated by KAFKA-251 Save cluster time when it was inserted or updated (Closed)
- is related to KAFKA-64 Expose monitoring metrics over JMX (Closed)
- related to SERVER-57986 Report metrics for lag monitoring in change streams output (Closed)