[KAFKA-339] Topic names cleanup to match Kafka topic naming restrictions Created: 15/Nov/22  Updated: 14/Aug/23

Status: Backlog
Project: Kafka Connector
Component/s: None
Affects Version/s: None
Fix Version/s: 1.12.0

Type: New Feature Priority: Major - P3
Reporter: Alon Prantsipal Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Quarter: FY24Q3

 Description   

By default, the MongoDB Kafka source connector publishes change event data to a Kafka topic with the same name as the MongoDB namespace from which the change events originated. 

MongoDB's naming rules for databases and collections allow characters that Kafka does not permit in topic names (e.g., whitespace), so change events from such namespaces fail in the connector.

While topic.namespace.map can be used as a workaround, it does not help with newly created databases/collections, so the map has to be updated every time a new namespace causes a failure.

The property topic.mapper looks relevant but lacks any documentation and seems complicated to implement.

IMO, topic name sanitization should be implemented, similar to how it's done in Debezium: https://github.com/debezium/debezium/blob/4d577655968d2e00edb67a7e53702b8c38a17023/debezium-api/src/main/java/io/debezium/spi/topic/TopicNamingStrategy.java#L46

(The Debezium MongoDB source connector cleans up topic names by converting any illegal characters to underscores.)
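To illustrate the requested behavior, here is a minimal sketch of underscore-based sanitization. It is not the connector's or Debezium's actual code: the class name TopicNameSanitizer and the method sanitize are hypothetical. It assumes Kafka's usual topic rules (only ASCII letters, digits, '.', '_' and '-', at most 249 characters, and not exactly "." or ".."); the fallback for empty or dot-only names is also an assumption.

```java
// Hypothetical helper, not part of the MongoDB Kafka connector or Debezium.
final class TopicNameSanitizer {
    // Kafka limits topic names to 249 characters.
    private static final int MAX_TOPIC_LENGTH = 249;
    private static final char REPLACEMENT = '_';

    private TopicNameSanitizer() {}

    // Kafka accepts only ASCII letters, digits, '.', '_' and '-'.
    private static boolean isLegal(char c) {
        return (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '.' || c == '_' || c == '-';
    }

    // Replaces every illegal character in a namespace ("db.coll") with '_'.
    static String sanitize(String namespace) {
        StringBuilder sb = new StringBuilder(namespace.length());
        for (int i = 0; i < namespace.length(); i++) {
            char c = namespace.charAt(i);
            sb.append(isLegal(c) ? c : REPLACEMENT);
        }
        // Truncate over-long names (assumption: truncation is acceptable).
        if (sb.length() > MAX_TOPIC_LENGTH) {
            sb.setLength(MAX_TOPIC_LENGTH);
        }
        String topic = sb.toString();
        // Kafka rejects empty names and the names "." and "..".
        if (topic.isEmpty()) {
            return String.valueOf(REPLACEMENT);
        }
        if (topic.equals(".") || topic.equals("..")) {
            return topic.replace('.', REPLACEMENT);
        }
        return topic;
    }
}
```

With this sketch, TopicNameSanitizer.sanitize("my db.my coll") returns "my_db.my_coll". One caveat of this approach is that distinct namespaces can collide after sanitization (for example "my db.c" and "my_db.c"), so a real implementation would probably need to detect or document that case.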



 Comments   
Comment by Esha Bhargava [ 21/Nov/22 ]

alonp@accessfintech.com Thank you for reporting this issue. We'll consider it for a future release.

Comment by Alon Prantsipal [ 15/Nov/22 ]

I looked a bit more into topic.mapper, and it does seem like it can be used for that purpose, so this is not a blocker. It still feels a bit risky and cumbersome to implement this with a Java class rather than with a dedicated property.

Generated at Thu Feb 08 09:06:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.