[KAFKA-246] Support namespaces provided in the Schema Created: 11/Aug/21  Updated: 28/Oct/23  Resolved: 16/Feb/22

Status: Closed
Project: Kafka Connector
Component/s: None
Affects Version/s: None
Fix Version/s: 1.7.0

Type: Improvement Priority: Unknown
Reporter: Ross Lawley Assignee: Valentin Kavalenka
Resolution: Fixed Votes: 0
Labels: size-small
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
Quarter: FY22Q4
Case:
Documentation Changes: Needed

 Description   

See: https://www.mongodb.com/community/forums/t/kafka-source-connector-output-schema-value-not-registering-namespace/102645/2

We took the default schema from the source connector documentation under output.schema.value and modified it as follows:

  • Adding a namespace
  • Renaming ChangeStream to MongoSourceChangeEvent
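For illustration, the modified schema would look roughly like the following (the field list here is abbreviated and hypothetical; the actual default schema is in the source connector documentation):

```json
{
  "type": "record",
  "name": "MongoSourceChangeEvent",
  "namespace": "com.company.common.avro",
  "fields": [
    {"name": "_id", "type": "string"}
  ]
}
```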

We use this modified schema in output.schema.value. When the connector registers the schema in the schema registry, the namespace field is not included.

This creates a problem for us in Kotlin: because deserialization would fail, we cannot use the namespaced import:
import com.company.common.avro.MongoSourceChangeEvent

Instead, we are forced to remove the namespace from the schema and import the class without one (the registered schema is also used to auto-generate the Avro classes):
import MongoSourceChangeEvent

We want a proper namespace for our auto-generated classes, and this issue currently prevents that.



 Comments   
Comment by Githook User [ 16/Feb/22 ]

Author:

{'name': 'Valentin Kovalenko', 'email': 'valentin.kovalenko@mongodb.com', 'username': 'stIncMale'}

Message: Correctly process namespaces when converting `org.apache.avro.Schema` to `org.apache.kafka.connect.data.Schema` (#103)

`org.apache.kafka.connect.data.Schema` does not separate a namespace from a short name,
and always uses full names, while `org.apache.avro.Schema` separates them.
The "always uses full names" part about `org.apache.kafka.connect.data.Schema`
is not documented anywhere, but based on the following example
https://docs.confluent.io/platform/current/tutorials/examples/connect-streams-pipeline/docs/index.html#example-3-jdbc-source-connector-with-specificavro-key-string-null-and-value-specificavro
and the `SetSchemaMetadata`
(https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/SetSchemaMetadata.java),
it is clear that the name in `org.apache.kafka.connect.data.Schema`
is supposed to be the full name.

Avro supports namespaces only for schemas of records, enums, and fixed types,
of which MongoDB Kafka Connector supports only records.
Therefore, `org.apache.avro.Schema.getFullName` needs to be used
as `org.apache.kafka.connect.data.Schema.name()` only for record schemas.
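The naming rule described above can be sketched as follows (a minimal hypothetical helper, not the connector's actual code):

```java
// Sketch of the mapping described above: Avro separates a namespace from a
// short name, while a Connect schema name is expected to be the full name.
// Only record schemas carry a namespace here, mirroring what the connector
// supports, so only they need the full name propagated.
public class AvroFullNameSketch {

    // Mirrors the behavior of org.apache.avro.Schema.getFullName():
    // namespace + "." + name, or just the name when no namespace is set.
    static String fullName(String namespace, String shortName) {
        return (namespace == null || namespace.isEmpty())
                ? shortName
                : namespace + "." + shortName;
    }

    public static void main(String[] args) {
        // With a namespace the Connect schema name becomes the full name.
        System.out.println(fullName("com.company.common.avro", "MongoSourceChangeEvent"));
        // Without a namespace the short name is used unchanged.
        System.out.println(fullName(null, "MongoSourceChangeEvent"));
    }
}
```

Before this fix, only the short name reached the Connect schema, which is why the namespace never appeared in the registry.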

It is worth pointing out that as a result of this change, MongoDB Kafka Connector
will start storing slightly different Avro schemas in Confluent Schema Registry
via `AvroConverter`
(https://github.com/confluentinc/schema-registry/blob/master/avro-converter/src/main/java/io/confluent/connect/avro/AvroConverter.java).
This is not supposed to cause any compatibility issues for connectors, judging from
https://docs.confluent.io/platform/current/schema-registry/avro.html#schema-evolution-and-compatibility,
but will be visible for those users who explicitly fetch schemas from Schema Registry.
It is unclear why users would do that and how, but apparently it is what the user
reported KAFKA-246 is doing. Here are some thoughts on how a user may fetch relevant schemas from Schema Registry:

Other useful links:

KAFKA-246
Branch: master
https://github.com/mongodb/mongo-kafka/commit/6872a7e05fbde4dbc04098f90598c168f0694948

Comment by Jing Yong Lee [ 29/Nov/21 ]

Hi @Ross Lawley/@Esha Bhargava, could I submit a proposed fix to this issue via a pull request?

I think we can include the namespace from org.apache.avro.Schema as part of org.apache.kafka.connect.data.Schema's name field, to ensure the namespace is not omitted from the org.apache.kafka.connect.data.Schema produced by the createSchema method of com.mongodb.kafka.connect.source.schema.AvroSchema.

This practice is also used in Confluent's io.confluent.connect.avro.AvroData:

https://github.com/confluentinc/schema-registry/blob/master/avro-data/src/main/java/io/confluent/connect/avro/AvroData.java#L786

Thanks.

Comment by Ross Lawley [ 19/Aug/21 ]

Hi guptabrijmohan30@gmail.com,

I don't think a PR has been created for this feature yet. It's not planned for release this quarter.

Regarding the test failure, you'd have to look at the test report to understand more about what failed and why.

Ross

Comment by BRIJ MOHAN GUPTA [ 19/Aug/21 ]

Hi @Ross Lawley,

Please let me know when you are targeting this release, because I am not able to build the given PR.

Generated at Thu Feb 08 09:05:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.