[KAFKA-175] Inferring schema should support variable types for uses with Json with Schema. Created: 19/Nov/20 Updated: 27/Oct/23 Resolved: 03/Jan/23 |
|
| Status: | Closed |
| Project: | Kafka Connector |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Robert Walters | Assignee: | Ross Lawley |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Case: | (copied to CRM) | ||||||||
| Description |
|
Schema inference falls back to the base type when determining the value schema for arrays whose elements have different types. So when sourcing the following document structure:
The type of L3 is Array with a value type of Schema.STRING:
Configuration:
Json schemas do allow variable object types for Structs and Arrays (see: Array Compatibility). So when output.schema.infer.value=true and the output is Json with schema, the inferred schema should not fall back to a base type. Note that this would require an extra configuration option, e.g. "output.schema.infer.compatibility:[none|all]", defaulting to "all" to keep the current behaviour. For reference see: |
| Comments |
| Comment by Ross Lawley [ 03/Jan/23 ] |
|
Marking as won't fix for the reasons provided.
|
| Comment by Ross Lawley [ 03/Jan/23 ] |
|
Having reviewed the available APIs, Schema.Type#ARRAY is:
So there is no way to support this natively in the SourceRecord API. While "Json with schema should be able to support varying types for all data" is true, the connector has to produce SourceRecords, which have their own schema restrictions. Converters (e.g. to Json / Json with schema) are applied only after the SourceRecord has been produced from the schema'd information. So to handle multiple types of data in one array, producing an Array of Json strings is the workaround for this limitation. |
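The workaround mentioned above can be sketched roughly as follows. This is a simplified illustration in Python, not the connector's Java code; the helper name `to_json_string_array` is hypothetical. It only shows the idea: each element is serialized to a Json string so that the array has a single value type (string), and a consumer can parse each element back.

```python
import json


def to_json_string_array(items):
    """Serialize every element to a JSON string (hypothetical helper).

    The resulting list has one uniform element type (str), which is what
    a fixed array value schema requires.
    """
    return [json.dumps(item, sort_keys=True) for item in items]


if __name__ == "__main__":
    # An array mixing a document, a string and a number.
    mixed = [{"a": 1}, "text", 3]
    encoded = to_json_string_array(mixed)
    print(encoded)

    # A consumer recovers the original values by parsing each element.
    decoded = [json.loads(element) for element in encoded]
    print(decoded == mixed)
```

The cost of this approach is that consumers see strings and must parse them themselves; the schema no longer describes the structure of the individual elements.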
| Comment by Ross Lawley [ 20/Nov/20 ] |
|
I've reopened this, as Json with schema should be able to support varying types for all data. It's not obvious how to achieve that using the SchemaBuilder API. |
| Comment by Ross Lawley [ 19/Nov/20 ] |
|
Hi robert.walters. This is "works as designed". Arrays must have a fixed schema for the value type. Here the array holds two documents with entirely different schemas, and in that case the connector falls back to the base type, which is String. Ross |
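The fallback described in this comment can be modelled with a short sketch. It is written in Python and is an assumption-laden simplification, not the connector's actual inference code: the function names and the type labels (`INT64`, `STRUCT<...>`, and so on) are chosen for illustration only. The rule it demonstrates is the one stated above: if every element of an array infers to the same type, that type becomes the array's value type; otherwise the value type falls back to string.

```python
from typing import Any


def infer_type(value: Any) -> str:
    """Return a simplified, illustrative type label for a value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "FLOAT64"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, dict):
        fields = ", ".join(
            f"{key}: {infer_type(val)}" for key, val in sorted(value.items())
        )
        return f"STRUCT<{fields}>"
    if isinstance(value, list):
        return f"ARRAY<{infer_array_value_type(value)}>"
    return "STRING"  # assumption: anything unrecognised is treated as string


def infer_array_value_type(items: list) -> str:
    """Use the shared element type, or fall back to STRING when types differ."""
    distinct = {infer_type(item) for item in items}
    if len(distinct) == 1:
        return distinct.pop()
    return "STRING"  # mixed (or empty) arrays use the base type


if __name__ == "__main__":
    print(infer_type([1, 2, 3]))               # ARRAY<INT64>
    print(infer_type([{"a": 1}, {"b": "x"}]))  # ARRAY<STRING>
```

In the second call the two documents have different field sets, so their inferred types differ and the array's value type becomes string, which matches the behaviour reported in the description.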