[KAFKA-68] MongoSourceConnector source record key is configurable Created: 10/Sep/19  Updated: 28/Oct/23  Resolved: 17/Aug/20

Status: Closed
Project: Kafka Connector
Component/s: None
Affects Version/s: None
Fix Version/s: 1.3.0

Type: Improvement Priority: Major - P3
Reporter: Wolfgang Strack Assignee: Ross Lawley
Resolution: Fixed Votes: 2
Labels: sp-ga
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by KAFKA-123 Allow configuration of the Source Rec... Closed
Related
related to KAFKA-137 Add support for dotted field names. Closed
Epic Link: Support schema for Source connector

 Description   

At the time of writing, when source records are produced to their respective topics, the message key defaults to the format

{  "_id": { <bson resume token> }} 

 

Instead, it would be useful to have an option to configure the message key so that, for example, you could choose a different field from the change stream document to be used as the source record key.

One particularly useful case would be to choose the "_id" of the actual collection document, if present in the "fullDocument" "documentKey" field. This would allow the corresponding Kafka topic to be partitioned on that collection's "_id" field.
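
To illustrate why the choice of key matters for partitioning, here is a minimal sketch. It assumes a simple hash-mod-N partitioner: Kafka's default partitioner hashes the serialized key with murmur2, and `zlib.crc32` is only a stand-in for it. The function name `partition_for`, the partition count, and the sample events are hypothetical.

```python
import json
import zlib


def partition_for(key: dict, num_partitions: int) -> int:
    """Map a record key to a partition by hashing its serialized form.

    Stand-in for Kafka's default partitioner (which uses murmur2, not crc32).
    """
    serialized = json.dumps(key, sort_keys=True).encode("utf-8")
    return zlib.crc32(serialized) % num_partitions


# Two hypothetical change events for the same collection document.
events = [
    {"_id": {"_data": "token-1"}, "documentKey": {"_id": "doc-42"}},
    {"_id": {"_data": "token-2"}, "documentKey": {"_id": "doc-42"}},
]

# Keyed on documentKey._id: both events share one key, hence one partition.
by_document = [partition_for({"_id": e["documentKey"]["_id"]}, 6) for e in events]

# Keyed on the resume token: the key differs per event, so partitions may differ.
by_token = [partition_for({"_id": e["_id"]}, 6) for e in events]
```

With the resume token as the key, nothing ties two events for the same document to the same partition; with the document's `_id` as the key, they always land together.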



 Comments   
Comment by Ross Lawley [ 17/Aug/20 ]

See: KAFKA-124

Comment by Ross Lawley [ 05/Aug/20 ]

KAFKA-124 will allow users to define the schema for the key / record values, using Avro schema definitions.

The schema can use any field from the change stream document. For example, the default key schema will be:

{
  "type": "record",
  "name": "keySchema",
  "fields" : [{"name": "_id", "type": "string"}]
}

This will produce the '_id' field as a string. If the field type is not a string, the JsonFormatter (KAFKA-99) will be used to convert it to a string.
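
The string conversion described above can be sketched roughly as follows. This is an illustration only: `key_field_as_string` is a hypothetical helper, and `json.dumps` stands in for the connector's JsonFormatter, whose exact output format is not shown in this ticket.

```python
import json
from typing import Any


def key_field_as_string(change_event: dict, field: str = "_id") -> str:
    """Return the named top-level field of a change event as a string.

    Strings pass through unchanged; any other value is serialized to JSON.
    json.dumps is an assumed stand-in for the connector's JsonFormatter.
    """
    value: Any = change_event[field]
    if isinstance(value, str):
        return value
    return json.dumps(value)
```

For a change event whose `_id` is a resume token document, the key becomes the JSON text of that document; a plain string `_id` is used as-is.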

This approach will allow any value to become the key. KAFKA-137 will allow for shortened schemas, as it will support dotted lookups. For example, the Avro schema definition below would use the fullDocument.partitionKey field and, if that field is missing, fall back to the default value of DefaultPartition.

{
  "type": "record",
  "name": "keySchema",
  "fields" : [{"name": "fullDocument.partitionKey", "type": "string", "default": "DefaultPartition"}]
}
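
A dotted lookup with a default, as the schema above implies, might behave like the sketch below. `dotted_lookup` is a hypothetical helper, not the connector's code, and it assumes that a dot always separates nesting levels (field names that themselves contain dots are not handled).

```python
from typing import Any

_MISSING = object()  # sentinel so that None can be a legitimate default


def dotted_lookup(document: dict, path: str, default: Any = _MISSING) -> Any:
    """Resolve a dotted path such as 'fullDocument.partitionKey'.

    Returns the default when any segment is missing; raises KeyError
    if no default was supplied.
    """
    current: Any = document
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif default is not _MISSING:
            return default
        else:
            raise KeyError(path)
    return current


# Hypothetical change events: an insert carrying the field, and a delete
# that has no fullDocument at all.
insert_event = {"fullDocument": {"partitionKey": "eu-west"}}
delete_event = {"operationType": "delete"}

key_a = dotted_lookup(insert_event, "fullDocument.partitionKey", "DefaultPartition")
key_b = dotted_lookup(delete_event, "fullDocument.partitionKey", "DefaultPartition")
```

Under these assumptions `key_a` is taken from the document and `key_b` falls back to the schema default.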

So I think the new Schema support for both the key and value will provide the required control over the SourceRecord key.

Ross

Comment by Andrey B [ 02/Apr/20 ]

Hi, @Ross Lawley

My use case is the same as in the ticket description.

I need to configure the source record key.

The simple case is that I need to write all events for the same document into the same partition, so I want to configure the source record key as `documentKey._id`.

But I can imagine needing something else for the source record key, e.g. some field from `fullDocument`.

So it would be pretty useful even if you just added the possibility of using `documentKey._id` as the source record key.

Comment by Ross Lawley [ 30/Mar/20 ]

Hi andreworty@gmail.com,

This is still on the backlog; we're still discussing how this would be configured. Please feel free to add your use case. Would you want the _id field?

Ross

Comment by Andrey B [ 26/Mar/20 ]

Hello!
Could someone tell me the status of this ticket?

Comment by Seth Payne [ 24/Oct/19 ]

In cases where the document key does not exist, we will use the resume token.
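
The fallback described in this comment could look roughly like the sketch below. It reflects only this comment, not the schema-based approach described in the later comments; `source_record_key` and the sample events are hypothetical.

```python
def source_record_key(change_event: dict) -> dict:
    """Prefer the change event's documentKey; otherwise use the resume token.

    The resume token is the change event's own top-level '_id' field.
    """
    document_key = change_event.get("documentKey")
    if document_key is not None:
        return document_key
    return {"_id": change_event["_id"]}


# Hypothetical events: an update has a documentKey, a collection drop does not.
update_event = {"_id": {"_data": "token-1"}, "documentKey": {"_id": "doc-42"}}
drop_event = {"_id": {"_data": "token-2"}, "operationType": "drop"}

update_key = source_record_key(update_event)
drop_key = source_record_key(drop_event)
```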

Comment by Ross Lawley [ 12/Sep/19 ]

Hi wstrack@riffyn.com, no worries, I've updated the description. The ticket will be reviewed and scheduled in due course.

Ross

Comment by Wolfgang Strack [ 11/Sep/19 ]

Can't edit the description so I apologize for the mis-formatting. Also, for the use-case provided, I meant the "_id" would be pulled from the "documentKey" field of the change stream doc, rather than the potentially absent "fullDocument" field.

Generated at Thu Feb 08 09:05:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.