[KAFKA-360] tombstones incompatible downstream integration Created: 10/Mar/23 Updated: 28/Oct/23 Resolved: 21/Aug/23 |
|
| Status: | Closed |
| Project: | Kafka Connector |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 1.11.0 |
| Type: | Bug | Priority: | Unknown |
| Reporter: | Goncalo Pinho | Assignee: | Ross Lawley |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Quarter: | FY24Q2 |
| Documentation Changes: | Needed |
| Documentation Changes Summary: | Added configuration (change.stream.document.key.as.key) to use the document key for the sourceRecord key. This is potentially a breaking change as the newly added configuration defaults to true. Previously, the resume token was used as the source key, but [...] Not all events relate to documents (e.g. drop collection), so fallbacks to [...] As such, this is considered both an improvement and a bug fix. Set to false to revert back to the previous behaviour. |
| Description |
|
Hi there, Following your release 1.9.0, I've configured a source connector with `publish.full.document.only.tombstone.on.delete` to be able to publish tombstone records to a Kafka compacted topic. It doesn't work as expected because the BsonDocument key is created with the ChangeStream ObjectId instead of the DocumentId, so the SourceRecord is created with an ID that is unknown to downstream Kafka integrations and does not match the key of the existing record. This ChangeStream ObjectId is unrelated to the id used on the Kafka records, which breaks the tombstone functionality: the tombstone cannot be matched to an existing record (different key). I ended up cloning the repository and creating an additional configuration, `documentid.on.tombstones`, to get the DocumentId instead of the ChangeStream ObjectId on tombstone events. If you agree, I can open a PR with this functionality, as follows:
With this workaround I was able to integrate with Kafka compacted topics. Are you aware of this misbehavior? Or do you suggest another way of producing this tombstone with relevant keys? By the way, the documentation on your website is wrong regarding the config name:
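To illustrate the key mismatch described in this report, here is a minimal sketch of how a compacted topic treats a tombstone. It is not connector code: the `compact` helper and the key strings are hypothetical stand-ins, and real compaction is performed by the Kafka broker rather than by a function like this.

```python
from typing import Dict, List, Optional, Tuple

# (key, value); a value of None marks a tombstone.
Record = Tuple[str, Optional[str]]


def compact(log: List[Record]) -> Dict[str, str]:
    """Simplified model of log compaction: keep the latest value per key
    and drop a key entirely when its latest record is a tombstone."""
    state: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)  # tombstone removes the key, if present
        else:
            state[key] = value
    return state


# Existing downstream record, keyed by the document id (hypothetical value).
log: List[Record] = [("doc-1", '{"name": "a"}')]

# Tombstone keyed by a change stream id: it matches no existing key.
mismatched = compact(log + [("change-stream-id-xyz", None)])

# Tombstone keyed by the document id: it matches the existing record.
matched = compact(log + [("doc-1", None)])

print(mismatched)
print(matched)
```

In this model the record for `doc-1` survives the mismatched tombstone and is removed only when the tombstone carries the same key.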
|
| Comments |
| Comment by Githook User [ 21/Aug/23 ] |
|
Author: {'name': 'Ross Lawley', 'email': 'ross@mongodb.com', 'username': 'rozza'} Message: Source: Added configuration to use the document key for the sourceRecord key. This is potentially a breaking change as the newly added configuration defaults to true. Previously, the resume token was used as the source key, but [...] Not all events relate to documents (e.g. drop collection), so fallbacks to [...] As such, this is considered both an improvement and a bug fix. Set to false to revert back to the previous behaviour. Co-authored-by: Ross Lawley <ross@mongodb.com> |
| Comment by Ross Lawley [ 09/Aug/23 ] |
|
Added a new source configuration: change.stream.document.key.as.key, which defaults to true. This is potentially a breaking change as the newly added configuration defaults to true. Set to false to revert back to the previous behaviour. |
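As a rough sketch of how the new setting might sit in a source connector definition, the snippet below builds a JSON body of the kind accepted by the Kafka Connect REST API. The connector name, connection URI, database and collection are hypothetical placeholders. Only `publish.full.document.only.tombstone.on.delete` and `change.stream.document.key.as.key` come from this issue. Enabling `publish.full.document.only` alongside the tombstone option is an assumption about how the two are meant to be combined.

```python
import json

# Hypothetical connector definition. Name and connection details are placeholders.
connector = {
    "name": "mongo-source-example",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",
        "database": "exampleDb",
        "collection": "exampleCollection",
        # Assumption: the tombstone option is used together with full-document-only mode.
        "publish.full.document.only": "true",
        "publish.full.document.only.tombstone.on.delete": "true",
        # Defaults to true as of 1.11.0; "false" reverts to the previous behaviour.
        "change.stream.document.key.as.key": "true",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

Setting `change.stream.document.key.as.key` explicitly is optional given the default, but it makes the keying behaviour visible to anyone reading the configuration.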
| Comment by Goncalo Pinho [ 30/Jun/23 ] |
|
Hi robert.walters@mongodb.com, ross@mongodb.com, any updates on this issue? We are using our own implementation on production workloads and would like to switch to your official release so that we get other updates effortlessly. nicolas.gavalda@gmail.com, regarding your comment on all the change event keys: for all the other operations it's possible to use SMTs to extract a meaningful value for the key from the document itself. In our use case, for example, we store the key that's used on Kafka in the _id. If the BsonDocument key already contained the document _id we could avoid using SMTs, but that would be a breaking change, wouldn't it? To avoid that breaking change, it's possible to use the same config variable introduced in the PR related to this issue (renaming it, of course) and give users the option to decide. |
| Comment by Nicolas Gavalda [ 12/May/23 ] |
|
I recently stumbled on this issue too, and I have come to the same conclusion: tombstone events as they are implemented now just don't work, as the generated event key doesn't indicate the id of the deleted document. However, while using the document key or id as the tombstone event key would allow it to be used in consumers, it would not fix the tombstone usage for topic compaction: for this, all change event keys should be modified to use the document key. This modification could be limited to "publish.full.document.only" mode, but IMHO should be extended to all modes, as there is no real advantage to using the current change stream document id as the event key (using the document key is what Debezium does, as an example). |
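The compaction point in this comment can be sketched with a toy model. The event list, the ids and the `retained` helper are all hypothetical; the helper only approximates what a compacted topic keeps (the latest non-tombstone value per key).

```python
from typing import Dict, List, Optional, Tuple

# (operation, document id, change stream event id, value); hypothetical data.
events = [
    ("insert", "doc-1", "evt-1", "v1"),
    ("update", "doc-1", "evt-2", "v2"),
    ("delete", "doc-1", "evt-3", None),
]


def retained(records: List[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Approximate a compacted topic: latest value per key, tombstoned keys dropped."""
    latest: Dict[str, Optional[str]] = {}
    for key, value in records:
        latest[key] = value
    return {k: v for k, v in latest.items() if v is not None}


# Only the tombstone uses the document id; other events keep per-event keys.
tombstone_only = retained(
    [(doc_id if op == "delete" else evt_id, value) for op, doc_id, evt_id, value in events]
)

# Every event uses the document id as its key.
all_events = retained([(doc_id, value) for _, doc_id, _, value in events])

print(tombstone_only)
print(all_events)
```

Under these assumptions, fixing only the tombstone key leaves the insert and update records in the topic indefinitely, while keying every event by the document id lets the delete clear them.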
| Comment by Robert Walters [ 04/Apr/23 ] |
|
Moving to backlog for the moment as we are planning the next release and will consider this for 1.11. |
| Comment by Goncalo Pinho [ 29/Mar/23 ] |
|
Hi ross@mongodb.com, Thanks for looking into this. I've submitted the following PR, if you have time to review it. Goncalo |
| Comment by Ross Lawley [ 28/Mar/23 ] |
|
Looks like the documentation has been fixed. I think a PR would be useful to surface the actual document id for the deletion. Ross |