[KAFKA-360] tombstones incompatible downstream integration Created: 10/Mar/23  Updated: 28/Oct/23  Resolved: 21/Aug/23

Status: Closed
Project: Kafka Connector
Component/s: None
Affects Version/s: None
Fix Version/s: 1.11.0

Type: Bug Priority: Unknown
Reporter: Goncalo Pinho Assignee: Ross Lawley
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
Duplicate
is duplicated by KAFKA-353 Sink Connector for Handling Tombstone... Closed
Related
related to KAFKA-353 Sink Connector for Handling Tombstone... Closed
Quarter: FY24Q2
Documentation Changes: Needed
Documentation Changes Summary:

Added configuration (change.stream.document.key.as.key) to use the document key for the sourceRecord key

This is potentially a breaking change as the newly added configuration
change.stream.document.key.as.key defaults to true.

Previously, the resume token was used as the source key, but
this limits the usefulness of tombstones both for topic compaction
and for downstream implementations.

Not all events relate to documents (e.g. drop collection), so the connector
falls back to the resume token for any change stream event that has no documentKey.

As such this is considered both an improvement and a bug fix.

Set to false to revert back to the previous behaviour.
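A minimal sketch of the key selection described above, written in Java like the snippet later in this issue. It models change events as plain `Map`s instead of the driver's `BsonDocument`. It assumes the new key is the event's `documentKey` sub-document taken as-is, and it ignores the SCHEMA key output format. The class and method names are hypothetical, not the connector's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model: change events are plain Maps, not BsonDocuments.
final class SourceKeySketch {

  // Chooses the source record key for a change stream event (hypothetical helper).
  static Map<String, Object> sourceKey(Map<String, Object> changeEvent, boolean documentKeyAsKey) {
    Object documentKey = changeEvent.get("documentKey");
    if (documentKeyAsKey && documentKey instanceof Map) {
      // Document-level event (insert, update, replace, delete):
      // assumed to be keyed on its documentKey, e.g. {_id=42}.
      @SuppressWarnings("unchecked")
      Map<String, Object> key = (Map<String, Object>) documentKey;
      return key;
    }
    // Option disabled (previous behaviour) or no documentKey (e.g. drop):
    // key on the resume token held in the event's own _id.
    Map<String, Object> key = new LinkedHashMap<>();
    key.put("_id", changeEvent.get("_id"));
    return key;
  }

  public static void main(String[] args) {
    Map<String, Object> delete = Map.of(
        "_id", Map.of("_data", "resume-token-1"),
        "operationType", "delete",
        "documentKey", Map.of("_id", 42));
    Map<String, Object> drop = Map.of(
        "_id", Map.of("_data", "resume-token-2"),
        "operationType", "drop");

    System.out.println(sourceKey(delete, true));  // {_id=42}
    System.out.println(sourceKey(delete, false)); // {_id={_data=resume-token-1}}
    System.out.println(sourceKey(drop, true));    // {_id={_data=resume-token-2}}
  }
}
```

With the option enabled, the delete event is keyed on `{_id=42}`, the same key any other event for that document would get under this sketch's assumption. The drop event has no `documentKey`, so it keeps its resume-token key either way.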


 Description   

Hi there,

Following your release 1.9.0, I've configured a source connector with `publish.full.document.only.tombstone.on.delete` to publish tombstone records to a compacted Kafka topic.

It doesn't work as expected because the key BsonDocument is created from the change stream event's _id (the resume token) instead of the document's _id. As a result, the SourceRecord carries a key that downstream Kafka integrations don't recognize and that does not match the key of the record being deleted.

This change stream _id has no relation to the id used in the Kafka records, and it breaks the tombstone functionality: the tombstone cannot be related to the existing record because the keys differ.
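To make the failure concrete, here is a toy downstream consumer (hypothetical, not part of the connector) that mirrors a topic into a local table, upserting on a value and deleting on a tombstone. It assumes the upserts are already keyed on the document `_id`, as the reporter does with SMTs, and uses plain strings as keys.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy downstream consumer that mirrors a topic into a table keyed by record key.
final class DownstreamViewSketch {

  // A Kafka record reduced to key and value; a null value is a tombstone.
  static final class Rec {
    final String key;
    final String value;

    Rec(String key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  static Map<String, String> materialize(List<Rec> records) {
    Map<String, String> table = new HashMap<>();
    for (Rec r : records) {
      if (r.value == null) {
        table.remove(r.key); // tombstone: delete the row with the same key
      } else {
        table.put(r.key, r.value); // insert or update
      }
    }
    return table;
  }

  public static void main(String[] args) {
    String doc = "{\"_id\": 42, \"name\": \"widget\"}";
    // Upsert keyed on the document _id, tombstone keyed on the resume token.
    List<Rec> before = List.of(new Rec("42", doc), new Rec("resume-token-7", null));
    // Upsert and tombstone both keyed on the document _id.
    List<Rec> after = List.of(new Rec("42", doc), new Rec("42", null));

    System.out.println(materialize(before).keySet()); // [42]  (row is never deleted)
    System.out.println(materialize(after).keySet());  // []
  }
}
```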

I ended up cloning the repository and adding a configuration option, `documentid.on.tombstones`, that uses the document _id instead of the change stream _id as the key of tombstone events.

If you agree, I can open a PR with this functionality, as follows:

 

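```java
// StartedMongoSourceTask.java:250
BsonDocument keyDocument;
if (isTombstoneEvent && tombstoneWithDocumentId) {
  // Tombstone with the new option enabled: key on the deleted document's _id
  // (documentKey._id) so the tombstone matches the records it should remove.
  keyDocument =
      new BsonDocument(
          ID_FIELD,
          changeStreamDocument
              .get(DOCUMENT_KEY_FIELD)
              .asDocument()
              .get(ID_FIELD));
} else {
  // Existing behaviour: the whole change event for the SCHEMA key format,
  // otherwise just the event's _id (the resume token).
  keyDocument =
      sourceConfig.getKeyOutputFormat() == MongoSourceConfig.OutputFormat.SCHEMA
          ? changeStreamDocument
          : new BsonDocument(ID_FIELD, changeStreamDocument.get(ID_FIELD));
}
```
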
With this workaround I was able to integrate with compacted Kafka topics.

Are you aware of this misbehavior? Or would you suggest another way of producing tombstones with meaningful keys?

BTW, the documentation on your website has the wrong config name:

  • source code: publish.full.document.only.tombstone.on.delete
  • website: publish.full.document.only.tombstones.on.delete

ref. https://www.mongodb.com/docs/kafka-connector/current/source-connector/configuration-properties/all-properties/



 Comments   
Comment by Githook User [ 21/Aug/23 ]

Author: Ross Lawley (rozza) <ross@mongodb.com>

Message: Source: Added configuration to use the document key for the sourceRecord key

This is potentially a breaking change as the newly added configuration
`change.stream.document.key.as.key` defaults to true.

Previously, the resume token was used as the source key, but
it limits the usefulness of tombstones both for topic compactions
and for downstream implementations.

Not all events relate to documents (e.g. drop collection), so the connector falls back to
the resume token for those events.

As such this is considered both an improvement and a bug fix.

Set to false to revert back to the previous behaviour.

KAFKA-360

Co-authored-by: Ross Lawley <ross@mongodb.com>
Co-authored-by: Goncalo Pinho <goncalopinho@hotmail.com>
Branch: master
https://github.com/mongodb/mongo-kafka/commit/00b266ad0f67e64e70cc79c3e2d05860c89a737d

Comment by Ross Lawley [ 09/Aug/23 ]

Added a new source configuration, change.stream.document.key.as.key, which defaults to true.

This is potentially a breaking change as the newly added configuration defaults to true.
Previously, the resume token was used as the source key, but it limits the usefulness of tombstones both for topic compactions and for downstream implementations. As such this is considered both an improvement and a bug fix.

Set to false to revert back to the previous behaviour.
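As a sketch of opting out, the properties below are built as the `Map<String, String>` that Kafka Connect hands to a connector. The connection URI, database and collection values are placeholders, and the class name is hypothetical; the property names are the connector's own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds MongoDB source connector properties (values are placeholders).
final class SourceConfigSketch {

  static Map<String, String> sourceConfig(boolean documentKeyAsKey) {
    Map<String, String> config = new LinkedHashMap<>();
    config.put("connector.class", "com.mongodb.kafka.connect.MongoSourceConnector");
    config.put("connection.uri", "mongodb://localhost:27017"); // placeholder
    config.put("database", "shop");     // hypothetical database
    config.put("collection", "orders"); // hypothetical collection
    config.put("publish.full.document.only", "true");
    config.put("publish.full.document.only.tombstone.on.delete", "true");
    // Added in 1.11.0 and true by default: key records on the document key.
    // "false" reverts to the previous resume-token keys.
    config.put("change.stream.document.key.as.key", Boolean.toString(documentKeyAsKey));
    return config;
  }

  public static void main(String[] args) {
    // Opt out of the new default, e.g. while consumers still expect the old keys.
    sourceConfig(false).forEach((name, value) -> System.out.println(name + "=" + value));
  }
}
```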

Comment by Goncalo Pinho [ 30/Jun/23 ]

Hi robert.walters@mongodb.com, ross@mongodb.com 

Any updates on this issue? We are running our own implementation on production workloads and would like to switch to your official release so that we get other updates effortlessly.

nicolas.gavalda@gmail.com, regarding your comment about the keys of all change events: for all the other operations it's possible to use SMTs to extract a meaningful key value from the document itself. In our use case, for example, we store the key used in Kafka in the _id field. If the key BsonDocument already contained the document _id we could avoid SMTs altogether, but that would be a breaking change, wouldn't it? To avoid that breaking change, the config option introduced in the PR for this issue could be reused (renamed, of course) to let users decide.
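Why SMTs cover every operation except deletes can be sketched as follows: a value-to-key transform reads the key from the record value, and a tombstone's value is null. The class below is a hypothetical stand-in written for illustration, not Kafka Connect's actual `ValueToKey` transform.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a value-to-key SMT: derive the record key
// from a field of the (schemaless) record value.
final class ValueToKeySketch {

  static Optional<Object> keyFromValue(Map<String, Object> value, String field) {
    if (value == null) {
      // Tombstone: the value is null, so there is no document to read the
      // key from. The key has to be correct when the connector emits it.
      return Optional.empty();
    }
    return Optional.ofNullable(value.get(field));
  }

  public static void main(String[] args) {
    Map<String, Object> insertValue = Map.of("_id", 42, "name", "widget");
    System.out.println(keyFromValue(insertValue, "_id")); // Optional[42]
    System.out.println(keyFromValue(null, "_id"));        // Optional.empty
  }
}
```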

Comment by Nicolas Gavalda [ 12/May/23 ]

I recently ran into this issue too and came to the same conclusion: tombstone events as currently implemented just don't work, because the generated event key doesn't identify the deleted document.

However, while using the document key or id as the tombstone key would make tombstones usable by consumers, it would not fix tombstones for topic compaction: for that, the keys of all change events must use the document key. This change could be limited to "publish.full.document.only" mode, but IMHO it should apply to all modes, as there is no real advantage in using the change stream event id as the event key (that's what Debezium does, for example).
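The compaction argument can be checked with a toy model (a simplification, not Kafka's log cleaner) that keeps the latest record per key and drops keys whose latest record is a tombstone. It replays an insert, an update and a delete of one document under three keying schemes; the keys and values are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of log compaction: keep the latest record per key, then drop keys
// whose latest record is a tombstone. Real compaction runs in the background
// and keeps tombstones for delete.retention.ms; neither is modelled here.
final class CompactionSketch {

  static final class Rec {
    final String key;
    final String value; // null = tombstone

    Rec(String key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  static Map<String, String> compact(List<Rec> log) {
    Map<String, String> latest = new LinkedHashMap<>();
    for (Rec r : log) {
      latest.put(r.key, r.value); // later records replace earlier ones
    }
    latest.values().removeIf(v -> v == null); // tombstoned keys disappear
    return latest;
  }

  public static void main(String[] args) {
    // Insert, update and delete of the document with _id 42.
    List<Rec> resumeTokenKeys = List.of(
        new Rec("token-1", "v1"), new Rec("token-2", "v2"), new Rec("token-3", null));
    List<Rec> tombstoneOnlyDocKey = List.of(
        new Rec("token-1", "v1"), new Rec("token-2", "v2"), new Rec("42", null));
    List<Rec> documentKeys = List.of(
        new Rec("42", "v1"), new Rec("42", "v2"), new Rec("42", null));

    System.out.println(compact(resumeTokenKeys).keySet());     // [token-1, token-2]
    System.out.println(compact(tombstoneOnlyDocKey).keySet()); // [token-1, token-2]
    System.out.println(compact(documentKeys).keySet());        // []
  }
}
```

Only when every event for the document shares the document key does compaction remove the document entirely. Changing the tombstone key alone leaves the resume-token-keyed records behind, which is the point made in this comment.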

Comment by Robert Walters [ 04/Apr/23 ]

Moving to backlog for the moment as we are planning the next release and will consider this for 1.11

Comment by Goncalo Pinho [ 29/Mar/23 ]

Hi ross@mongodb.com ,

Thanks for looking into this.

I've submitted the following PR, if you have time to review it.

Goncalo

Comment by Ross Lawley [ 28/Mar/23 ]

Hi goncalopinho@hotmail.com,

Looks like the documentation has been fixed.

I think a PR would be useful to surface the actual document id for the deletion.

Ross

Generated at Thu Feb 08 09:06:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.