[KAFKA-209] Projection always includes _id Created: 19/Mar/21  Updated: 28/Oct/23  Resolved: 23/Mar/21

Status: Closed
Project: Kafka Connector
Component/s: Sink
Affects Version/s: None
Fix Version/s: 1.5.0

Type: Bug Priority: Major - P3
Reporter: Robert Walters Assignee: Ross Lawley
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Problem Description

DeleteOneBusinessKeyStrategy throws an error when a partial projection is used, the record already contains an _id field, and document.id.strategy.overwrite.existing is unset or false.
Also, with document.id.strategy.overwrite.existing: true, the _id field is included even though it is not listed in the projection list.

Steps to Reproduce

I have a Kafka topic "FaceMaskWeb.OrderCancel" that has a message:

"{\"_id\": {\"$oid\": \"6053bb753283c9f1e8584cb4\"}, \"order-id\": 100}"

I have a sink that is defined on the "FaceMaskWeb.OrderCancel" topic as

    "database": "FaceMaskWeb",
    "collection": "Orders",
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.DeleteOneBusinessKeyStrategy",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "document.id.strategy.partial.value.projection.type": "AllowList",
    "document.id.strategy.partial.value.projection.list": "order-id",

I am trying to delete the document in the Orders collection that looks like this:

  {
    _id: ObjectId("60538d8d360dc9fda35f1312"),
    'customer-id': 123,
    'order-id': 100,
    order: { lineitem: 1, SKU: 'FACE1', quantity: 1 }
  }

My Sink fails with:

org.apache.kafka.connect.errors.DataException: Could not build the WriteModel, the value document does not contain an _id field of type BsonDocument which holds the business key fields.

I am trying to delete the document based on the order-id field, not any _id field, yet this error message makes me think that the order-id field needs to be part of the _id field, maybe as a compound key?

From a Slack discussion with Ross, the document.id.strategy.partial.value.projection.list might not be returning a document in this scenario, where there is just one key in the projection list.
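
The DataException above suggests that the delete strategy expects the value document's _id to be a document holding the business key fields. A minimal Python sketch of that check follows. It is modelled only on the error text, not taken from the connector source; `build_delete_filter` is a hypothetical name, a dict stands in for a BsonDocument, and the ObjectId is modelled as a plain string.

```python
def build_delete_filter(value_doc):
    """Return the delete filter, or raise if `_id` is not a document of business keys."""
    business_key = value_doc.get("_id")
    # Assumption: a dict stands in for a BsonDocument; any other type is rejected.
    if not isinstance(business_key, dict):
        raise ValueError(
            "Could not build the WriteModel, the value document does not contain "
            "an _id field of type BsonDocument which holds the business key fields."
        )
    return business_key


# The reported record: `_id` is a scalar ObjectId (a string here), so the check fails.
reported = {"_id": "6053bb753283c9f1e8584cb4", "order-id": 100}
try:
    build_delete_filter(reported)
    failed = False
except ValueError:
    failed = True

# What the reporter expected: the projected business key becomes the filter.
expected_filter = build_delete_filter({"_id": {"order-id": 100}})
```

In this sketch the reported record fails because its _id was never replaced by the projected business key, which is consistent with the error seen when document.id.strategy.overwrite.existing is unset or false.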

Expected Results

The field in the allow list would be used as the filter for the delete in the collection.

Actual Results

Additional Notes



 Comments   
Comment by Ross Lawley [ 23/Mar/21 ]

After discussion with robert.walters, it was determined that projection should always be explicit.

The current documentation supports this so it is surprising that _id was automatically included.

Comment by Githook User [ 23/Mar/21 ]

Author:

{'name': 'Ross Lawley', 'email': 'ross.lawley@gmail.com', 'username': 'rozza'}

Message: Fixed `_id` projection.

Previously, `_id` was always projected even if not explicitly
allowed or blocked. This undocumented implicit behaviour is surprising
and breaks the explicit nature of projections.

KAFKA-209
Branch: master
https://github.com/mongodb/mongo-kafka/commit/1837a7ce65f1f0611f9faa819a41b801528eee9b

Comment by Ross Lawley [ 22/Mar/21 ]

Hi juan.soto,

Yes, the bug is in the field projection, which previously never removed the _id field under any circumstances. So if you happened to be using two separate systems and each contained its own _id values, the connector could never couple them up, even if the business logic is kept in a separate key (e.g. order-id).

Now _id will only be included/excluded by the projection if it is explicitly added or if the projection results in no fields and there is an _id field. (This is a fallback to help maintain existing behaviour).
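
The difference between the old and new projection behaviour can be sketched in Python as below. This is an illustration of the rule described in this comment, not the connector's implementation: `project_legacy` and `project_explicit` are hypothetical names, only top-level fields and the AllowList case are handled, and the ObjectId is modelled as a string.

```python
def project_legacy(doc, allowed):
    """Pre-fix behaviour as described above: `_id` is kept no matter what."""
    return {k: v for k, v in doc.items() if k in allowed or k == "_id"}


def project_explicit(doc, allowed):
    """Post-fix behaviour: `_id` is kept only if listed, or as a fallback."""
    projected = {k: v for k, v in doc.items() if k in allowed}
    # Fallback: the projection matched nothing, but the record has an `_id`.
    if not projected and "_id" in doc:
        projected = {"_id": doc["_id"]}
    return projected


record = {"_id": "6053bb753283c9f1e8584cb4", "order-id": 100}
legacy = project_legacy(record, {"order-id"})           # still carries `_id`
explicit = project_explicit(record, {"order-id"})       # only `order-id`
fallback = project_explicit(record, {"no-such-field"})  # falls back to `_id`
```

With the explicit variant, the record from this ticket projects to just the order-id field, which is what the reporter expected to be used for the delete.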

Ross

Comment by Juan Soto (Inactive) [ 22/Mar/21 ]

Hi!!

Then, is this a bug for all "OneBusinessKeyStrategy" operations? Is it not just for the delete?

Regards,

Juan

Comment by Ross Lawley [ 22/Mar/21 ]

After discussion with robert.walters we've come to the conclusion that it is a bug to automatically include the _id field if you are explicitly listing the fields to be projected.

It's surprising and undocumented behaviour.

Comment by Ross Lawley [ 22/Mar/21 ]

Hi robert.walters,

OK, this behaviour is expected given how the connector works.

So a couple of points that need clarifying as to why that is:

1) The document.id.strategy will always include a _id field if it exists in the sink record.
2) The document.id.strategy by default will never overwrite an existing _id field with a new value.

So to include order-id as part of the filter for the document to be removed, you must set:

document.id.strategy.overwrite.existing: true

That will then make the following query for the DeleteOneModel:

{"_id": {"$oid": "6053bb753283c9f1e8584cb4"}, "order-id": 100}

I've added a logging message if the _id field exists but document.id.strategy.overwrite.existing is false.
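
The two points above, together with the new warning, can be sketched in Python as follows. This is a rough model of the behaviour described in this comment, not the DocumentIdAdder source: `add_document_id` is a hypothetical name, the warning text is invented for illustration, and the ObjectId is modelled as a string.

```python
import logging

logger = logging.getLogger("id_adder_sketch")


def add_document_id(value_doc, generated_id, overwrite_existing=False):
    """Return a copy of value_doc with `_id` set, respecting the overwrite flag."""
    if "_id" in value_doc and not overwrite_existing:
        # Point 2: an existing `_id` is never replaced by default; only warn.
        logger.warning("_id already exists and overwrite is disabled; keeping it")
        return dict(value_doc)
    updated = dict(value_doc)
    updated["_id"] = generated_id
    return updated


record = {"_id": "6053bb753283c9f1e8584cb4", "order-id": 100}
# Point 1: the projection of `order-id` also carries the existing `_id` along.
projected = {"_id": record["_id"], "order-id": 100}

kept = add_document_id(record, projected)
replaced = add_document_id(record, projected, overwrite_existing=True)
delete_filter = replaced["_id"]
```

Under these assumptions, `delete_filter` has the same shape as the DeleteOneModel query shown above, while the default setting leaves the scalar _id in place.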

Please file any documentation tickets that are needed to improve the documentation.

Ross

Comment by Githook User [ 22/Mar/21 ]

Author:

{'name': 'Ross Lawley', 'email': 'ross.lawley@gmail.com', 'username': 'rozza'}

Message: Improve DocumentIdAdder logging.

Log a warning message when there is an existing `_id` value and the
id strategy is configured not to overwrite the existing `_id` value.

KAFKA-209
Branch: master
https://github.com/mongodb/mongo-kafka/commit/c41d263d75910d460d9506c48e36ce30f5d5ff74

Generated at Thu Feb 08 09:05:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.