[KAFKA-253] Allow Kafka Sink Connector to Execute Unordered Bulk Operations Created: 29/Sep/21 Updated: 28/Oct/23 Resolved: 12/Jan/22 |
|
| Status: | Closed |
| Project: | Kafka Connector |
| Component/s: | Sink |
| Affects Version/s: | None |
| Fix Version/s: | 1.7.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Diego Rodriguez (Inactive) | Assignee: | Valentin Kavalenka |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | internal-user |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Epic Link: | Write MongoDB errors to the DLQ |
| Quarter: | FY22Q4 |
| Case: | (copied to CRM) |
| Documentation Changes: | Needed |
| Documentation Changes Summary: | If implemented, this new behavior will need to be properly documented. |
| Description |
|
Hi Team, as of now the Sink Connector only executes operations using "ordered" bulk writes, which guarantee message ordering within each source topic partition. There may be circumstances where ordering is not required and executing "unordered" bulk operations would have benefits: for example, a failed write does not prevent the remaining operations in the batch from being attempted, and the server is free to execute the operations in parallel.
The default should remain "ordered" bulk writes, but I propose adding a property that switches the connector to "unordered" bulk operations. The implications of this change should be made very clear in our documentation, and it might even be wise to log a warning about message processing order. Thanks |
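For illustration, a minimal sink configuration sketch using the property that eventually shipped as `bulk.write.ordered` (see the commits in the comments below); the connection, topic, and collection values here are placeholder assumptions:

```properties
name=mongo-sink
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
topics=orders
connection.uri=mongodb://localhost:27017
database=test
collection=orders
# Assumed usage: defaults to true (ordered bulk writes); setting it to false
# opts into unordered bulk operations, trading per-partition ordering for
# resilience and potential throughput.
bulk.write.ordered=false
```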
| Comments |
| Comment by Githook User [ 19/Jan/22 ] |
|
Author: Valentin Kovalenko (username: stIncMale, email: valentin.male.kovalenko@gmail.com)
Message: Mention `bulk.write.ordered` in `CHANGELOG.md` (#98)
|
| Comment by Githook User [ 12/Jan/22 ] |
|
Author: Valentin Kovalenko (username: stIncMale, email: valentin.male.kovalenko@gmail.com)
Message: Add support for the `bulk.write.ordered` sink connector property (#96)
|
| Comment by Albert Wong (Inactive) [ 14/Oct/21 ] |
|
Problem Statement: The Mongo Kafka Connector sink process performs bulk inserts in an ordered fashion. As a result, if an error is encountered during the insert process (for example, violation of a unique constraint), not only does the individual record that hit the error fail, but all subsequent records in the batch fail as well. Furthermore, the failure occurs without the expected error message or routing of the failed inserts to the dead letter topic.
For example, if records 1-10 are being inserted into MongoDB through the connector and record #6 already exists in the target collection, not only will record #6 fail to insert, but records 7-10 will as well. None of these five records (6-10) will be written to the dead letter topic.
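To make the ordered-vs-unordered distinction concrete, here is a standalone sketch against the MongoDB Java driver (not the connector code itself; the collection name and the pre-existing `_id` are assumptions) showing that an unordered bulk write keeps going past a duplicate-key error and reports only the models that actually failed:

```java
import com.mongodb.MongoBulkWriteException;
import com.mongodb.bulk.BulkWriteError;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.BulkWriteOptions;
import com.mongodb.client.model.InsertOneModel;
import com.mongodb.client.model.WriteModel;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class UnorderedBulkWriteDemo {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll = client.getDatabase("test").getCollection("demo");

            // Build ten inserts; assume a document with _id 6 already exists,
            // so that model will fail with a duplicate-key error.
            List<WriteModel<Document>> models = new ArrayList<>();
            for (int id = 1; id <= 10; id++) {
                models.add(new InsertOneModel<>(new Document("_id", id)));
            }

            try {
                // ordered(false): the server continues with the remaining models
                // after a failure, so documents 7-10 are still inserted.
                // With ordered(true) (the default), processing stops at the first error.
                coll.bulkWrite(models, new BulkWriteOptions().ordered(false));
            } catch (MongoBulkWriteException e) {
                // Only the models that genuinely failed are reported here.
                for (BulkWriteError err : e.getWriteErrors()) {
                    System.out.printf("model %d failed: %s%n", err.getIndex(), err.getMessage());
                }
            }
        }
    }
}
```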
Stories:
- Story #1 – Acceptance Criteria:
- Story #2 – Acceptance Criteria:
- Story #3 – Acceptance Criteria:
- Story #4 – Acceptance Criteria:
Of these four stories, Story #1 is an absolute requirement: data cannot be allowed to fall on the floor without some error notification or other means of tracking what happened to it. Story #2 is the ideal end result. Story #3 is the next best option, and Story #4 is the minimal solution. (The difference between Stories #3 and #4 is that #3 is generic against any error, while #4 focuses on the specific error we have encountered; I would recommend the broader solution for other customers, but the focused one meets the minimum criteria.) |
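As context for Story #1, Kafka Connect already exposes framework-level dead letter queue settings for sink connectors; a sketch is below (the DLQ topic name and values are illustrative assumptions). These settings only capture errors the connector actually reports to the framework, and the problem statement above indicates failed inserts are currently not routed this way, which is the gap the "Write MongoDB errors to the DLQ" epic is meant to close:

```properties
# Standard Kafka Connect error-handling properties for a sink connector.
# Records the framework knows have failed are routed to the DLQ topic.
errors.tolerance=all
errors.deadletterqueue.topic.name=mongo-sink-dlq
errors.deadletterqueue.topic.replication.factor=1
errors.deadletterqueue.context.headers.enable=true
# Log failed records as well, for traceability.
errors.log.enable=true
errors.log.include.messages=true
```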