[KAFKA-151] Out of Memory Issue with source connector in certain scenario Created: 01/Sep/20 Updated: 27/Oct/23 Resolved: 09/Dec/20 |
|
| Status: | Closed |
| Project: | Kafka Connector |
| Component/s: | None |
| Affects Version/s: | 1.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Sabari Gandhi | Assignee: | Ross Lawley |
| Resolution: | Works as Designed | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Kafka Connector: 1.2.0 |
| Attachments: |
|
| Description |
|
Setup:
If we drop the problematic field from the document with a pipeline, for example: "pipeline": "[ { $project: { \"fullDocument.studentids\": 0 } } ]", we no longer see the issue. Can you please confirm the issue and provide us with a valid configuration to handle this kind of data? Thanks in advance. |
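For context, a minimal source connector configuration sketch showing where such a pipeline would go; the connector name, connection URI, database, and collection are placeholders and not taken from this report:

```json
{
  "name": "mongo-source-example",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb://mongo1:27017",
    "database": "exampledb",
    "collection": "students",
    "pipeline": "[ { \"$project\": { \"fullDocument.studentids\": 0 } } ]"
  }
}
```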
| Comments |
| Comment by Ross Lawley [ 09/Dec/20 ] |
|
I was able to set poll.max.batch.size to 100, after which I no longer saw the OOM exception. However, with messages this large I did see this exception:
You may wish to look at changing the document structure to reduce the size of the messages and/or changing the serialization format to use either raw BSON bytes or an Avro schema (if the document structure is normalized). I'm going to close this ticket as Works as Designed, because the OOM could also be mitigated by giving the JVM processes more heap. I hope that helps, Ross |
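As a rough illustration of the serialization suggestion (not the actual configuration from this report): `output.format.value` can be set to `bson` for raw BSON bytes, typically paired with the byte-array converter, or to `schema` for a schema-based (Avro-style) output. Those options arrived in connector releases after the 1.2.0 used here, so an upgrade is assumed, and the remaining values are placeholders:

```json
{
  "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
  "connection.uri": "mongodb://mongo1:27017",
  "database": "exampledb",
  "collection": "students",
  "output.format.value": "bson",
  "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
}
```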
| Comment by Sabari Gandhi [ 30/Nov/20 ] |
|
Hi Ross, please let me know if you have any updates or need any additional information for reproducing the test case. Thanks |
| Comment by Sabari Gandhi [ 09/Nov/20 ] |
|
Thanks, Ross - let me know if you need additional information for reproducing the issue. |
| Comment by Ross Lawley [ 09/Nov/20 ] |
|
Moving to in progress - will review the test case and see if I can debug further. |
| Comment by Sabari Gandhi [ 06/Nov/20 ] |
|
This is the MongoDB community forum post that I've created: https://developer.mongodb.com/community/forums/t/out-of-memory-issue-with-source-connector-in-certain-scenario/11330?u=sabari_gandhi1 . I was not able to attach the files needed to set up and reproduce the issue, so I am keeping this ticket updated with the required information. |
| Comment by Sabari Gandhi [ 06/Nov/20 ] |
|
Thanks for your comments. Please see below the steps to reproduce the issue. This is not the exact data, but I was able to reproduce the issue by following these steps.
Regarding the JVM settings: I allocated 4 GB by adding KAFKA_HEAP_OPTS: "-Xmx4G" to the docker-compose configuration. As mentioned, I see the issue in our production environment, where we use containers with 6 GB of memory and an allocated heap size of 5 GB. We tried lowering the batch size to as low as 300 but still hit the issue in some scenarios. Please let me know if you need additional information or have questions about the steps. |
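For reference, a sketch of how that heap setting is typically passed in docker-compose; the service name and image are assumptions, not taken from the reporter's actual file:

```yaml
services:
  connect:
    image: confluentinc/cp-kafka-connect:5.5.1
    environment:
      # Give the Connect worker JVM a 4 GB maximum heap
      KAFKA_HEAP_OPTS: "-Xmx4G"
```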
| Comment by Ross Lawley [ 15/Sep/20 ] |
|
Thank you for reaching out. For future reference, as this sounds like a support issue, I wanted to give you some resources to get this question answered more quickly:
Just in case you have already opened a support case and are not receiving sufficient help, please let me know and I can facilitate escalating your issue. Regarding the OOM error: the line in question converts the change stream document into a raw JSON string. The polling mechanism in source connectors batches up changes before publishing them to the topic. This can be configured by setting poll.max.batch.size, which by default will try to batch 1,000 source records and publish them to the topic. Reducing this max batch size should prevent OOM errors. Without error logs, configuration examples, and the JVM configuration I can't provide more insight here. Ross |
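To make the batch-size knob concrete, a minimal sketch of overriding the default of 1,000 source records per poll with a smaller value such as 100; connection details are placeholders, not the reporter's configuration:

```json
{
  "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
  "connection.uri": "mongodb://mongo1:27017",
  "database": "exampledb",
  "collection": "students",
  "poll.max.batch.size": "100"
}
```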