Kafka Connector / KAFKA-151

Out of Memory Issue with source connector in certain scenario

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 1.2.0
    • Component/s: None
    • Labels: None
    • Environment:
      Kafka Connector: 1.2.0
      MongoDB version: 3.6.17

      Setup:

      • I have a fairly large document (>8 MB) that contains an array, for example studentids, with 100k-150k ids: "studentids" : [NumberLong("906019125703444"), NumberLong("326026735808036"), ...].
      • In the connector configuration I have "change.stream.full.document": "updateLookup", since we need the full document for every update.
      • I have a small utility which updates the document in a loop around 1000 times (a minimal sketch is shown after this list).
      • The above exercise results in an OOM exception, which is attached. Even though the error happens at com.mongodb.kafka.connect.source.MongoSourceTask.poll(MongoSourceTask.java:192), we suspect the issue lies in how the connector handles this type of data.
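
      A minimal sketch of that update utility (this is not the attached ImportData.java itself; the connection string, namespace, and updated field are placeholders):

        import com.mongodb.client.MongoClient;
        import com.mongodb.client.MongoClients;
        import com.mongodb.client.MongoCollection;
        import com.mongodb.client.model.Filters;
        import com.mongodb.client.model.Updates;
        import org.bson.Document;

        public class UpdateLoop {
            public static void main(String[] args) {
                // Placeholder connection string and namespace.
                try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                    MongoCollection<Document> coll =
                            client.getDatabase("school").getCollection("students");
                    // Repeatedly touch the large (>8 MB) document; with
                    // "change.stream.full.document": "updateLookup" every update makes the
                    // change stream fetch the full document again.
                    for (int i = 0; i < 1000; i++) {
                        coll.updateOne(Filters.eq("_id", "large-doc"),
                                Updates.set("lastUpdated", System.currentTimeMillis()));
                    }
                }
            }
        }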

      If we exclude that particular field from the document with a pipeline, for example "pipeline": "[ { $project: { \"fullDocument.studentids\": 0 } } ]", we don't see the issue anymore.
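
      For reference, the workaround source connector configuration looks roughly like this (the connector name, connection URI, database, and collection are placeholders):

        {
          "name": "mongo-source",
          "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
            "connection.uri": "mongodb://localhost:27017",
            "database": "school",
            "collection": "students",
            "change.stream.full.document": "updateLookup",
            "pipeline": "[ { \"$project\": { \"fullDocument.studentids\": 0 } } ]"
          }
        }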

      Can you please confirm the issue and provide us with a valid configuration to handle this kind of data? Thanks in advance.

      Attachments:
        1. ImportData.java (0.8 kB)
        2. OOM.log (3 kB)
        3. sample_data.json (8.91 MB)
        4. source_connect.sh (1 kB)

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Sabari Gandhi (sabari.mgn@gmail.com)
            Votes: 2
            Watchers: 4
