  Kafka Connector / KAFKA-366

Parallel bulk writes from sink connector

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Unknown
    • Fix Version/s: 1.12.0
    • Affects Version/s: None
    • Component/s: Sink
    • Labels: None

      In com.mongodb.kafka.connect.sink.StartedMongoSinkTask#put, a collection of records is grouped into batches of writes by namespace (i.e. MongoDB database and collection name). However, these distinct batches are then written to MongoDB serially, one namespace at a time.
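
      A minimal sketch of that serial pattern, for illustration only (this is not the actual StartedMongoSinkTask code; the batches map stands in for the connector's existing per-namespace grouping):

      import java.util.List;
      import java.util.Map;

      import org.bson.Document;

      import com.mongodb.MongoNamespace;
      import com.mongodb.client.MongoClient;
      import com.mongodb.client.model.WriteModel;

      // Illustrative only: each namespace's bulk write blocks until the previous
      // one has finished, so put() latency grows with the number of namespaces.
      final class SerialWriteSketch {
          static void writeSerially(final MongoClient client,
                                    final Map<MongoNamespace, List<WriteModel<Document>>> batches) {
              for (Map.Entry<MongoNamespace, List<WriteModel<Document>>> entry : batches.entrySet()) {
                  client.getDatabase(entry.getKey().getDatabaseName())
                        .getCollection(entry.getKey().getCollectionName(), Document.class)
                        .bulkWrite(entry.getValue());
              }
          }
      }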

      This means that you will see a large drop in performance if:

      1. your sink connector consumes from multiple topics, or
      2. you add transforms that split data from one topic into multiple collections.

      My team first noticed this issue during a data rate spike that caused the connector to lag behind by over an hour.

      We should be able to perform these bulk writes in parallel using a thread pool with a configurable pool size. Since each batch is written to a separate collection, ordering will not be impacted; a sketch of the idea follows below.
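
      A rough sketch of that proposal, assuming a fixed-size pool; the class, method, and parameter names below are illustrative, not existing connector configuration or APIs:

      import java.util.List;
      import java.util.Map;
      import java.util.concurrent.CompletableFuture;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;

      import org.bson.Document;

      import com.mongodb.MongoNamespace;
      import com.mongodb.client.MongoClient;
      import com.mongodb.client.model.WriteModel;

      // Hypothetical sketch: one task per namespace, bounded by a configurable pool size.
      final class ParallelBulkWriter {
          private final MongoClient client;
          private final ExecutorService pool;

          ParallelBulkWriter(final MongoClient client, final int poolSize) {
              this.client = client;
              this.pool = Executors.newFixedThreadPool(poolSize);
          }

          /** Submits one bulk write per namespace and blocks until all have completed. */
          void write(final Map<MongoNamespace, List<WriteModel<Document>>> batches) {
              CompletableFuture<?>[] futures = batches.entrySet().stream()
                      .map(entry -> CompletableFuture.runAsync(
                              () -> client.getDatabase(entry.getKey().getDatabaseName())
                                          .getCollection(entry.getKey().getCollectionName(), Document.class)
                                          .bulkWrite(entry.getValue()),
                              pool))
                      .toArray(CompletableFuture[]::new);
              // Each batch targets a distinct collection, so running them concurrently
              // does not change per-collection ordering; any failure surfaces from join()
              // so put() can still report it to the Connect framework.
              CompletableFuture.allOf(futures).join();
          }
      }

      A fixed pool keeps the degree of parallelism bounded regardless of how many namespaces show up in a single put() call.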

            Assignee: Unassigned
            Reporter: Martin Andersson (martin.andersson@kambi.com)
            Votes: 0
            Watchers: 3

              Created:
              Updated: