Parallel bulk writes from sink connector


    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: Sink

      In com.mongodb.kafka.connect.sink.StartedMongoSinkTask#put, a collection of records is grouped into batches of writes by namespace (i.e. the MongoDB database and collection name). However, these batches are then written to MongoDB serially.
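
      The serial behaviour described above can be sketched as follows. This is a minimal stand-alone illustration, not the connector's actual code: SinkRecordStub, bulkWrite, and the field names are hypothetical stand-ins for the types used inside StartedMongoSinkTask#put.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical stand-in for the connector's sink-record type.
record SinkRecordStub(String database, String collection, String value) {
    String namespace() { return database + "." + collection; }
}

public class SerialBatchSketch {
    public static void main(String[] args) {
        List<SinkRecordStub> records = List.of(
                new SinkRecordStub("db1", "orders", "a"),
                new SinkRecordStub("db1", "users", "b"),
                new SinkRecordStub("db1", "orders", "c"));

        // Group records into per-namespace batches, as put(...) does.
        Map<String, List<SinkRecordStub>> batches = records.stream()
                .collect(Collectors.groupingBy(SinkRecordStub::namespace));

        // Current behaviour: each batch is bulk-written one after another,
        // so total latency is the sum of all per-collection write times.
        for (var entry : batches.entrySet()) {
            bulkWrite(entry.getKey(), entry.getValue()); // blocking call
        }
    }

    // Placeholder for the actual MongoDB bulk write.
    static void bulkWrite(String namespace, List<SinkRecordStub> batch) {
        System.out.println(namespace + " <- " + batch.size() + " writes");
    }
}
```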

      This causes a large drop in performance if:

      1. your sink connector consumes from multiple topics, or
      2. you add transforms that split data from one topic into multiple collections.


      My team first noticed this issue during a data rate spike that caused the connector to lag behind by over an hour.

      We should be able to perform these bulk writes in parallel using a thread pool with a configurable pool size. Since each batch is written to a separate collection, ordering will not be affected.
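
      The proposal above could look roughly like the sketch below, using a fixed-size ExecutorService. This is only an illustration of the idea, not a patch: the pool-size setting, bulkWrite, and the surrounding structure are assumptions, and a real change would need to integrate with the connector's error handling before offsets are committed.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

public class ParallelBatchSketch {
    // Would come from a connector config option (name hypothetical).
    static final int POOL_SIZE = 4;

    public static void main(String[] args) throws Exception {
        // Per-namespace batches, as produced by the grouping in put(...).
        Map<String, List<String>> batches = Map.of(
                "db1.orders", List.of("a", "c"),
                "db1.users", List.of("b"));

        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        try {
            // Submit one bulk write per namespace; each batch targets a
            // distinct collection, so per-collection ordering is preserved.
            List<Future<?>> futures = batches.entrySet().stream()
                    .map(e -> pool.submit(() -> bulkWrite(e.getKey(), e.getValue())))
                    .collect(Collectors.toList());

            // Wait for all writes, propagating any failure before the
            // task would commit offsets.
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder for the actual MongoDB bulk write.
    static void bulkWrite(String namespace, List<String> batch) {
        System.out.println(namespace + " <- " + batch.size() + " writes");
    }
}
```

      Waiting on every future before returning keeps the existing at-least-once semantics: put(...) still does not complete until all batches are durably written.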

            Assignee:
            Unassigned
            Reporter:
            Martin Andersson
            Votes:
            0
            Watchers:
            3
