  1. Core Server
  2. SERVER-96226

Improve error handling for VPC peered Kafka connections

    • Type: Task
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Atlas Streams

      1. In some places we use tassert for checks that can fail because of transient network errors. We should change these to uasserts to avoid noise in the #asp-engine-warnings channel (unless we want the stack traces to show up in that channel). An example is the "KafkaConnectAuthCallback::readSocketData received less data than" assert discussed in this Slack thread: https://mongodb.slack.com/archives/C07R76FQ3UJ/p1729873119093639?thread_ts=1729872106.807049&cid=C07R76FQ3UJ.
      2. We should improve the error message below to indicate that a transient network error is occurring. Other places need the same change.

          tassert(ErrorCodes::InternalError,
                  str::stream() << "KafkaConnectAuthCallback::readSocketData received less data than "
                                   "expected, received a total of "
                                << bytesStored << " bytes, and expected " << readBuffer.size(),
                  bytesStored == readBuffer.size());
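
      A minimal, self-contained sketch of what the changed check could look like. It does not use the server headers: `uassertSketch`, `UserAssertion`, `kTransientNetworkError` and `checkReadComplete` are hypothetical stand-ins introduced only for illustration, and the error code and message wording are assumptions rather than agreed values.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a user-facing assertion failure (no stack trace, no alert noise).
struct UserAssertion : std::runtime_error {
    int code;
    UserAssertion(int c, const std::string& msg) : std::runtime_error(msg), code(c) {}
};

// Hypothetical error code; the real code to use is still to be decided.
constexpr int kTransientNetworkError = 1;

// Stand-in with the same argument order as the check quoted above: code, message, condition.
inline void uassertSketch(int code, const std::string& msg, bool condition) {
    if (!condition) {
        throw UserAssertion(code, msg);
    }
}

// Same condition as the original check, but the message now names the likely cause.
void checkReadComplete(std::size_t bytesStored, std::size_t expected) {
    std::ostringstream msg;
    msg << "KafkaConnectAuthCallback::readSocketData received less data than expected "
        << "(received a total of " << bytesStored << " bytes, expected " << expected
        << "); this is likely a transient network error on the VPC peered connection";
    uassertSketch(kTransientNetworkError, msg.str(), bytesStored == expected);
}
```

      In the real code the condition would stay `bytesStored == readBuffer.size()`; only the assertion type, the error code and the message text would change.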

       

      Make a follow up ticket for:

      1. Currently, when the KafkaConnectAuthCallback and KafkaResolveCallback classes throw an InternalError, it gets bubbled up through librdkafka. Our code in KafkaPartitionConsumer / KafkaEmitOperator eventually throws a StreamProcessorKafkaConnectionError, which is considered a user error. If we want to be alerted on "internal errors" for VPC peering, we should amend this flow to ultimately throw a different error code (StreamProcessorVPCConnectionError?).
        1. This error will happen from time to time, so we can also increase the alert threshold when we do this: https://github.com/10gen/mongohouse/blob/master/alerts/mhouse_streams/mhouse-streams-prod.yaml#L195
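
      One possible shape for the amended flow, as a rough sketch only. Everything here is hypothetical: `StreamErrorCode`, `CallbackFailure`, the `fromVpcPeeringCallback` flag and `classifyConnectionFailure` are illustration names, and StreamProcessorVPCConnectionError is the code proposed above, not an existing one. The sketch assumes the consumer/emit code can tell whether a failure originated in one of the VPC peering callbacks.

```cpp
#include <string>

// Hypothetical subset of error codes relevant to this flow.
enum class StreamErrorCode {
    kStreamProcessorKafkaConnectionError,  // current outcome, treated as a user error
    kStreamProcessorVPCConnectionError,    // proposed in this ticket, does not exist yet
};

// Hypothetical record of a failure surfaced through librdkafka.
struct CallbackFailure {
    bool fromVpcPeeringCallback;  // assumed: set when an auth/resolve callback raised the error
    std::string reason;
};

// Route failures from the VPC peering callbacks to their own code so they can be
// alerted on separately from ordinary user-facing Kafka connection errors.
StreamErrorCode classifyConnectionFailure(const CallbackFailure& failure) {
    return failure.fromVpcPeeringCallback
        ? StreamErrorCode::kStreamProcessorVPCConnectionError
        : StreamErrorCode::kStreamProcessorKafkaConnectionError;
}
```

      Keeping the two codes separate would let the alert threshold for the VPC case be tuned independently of user errors.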

       

            Assignee:
            Unassigned
            Reporter:
            Matthew Normyle (matthew.normyle@mongodb.com)
            Votes:
            0
            Watchers:
            1
