[mcp] Handle rate limited telemetry


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • vNext
    • Affects Version/s: None
    • Component/s: Telemetry Data
    • None
    • Not Needed
    • Developer Tools

      Problem

      When sending telemetry events fails, including on 429 Too Many Requests, all events are re-cached and retried on the very next emitEvents call. There is no backoff, no rate limiting, and no batching. This means:

      • A 429 response causes an immediate retry with the full accumulated cache, which is likely to trigger another 429: a classic retry storm
      • All cached events are sent in a single unbatched request, making it easy to blow past the 64 events/minute rate limit
      • The longer sending stays rate-limited, the more events accumulate, so each retry carries a larger payload and is even more likely to be rejected
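A toy model makes the compounding effect visible. This is a hypothetical simulation, not the telemetry.ts code: each emitEvents call sends the whole cache plus its new events in one request, and a failed send (for example a 429) puts everything back into the cache.

```typescript
// Hypothetical model of the current behavior, not the real telemetry.ts code.
// newEvents[i]: number of events passed to the i-th emitEvents call.
// rejected[i]: whether the i-th send fails (e.g. with 429).
function simulateCurrentBehavior(newEvents: number[], rejected: boolean[]): number[] {
  let cached = 0;
  const payloadSizes: number[] = [];
  newEvents.forEach((count, i) => {
    const payload = cached + count; // whole cache + new events, sent unbatched
    payloadSizes.push(payload);
    cached = rejected[i] ? payload : 0; // a failed send re-caches everything
  });
  return payloadSizes;
}

// Five calls of 10 events each, every one answered with 429:
console.log(simulateCurrentBehavior([10, 10, 10, 10, 10], [true, true, true, true, true]).join(", "));
// 10, 20, 30, 40, 50
```

Each rejected request makes the next one larger, while nothing slows the retry rate down.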

      Current behavior

      emitEvents directly calls emit, which attempts to send immediately (telemetry.ts#L130-L138):

      public emitEvents(events: BaseEvent[]): void {
          if (!this.isTelemetryEnabled()) {
              this.events.emit("events-skipped");
              return;
          }
          void this.emit(events);
      }
      

      On failure, all events (including previously cached ones) are returned to the cache and retried on the next emitEvents call (telemetry.ts#L203-L212):

      const result = await this.sendEvents(apiClient, allEvents, { signal });
      if (!result.success) {
          // all events — including previously cached ones — go back into the cache
          return allEvents;
      }
      

      sendEvents treats all errors identically; there is no special handling for 429 (telemetry.ts#L260-L265):

      } catch (error) {
          return {
              success: false,
              error: error instanceof Error ? error : new Error(String(error)),
          };
      }
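One way sendEvents could single out 429 is to inspect the HTTP status carried by the thrown error. This is a sketch under an explicit assumption: the API client is assumed to attach a numeric `status` property to its errors, and the `SendResult` shape with a `rateLimited` flag is hypothetical. The real client may expose the status differently.

```typescript
// Hypothetical result shape: adds a rateLimited flag to the failure case.
type SendResult =
  | { success: true }
  | { success: false; rateLimited: boolean; error: Error };

// Assumption: the API client exposes the HTTP status as a numeric `status`
// property on the thrown error. Adjust this lookup to the real client.
function toSendResult(error: unknown): SendResult {
  const status =
    typeof error === "object" && error !== null && "status" in error
      ? (error as { status: unknown }).status
      : undefined;
  return {
    success: false,
    rateLimited: status === 429, // only 429 should trigger backoff
    error: error instanceof Error ? error : new Error(String(error)),
  };
}
```

The catch block would then return `toSendResult(error)` instead of a generic failure, letting the caller decide whether to back off or simply retry later.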
      

      Proposed Solution

      Decouple event collection from event sending by introducing an interval-based sender:

      • emitEvents only appends to the cache — it never sends directly
      • A setInterval fires every 30 seconds and sends up to 32 events from the cache (2 sends/minute × 32 events = 64 events/minute, exactly the rate limit)
      • On 429: cancel the interval and apply exponential backoff (starting at 60s, doubling each time, capped at 1 hour), then restart the interval after the backoff delay
      • On success: reset the backoff to its initial value
      • On close(): cancel the interval (and any pending backoff timer) and drain the remaining cache in batches of 32, sending one batch at a time before shutdown completes
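The proposed design could look roughly like this sketch. `IntervalTelemetrySender`, `SendBatch`, and `BatchResult` are hypothetical names; the send function is injected so the logic can be exercised without the real apiClient, and `tick()` is public only so it can be driven by hand. It assumes `sendBatch` always resolves with a result rather than rejecting.

```typescript
// Hypothetical batch-send contract; assumed to resolve, never reject.
type BatchResult = { success: true } | { success: false; rateLimited: boolean };
type SendBatch<E> = (batch: E[]) => Promise<BatchResult>;

const SEND_INTERVAL_MS = 30_000; // one send every 30 s
const BATCH_SIZE = 32; // 2 sends/min × 32 = 64 events/min
const INITIAL_BACKOFF_MS = 60_000;
const MAX_BACKOFF_MS = 3_600_000; // 1 hour cap

class IntervalTelemetrySender<E> {
  private cache: E[] = [];
  private interval: ReturnType<typeof setInterval> | undefined;
  private backoffTimer: ReturnType<typeof setTimeout> | undefined;
  private backoffMs = INITIAL_BACKOFF_MS;
  private inFlight = false;
  private closed = false;

  constructor(private readonly sendBatch: SendBatch<E>) {
    this.startInterval();
  }

  get pendingCount(): number {
    return this.cache.length;
  }

  get nextBackoffMs(): number {
    return this.backoffMs;
  }

  // Collection only: emitEvents never sends. Events after close() are dropped (assumption).
  emitEvents(events: E[]): void {
    if (!this.closed) this.cache.push(...events);
  }

  private startInterval(): void {
    this.interval = setInterval(() => void this.tick(), SEND_INTERVAL_MS);
  }

  // Sends at most one batch per call.
  async tick(): Promise<void> {
    if (this.inFlight || this.closed || this.cache.length === 0) return;
    this.inFlight = true;
    const batch = this.cache.splice(0, BATCH_SIZE);
    try {
      const result = await this.sendBatch(batch);
      if (result.success) {
        this.backoffMs = INITIAL_BACKOFF_MS; // success resets the backoff
        return;
      }
      this.cache.unshift(...batch); // failed batch goes back to the front
      if (result.rateLimited && !this.closed) {
        // 429: stop the interval, wait out the backoff, then restart the interval.
        clearInterval(this.interval);
        this.interval = undefined;
        const delay = this.backoffMs;
        this.backoffMs = Math.min(this.backoffMs * 2, MAX_BACKOFF_MS);
        this.backoffTimer = setTimeout(() => {
          this.backoffTimer = undefined;
          this.startInterval();
        }, delay);
      }
    } finally {
      this.inFlight = false;
    }
  }

  // Stop all timers, then drain the cache one batch at a time, awaiting each send.
  async close(): Promise<void> {
    this.closed = true;
    clearInterval(this.interval);
    clearTimeout(this.backoffTimer);
    while (this.cache.length > 0) {
      const batch = this.cache.splice(0, BATCH_SIZE);
      const result = await this.sendBatch(batch);
      if (!result.success) break; // assumption: give up on shutdown rather than retry
    }
  }
}
```

Putting a failed batch back at the front of the cache preserves event order, and the `inFlight` flag prevents overlapping sends if a request takes longer than the 30-second interval. Stopping the drain on the first failure during close() is an assumption; the ticket does not say what should happen if a shutdown flush is rejected.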

      Acceptance Criteria

      • emitEvents no longer triggers a direct send
      • Events are sent in batches of at most 32
      • No more than 64 events are sent per minute under normal conditions
      • A 429 response triggers exponential backoff (60s → 120s → 240s → ... → 3600s max)
      • Backoff resets to 60s after a successful send
      • Server shutdown flushes remaining cached events in batches before closing
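The acceptance numbers can be checked with a few lines of arithmetic. The helper names are hypothetical; the sketch only confirms that doubling from 60 s reaches the 3600 s cap on the seventh consecutive 429, that 32-event batches every 30 seconds come to 64 events per minute, and how a 70-event cache would be split.

```typescript
// Hypothetical helpers that restate the acceptance criteria as arithmetic.

// Delay before retrying after the n-th consecutive 429 (n = 0 is the first).
function backoffSeconds(n: number, initial = 60, cap = 3600): number {
  return Math.min(initial * 2 ** n, cap);
}

// Split cached events into order-preserving batches of at most `size`.
function toBatches<T>(events: T[], size = 32): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < events.length; i += size) {
    batches.push(events.slice(i, i + size));
  }
  return batches;
}

// Events per minute when one batch is sent every `intervalSeconds`.
function eventsPerMinute(batchSize: number, intervalSeconds: number): number {
  return batchSize * (60 / intervalSeconds);
}

const schedule = Array.from({ length: 7 }, (_, n) => backoffSeconds(n));
console.log(schedule.join(" → ")); // 60 → 120 → 240 → 480 → 960 → 1920 → 3600
console.log(eventsPerMinute(32, 30)); // 64
console.log(toBatches(Array.from({ length: 70 }, (_, i) => i)).map((b) => b.length).join(",")); // 32,32,6
```

Uncapped, the seventh value would be 3840 s, so the cap takes effect there and every later retry waits the full hour.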

            Assignee:
            Jeroen Vervaeke
            Reporter:
            Jeroen Vervaeke
            Votes:
            0
            Watchers:
            1
