MDE Latency Improvements

    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major - P3
    • 1.46.9
    • Affects Version/s: None
    • Component/s: Data Explorer
    • None
    • MDE Latency Improvements
    • None
    • To Do
    • None
    • 22
    • 8
    • 22
    • 100
    • 0
    • 🟢 On Track

      Engineer(s): Neal Beeken, Simon Zhu
      2025-08-29: On track for 2025-09-25

      Rationale for any project delays/change in end date/etc, if applicable:
      The findings from this final ticket have been filed in the EPIC. We have discussed wrapping up this work once the protocol change and its improvement have landed, and tracking the remaining work as part of the GA work so that all remaining tasks are ordered by priority.

      What was accomplished since the last update?
      We've implemented comprehensive tracing for DE Connect in Sentry and identified four key optimization opportunities:

      • Connection pooling: analysis shows that lowering maxPoolSize could, counterintuitively, reduce connection times for non-dedicated clusters
      • Servlet filters: consolidating parts of the filter chain could eliminate latency incurred on every WebSocket connection
      • Authentication: research shows potential for substantial latency improvements from reducing per-message auth checks
      • Pre-creation and host ordering: plan to generalize the pre-create and sorted host order functionality from dedicated clusters so it also applies to non-dedicated clusters
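
As an illustration of the connection pooling item above, here is a minimal sketch of overriding maxPoolSize on a MongoDB connection string. The helper name, the sample hosts, and the value 5 are assumptions made for the example; this update does not say which value was tested or where DE Connect actually configures its pool.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_max_pool_size(uri: str, max_pool_size: int) -> str:
    """Return `uri` with its maxPoolSize option set to `max_pool_size`.

    Hypothetical helper: any existing maxPoolSize option is dropped
    (option names are matched case-insensitively) and the new value is
    appended. The host list and credentials are left untouched.
    """
    parts = urlsplit(uri)
    options = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != "maxpoolsize"
    ]
    options.append(("maxPoolSize", str(max_pool_size)))
    return urlunsplit(parts._replace(query=urlencode(options)))


# Illustrative hosts and pool size, not values from the investigation.
tuned = with_max_pool_size(
    "mongodb://h1:27017,h2:27017/db?retryWrites=true&maxPoolSize=100", 5
)
print(tuned)
```

Whether a smaller pool helps in practice depends on the measurements referenced in the update; the sketch only shows where such a setting would live.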

      What's the focus over the next two weeks?

      We are rolling out the new protocol change that improves the initial connect time. Once Sentry metrics are available, around next week, we'll decide which of the most important tickets added to this EPIC, if any, to include as GA blockers and which to cut.

      Any risks/blockers/impediments?

      None

      Anything else to share?

      None

      2025-08-14: Target date set to October 9th 2025

      • Rationale for any project delays/change in end date/etc, if applicable
        • No change.
        • 27% improvement to initial connect latency!! 
      • What's the focus over the next two weeks?
        • The protocol change to improve initial handshake RTTs is in progress.
        • Sentry distributed tracing is working and is built on top of the protocol change.
        • Both items need more time to be cleaned up, tested, and properly shipped (v2 endpoints).
      • Any risks/blockers/impediments?
        • None
      • Anything else to share?
        • None

      2025-07-31: Target date set to October 9th 2025

      • Rationale for any project delays/change in end date/etc, if applicable
        • Latency has been identified as a key issue for a GA release that removes access to legacy DE, so we reprioritized the work in this epic.
        • The deadline for these changes has moved out because progress on burning down the work in this epic slowed down
        • Made progress on investigations in parallel with other work (Neal was working on dependency upgrades)
          • Captured visualizations of the gains from removing additional initial Compass commands that turned out to be removable
          • Determined that "sorting" the connection string so the primary comes first does improve the connection time significantly
      • What's the focus over the next two weeks?
        • Finalizing the "sort" by primary improvement
        • Code review for the new communication protocol that cuts down on RTTs
      • Any risks/blockers/impediments?
        • None
      • Anything else to share?
        • None
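
The "sort by primary" idea above can be sketched as follows. This is a minimal illustration, assuming the caller already knows which host is the primary; the function names and the sample hosts are hypothetical and are not taken from the DE Connect code.

```python
from typing import List


def primary_first(hosts: List[str], primary: str) -> List[str]:
    """Return `hosts` with `primary` moved to the front.

    sorted() is stable and False sorts before True, so the remaining
    hosts keep their original relative order.
    """
    return sorted(hosts, key=lambda host: host != primary)


def build_uri(hosts: List[str], primary: str) -> str:
    """Build a connection string whose host list starts with the primary."""
    return "mongodb://" + ",".join(primary_first(hosts, primary))


# Illustrative host names only.
print(build_uri(["a:27017", "b:27017", "c:27017"], "b:27017"))
# prints mongodb://b:27017,a:27017,c:27017
```

If the supplied primary is not in the host list, the order is simply left unchanged.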

      2025-07-16: Target date set to on hold

      • Rationale for any project delays/change in end date/etc, if applicable
        • In an effort to focus on GA priorities, we are stopping work on this epic so that we can pay down the backlog of fixes needed for a GA release.
      • What was accomplished since the last update?
        • Socket pre-creation merged.
        • Multithreading merged.
        • Removed one of the two pings on startup.
      • What's the focus over the next two weeks?
        • MDE GA priorities. Not this EPIC.
      • Any risks/blockers/impediments?
        • Re-prioritizing as planned.
      • Anything else to share?
        • None

      2025-07-01: Target date set to 2025-09-12

      • What was accomplished since the last update?
        • Completed the visualization for connection latencies, which informed the planning for the work in this EPIC
        • Pre-creation of primary connections is in progress and has confirmed that the visualization can be used to debug and document progress on the issue
      • What's the focus over the next two weeks?
        • Wrap up socket pre-creation
        • Finalize introducing multithreading for handling each WebSocket event (as opposed to a thread per WebSocket)
        • Start work on avoiding ping commands at startup
      • Any risks/blockers/impediments?
        • None
      • Anything else to share?
        • None
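
To illustrate the difference between a thread per WebSocket and multithreaded handling of each WebSocket event, here is a minimal sketch using a shared worker pool. The handler, the socket ids, and the pool size are assumptions for the example and do not reflect the actual service implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def handle_event(socket_id: str, message: str) -> str:
    """Hypothetical event handler; stands in for real per-message work."""
    return f"{socket_id}:{message.upper()}"


def dispatch(events: List[Tuple[str, str]], max_workers: int = 4) -> List[str]:
    """Run every event on a shared pool instead of one thread per socket.

    Results are returned in submission order, although the events
    themselves may execute concurrently and finish in any order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(handle_event, sid, msg) for sid, msg in events]
        return [future.result() for future in futures]


# Two sockets share two workers; a slow event on ws1 would not block ws2.
print(dispatch([("ws1", "find"), ("ws2", "ping"), ("ws1", "count")], max_workers=2))
```

A design like this trades per-socket ordering guarantees for throughput, so handlers that depend on message order within one socket would need extra coordination.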

          Assignee:
          Neal Beeken
          Reporter:
          Simon Zhu