Improve debuggability of the add_remove_shard.py hook


    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Catalog and Routing
    • CAR Team 2025-06-23
    • 2
    • None
    • 3
    • TBD
    • 🟩 Routing and Topology

      Remove shard workflow + logging

      During remove shard in this hook, we first run the removeShard command and then take some actions based on the outcome. Only after these actions have run do we check the timeout. For a shard that is being drained, this can be confusing in two ways.

      The first is that if the removeShardCommand returns "ongoing", we drain the shard and only then check the timeout. If draining took a long time, the hook may time out even though the only thing left to do is commit the shard removal. We should consider optimizing this so that, if any draining was performed, we make sure to retry the commit even if we hit the timeout while running moveCollection.

      The second is that when the hook times out, we log the response of the most recently run removeShard. When we drain manually, that command ran before the draining, so the draining information in the log is potentially very stale. It would be much more helpful to report the actual draining status at the time we throw the error. This is linked to the first issue: retrying removeShard before timing out would give us more up-to-date draining progress.
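      The two points above could be addressed together along the following lines. This is only a sketch, not the hook's actual code: `remove_shard_with_fresh_status`, `run_remove_shard`, `drain_shard`, `RemoveShardTimeout`, and the injectable `clock` are hypothetical names, and the response is assumed to be a dict whose "state" field is "started", "ongoing", or "completed".

```python
import time
from typing import Callable


class RemoveShardTimeout(Exception):
    """Hypothetical error raised when the shard is not removed in time."""


def remove_shard_with_fresh_status(
    run_remove_shard: Callable[[], dict],
    drain_shard: Callable[[], None],
    timeout_secs: float,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Run removeShard until it completes, draining manually when needed.

    Assumption: run_remove_shard() returns a dict with a "state" key.
    """
    deadline = clock() + timeout_secs
    while True:
        res = run_remove_shard()
        if res["state"] == "completed":
            return res

        drained = False
        if res["state"] == "ongoing":
            # Manual draining (e.g. moveCollection); may take a long time.
            drain_shard()
            drained = True

        if clock() >= deadline:
            if drained:
                # `res` was captured before draining, so it is stale.
                # Retry once: this may commit the removal, and if not,
                # it gives the draining status as of right now.
                res = run_remove_shard()
                if res["state"] == "completed":
                    return res
            raise RemoveShardTimeout(
                f"Timed out removing shard; latest removeShard response: {res}"
            )
```

      With this shape, a timeout that is hit during draining still gets one commit attempt, and the error message always carries a response obtained after the last draining pass.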

      Add shard logging

      We currently log both the beginning and the end of remove shard / transition to dedicated, but only the beginning of add shard / transition from dedicated. This makes debugging harder because you have to infer from the start of the next remove shard that the add shard succeeded.

      We should log a message on successful add shard in this hook, similar to the existing log for successful remove shard.
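      As a rough illustration of the symmetric logging, the add shard path could log once before and once after the command. The function name `add_shard_with_logging`, the `run_add_shard` callable, and the message wording are assumptions made for this sketch and are not taken from the hook.

```python
import logging
from typing import Callable


def add_shard_with_logging(
    run_add_shard: Callable[[str], dict],
    shard_id: str,
    logger: logging.Logger,
) -> dict:
    """Run add shard and log both its start and its successful end.

    Assumption: run_add_shard(shard_id) raises on failure and returns the
    command response as a dict on success.
    """
    logger.info("Starting add shard for %s", shard_id)
    res = run_add_shard(shard_id)
    # Reached only on success, mirroring the successful remove shard log.
    logger.info("Successfully added shard %s: %s", shard_id, res)
    return res
```

      Because the second message is emitted only when the command returns, its presence in the log confirms the add shard succeeded without having to look at the next remove shard.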

            Assignee:
            Anna Maria Nestorov
            Reporter:
            Allison Easton
