Fix legacy timeseries namespace translation in write explain commands


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 8.3.0-rc0, 8.2.0
    • Component/s: None
    • Catalog and Routing
    • ALL
    • Execute the attached repro on the no_passthrough suite in commit r8.3.0-alpha0-3261-gb815656be09
    • CAR Team 2025-12-08
    • 🟩 Routing and Topology

      The explain command for write operations (e.g. insert/delete/update) on a legacy tracked timeseries collection, when executed from a router with a stale cache, can fail with a StaleConfig exception after exhausting all 10 retries.

      At a high level, this is what happens:

      • The stale router thinks the collection is a tracked legacy timeseries.
      • It forwards the command to the shard(s) using the timeseries view namespace, but attaches the shard version of the associated timeseries buckets collection.
      • The shard receives the command and checks the shard version against the received namespace (the timeseries view). Since the collection has been dropped and recreated as a normal collection, the shard sends a StaleVersion error to the router, informing it that the view namespace is now unsharded.
      • The router receives the StaleVersion error, refreshes the timeseries view namespace but not the buckets namespace, and retries the operation from the beginning. Because the cached buckets entry is never refreshed, the router still believes the collection is a tracked legacy timeseries, so every retry fails the same way.

      The solution is to make the router always send the correct namespace and shard version along with the command. In this case, since the router thinks the collection is a tracked legacy timeseries, it should convert the command to target the system.buckets namespace and forward the translated command to the shard(s) along with the shard version of the system.buckets namespace. If we do this, the shard will correctly send a StaleVersion error for the system.buckets namespace, the router will refresh the cache associated with the system.buckets namespace, and the two will eventually agree on the shard version of the collection.
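
      The retry loop and the proposed fix can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server implementation: the Router and Shard classes, the translate flag, the to_buckets helper and the "v1"/"UNSHARDED" version strings are all hypothetical names invented for the illustration. The only details taken from the description are the 10-retry budget, the system.buckets namespace, and the fact that the router refreshes only the namespace named in the stale error.

```python
from dataclasses import dataclass

MAX_RETRIES = 10  # retry budget mentioned in the description


class StaleConfig(Exception):
    """Toy stand-in for the stale-routing error; carries the stale namespace."""

    def __init__(self, nss):
        super().__init__(f"stale routing info for {nss}")
        self.nss = nss


def to_buckets(nss):
    # "db.coll" -> "db.system.buckets.coll" (legacy timeseries buckets namespace)
    db, coll = nss.split(".", 1)
    return f"{db}.system.buckets.{coll}"


@dataclass
class Shard:
    versions: dict  # authoritative state: namespace -> shard version

    def explain_write(self, nss, attached_version):
        # The version is checked against the namespace the command arrived on.
        actual = self.versions.get(nss, "UNSHARDED")
        if attached_version != actual:
            raise StaleConfig(nss)
        return "ok"


@dataclass
class Router:
    shard: Shard
    cache: dict      # possibly stale: namespace -> shard version
    translate: bool  # hypothetical switch: True models the proposed fix

    def refresh(self, nss):
        # Refresh only the namespace reported in the stale error.
        actual = self.shard.versions.get(nss)
        if actual is None:
            self.cache.pop(nss, None)
        else:
            self.cache[nss] = actual

    def explain_write(self, view_nss):
        buckets_nss = to_buckets(view_nss)
        for attempt in range(1, MAX_RETRIES + 1):
            if buckets_nss in self.cache:
                # Router believes this is a tracked legacy timeseries.
                target = buckets_nss if self.translate else view_nss
                version = self.cache[buckets_nss]
            else:
                target = view_nss
                version = self.cache.get(view_nss, "UNSHARDED")
            try:
                self.shard.explain_write(target, version)
                return attempt  # number of attempts that were needed
            except StaleConfig as err:
                self.refresh(err.nss)
        raise StaleConfig(view_nss)


def run(translate):
    # The collection was dropped and recreated as a normal, untracked
    # collection, so the shard has no versions; the router cache still
    # holds the old buckets entry.
    shard = Shard(versions={})
    router = Router(shard, {"db.system.buckets.coll": "v1"}, translate)
    try:
        return router.explain_write("db.coll")
    except StaleConfig:
        return "StaleConfig"


if __name__ == "__main__":
    print(run(translate=False))  # StaleConfig (all 10 retries exhausted)
    print(run(translate=True))   # 2 (one stale error, then success)
```

      In this model the untranslated command keeps reporting the view namespace as stale, so the buckets entry is never evicted and the retry budget runs out. With translation, the first stale error names the buckets namespace, the router drops that entry, and the second attempt succeeds.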

            Assignee:
            Tommaso Tocci
            Reporter:
            Tommaso Tocci
            Votes:
            0
            Watchers:
            4

              Created:
              Updated: