Improve $lookup local-read fallback in sharded_agg_helpers

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      In HELP-91015, we found that on 7.0 the inner $lookup subpipeline on shard-05-01 occasionally stops using the local-read optimization in sharded_agg_helpers::targetShardsAndAddMergeCursorsWithRoutingCtx() and instead takes the AsyncRequestsSender (ASR) remote dispatch path, even when targeting only the local shard. The decision flow is:

      • We compute TargetingResults and, when canUseLocalReadAsCursorSource(opCtx, targeting, localShardId, readConcern) returns true, we call tryAttachCursorSourceForLocalRead(...) in sharded_agg_helpers.cpp.
      • Inside tryAttachCursorSourceForLocalRead we enter shard role with a specific ShardVersion/DatabaseVersion and call attachCursorSourceToPipelineForLocalRead(...). That call is wrapped in a broad try with a TODO SERVER-77402 comment:
      // TODO SERVER-77402 Wrap this in a shardRoleRetry loop instead of catching exceptions.
      // attachCursorSourceToPipelineForLocalRead enters the shard role but does not refresh
      // the shard if the shard has stale metadata. Proceeding to do normal shard targeting...
      try {
          auto pipelineWithCursor = ... attachCursorSourceToPipelineForLocalRead(...);
          return pipelineWithCursor;
      } catch (ExceptionFor<ErrorCodes::StaleDbVersion>&) {
      } catch (ExceptionFor<ErrorCategory::StaleShardVersionError>&) {
      } catch (ExceptionFor<ErrorCodes::CommandNotSupportedOnView>&) {
      } catch (ExceptionFor<ErrorCodes::IllegalChangeToExpectedShardVersion>&) {
      } catch (ExceptionFor<ErrorCodes::IllegalChangeToExpectedDatabaseVersion>&) {
      }
      return nullptr; 
      • If any of these conditions fire (stale DB version, stale shard version, version change mid-op, unresolved view), tryAttachCursorSourceForLocalRead quietly returns nullptr and the caller immediately falls back to dispatchTargetedPipelineAndAddMergeCursors(...), which uses AsyncRequestsSender and the service entry point to run the subpipeline remotely.
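
      The flow above can be modeled in a few lines of standalone C++. This is only a sketch of the control flow, not server code: Pipeline, StaleMetadataError, AttachFn, tryLocalRead and targetAndAttach are hypothetical stand-ins, and a single exception type stands in for the five catch clauses.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the stale-metadata errors caught in the ticket.
struct StaleMetadataError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for a pipeline with a cursor source attached.
struct Pipeline {
    std::string cursorSource;  // "local" or "remote"
};

using AttachFn = std::function<std::unique_ptr<Pipeline>()>;

// Same shape as tryAttachCursorSourceForLocalRead: a stale-metadata error is
// swallowed and the caller only ever sees nullptr.
std::unique_ptr<Pipeline> tryLocalRead(const AttachFn& attachLocal) {
    try {
        return attachLocal();
    } catch (const StaleMetadataError&) {
        // Empty on purpose, mirroring the empty catch blocks above.
    }
    return nullptr;
}

// Caller: prefer the local read when eligible, otherwise dispatch remotely.
std::unique_ptr<Pipeline> targetAndAttach(bool canUseLocalRead,
                                          const AttachFn& attachLocal,
                                          const AttachFn& dispatchRemote) {
    assert(dispatchRemote && "remote dispatch must be callable");
    if (canUseLocalRead) {
        if (auto pipeline = tryLocalRead(attachLocal)) {
            return pipeline;
        }
    }
    // Reached both when local read was never eligible and when it failed;
    // the two cases are indistinguishable from here.
    return dispatchRemote();
}
```

      In this model the caller cannot tell "local read was not eligible" apart from "local read was attempted and threw", which is the diagnosability gap described in this ticket.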

      Because the catch blocks are empty, these failures are silent: there is no log record at default verbosity that we attempted and failed to attach a local-read cursor. If targeting for the inner pipeline contains only the local shard, ASR will then create a self-loop (shard → TCP → same shard), and each inner execution is now a full aggregate from the point of view of opcounters and the profiler (fromMongos:false, shardVersion set). Under concurrent moveChunk / metadata changes and heavy $lookup traffic, many local-read attaches can repeatedly see stale metadata and take this broad catch-and-fallback path, effectively disabling local-read for that interval and amplifying the known “one subquery per matching outer document” cost.
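
      The self-loop condition described above is easy to state precisely. A minimal sketch, assuming shard ids can be compared as strings; ShardId and wouldSelfLoop are hypothetical names for illustration, not the server's types or functions:

```cpp
#include <set>
#include <string>

using ShardId = std::string;  // hypothetical alias, not the server's ShardId

// True when remote dispatch would only send a request back to the shard that
// is issuing it, i.e. the targeted set is exactly {localShardId}.
bool wouldSelfLoop(const std::set<ShardId>& targetedShards,
                   const ShardId& localShardId) {
    return targetedShards.size() == 1 &&
           *targetedShards.begin() == localShardId;
}
```

      A check of this shape could be used to count or flag remote dispatches that target only the local shard.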

      The following changes might help:

      1. Add at least a LOGV2_WARNING (or structured debug logging plus FTDC counters) inside each tryAttachCursorSourceForLocalRead catch block to record which error was seen and for which namespace/pipeline we fell back to remote dispatch. That would have made this incident diagnosable from logs alone.
      2. Backport the SERVER-77402 work.
      3. Add a special-case optimization for when the only target shard is the local shard, so that we avoid a TCP round trip through a self network loop.
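
      As an illustration of item 1, here is a standalone sketch of catch blocks that record why the local read was abandoned. It does not use the server's LOGV2 or FTDC APIs: the exception types, FallbackReason, LocalReadFallbackStats and tryLocalReadWithDiagnostics are all hypothetical, and a vector of strings stands in for the log.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reasons, one per failure class listed in the ticket.
enum class FallbackReason : std::size_t {
    kStaleDbVersion,
    kStaleShardVersion,
    kVersionChangedMidOp,
    kUnresolvedView,
    kNumReasons
};

// Hypothetical stand-ins for the server's exception types.
struct StaleDbVersionError : std::runtime_error { using std::runtime_error::runtime_error; };
struct StaleShardVersionError : std::runtime_error { using std::runtime_error::runtime_error; };
struct VersionChangedError : std::runtime_error { using std::runtime_error::runtime_error; };
struct UnresolvedViewError : std::runtime_error { using std::runtime_error::runtime_error; };

// Stand-in for counters plus a warning-level log.
struct LocalReadFallbackStats {
    std::array<std::uint64_t, static_cast<std::size_t>(FallbackReason::kNumReasons)> counts{};
    std::vector<std::string> logLines;

    void record(FallbackReason reason, const std::string& ns, const std::string& what) {
        const auto idx = static_cast<std::size_t>(reason);
        assert(idx < counts.size());
        ++counts[idx];
        logLines.push_back("local read fell back to remote: ns=" + ns + " reason=" + what);
    }
};

// Returns true if the local attach succeeded; otherwise records the reason
// and returns false so the caller can fall back to remote dispatch.
bool tryLocalReadWithDiagnostics(const std::function<void()>& attachLocal,
                                 const std::string& ns,
                                 LocalReadFallbackStats& stats) {
    try {
        attachLocal();
        return true;
    } catch (const StaleDbVersionError& ex) {
        stats.record(FallbackReason::kStaleDbVersion, ns, ex.what());
    } catch (const StaleShardVersionError& ex) {
        stats.record(FallbackReason::kStaleShardVersion, ns, ex.what());
    } catch (const VersionChangedError& ex) {
        stats.record(FallbackReason::kVersionChangedMidOp, ns, ex.what());
    } catch (const UnresolvedViewError& ex) {
        stats.record(FallbackReason::kUnresolvedView, ns, ex.what());
    }
    return false;
}
```

      Keeping one counter per reason would let a burst of fallbacks during moveChunk activity show up in metrics even if the log line were rate-limited.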

       

      The attached scripts are my attempts to reproduce the issue in HELP-91015. I could not reproduce the exact case, but some of them show that a metadata change can cause a local read to fall back to a remote call without any log output.

        1. lookup_expr_local_read_repro.js
          12 kB
          Zixuan Zhuang
        2. lookup_local_read_diagnostics.js
          17 kB
          Zixuan Zhuang
        3. lookup_local_read_loss_repro.js
          15 kB
          Zixuan Zhuang
        4. lookup_local_vs_remote_harness.js
          16 kB
          Zixuan Zhuang
        5. lookup_self_loop_repro.js
          9 kB
          Zixuan Zhuang
        6. lookup_shard_reduction_repro.js
          9 kB
          Zixuan Zhuang
        7. lookup_subquery_opcounter_experiment.js
          15 kB
          Zixuan Zhuang

            Assignee:
            Unassigned
            Reporter:
            Zixuan Zhuang
            Votes:
            0
            Watchers:
            5
