Unexpected High CPU and WiredTiger Cache Load in Sharded Cluster Using TimeSeries Collections


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major - P3
    • Affects Version/s: 7.0.18
    • Component/s: None

      Hi MongoDB Support Team,

      We are experiencing a recurring performance issue in our production environment that utilizes time-series collections within a sharded MongoDB cluster (8 shards).

      Environment Details:
        • MongoDB Version: 7.0.18
        • Cluster Type: Sharded (8 shards)
        • Workload: High write volume, minimal reads
        • Affected Component: TimeSeries collections
        • Namespace: smt.system.buckets.userEvents

      Issue Summary:

      In our production environment, we use time-series collections on a sharded cluster. Recently, we have observed recurring spikes in CPU utilization (reaching 100%) on the primary node of individual shards, with the load seemingly rotating across shards at intervals.

      During these episodes:
        • A specific internal query (provided below) is executed extremely frequently; a rough way to quantify this from the recent log window is sketched after this list.
        • WiredTiger cache read activity spikes to ~8 GB/second.
        • The number of scanned and moved objects increases noticeably.
        • Despite the write-heavy nature of our workload, the load is concentrated in reads on the primaries, which is unexpected.
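
      For a rough measure of how frequently the query runs, we count matching "Slow query" entries in the recent in-memory log window on the affected shard primary. This is only a sketch: getLog returns just the most recent log lines held in memory, so it gives a coarse indicator rather than a complete count.

        // Run in mongosh on a shard primary during a spike.
        // getLog("global") returns only the most recent in-memory log lines.
        const recent = db.adminCommand({ getLog: "global" });
        const hits = recent.log.filter(line =>
          line.includes("Slow query") &&
          line.includes("smt.system.buckets.userEvents"));
        print(`recent slow bucket queries: ${hits.length} of ${recent.log.length} lines`);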

      Query:
      ======

      {"t":\{"$date":"2025-06-11T11:51:20.537+00:00"}

      ,"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn1355389","msg":"Slow query","attr":{"type":"command","ns":"smt.system.buckets.userEvents","command":{"aggregate":"system.buckets.userEvents","pipeline":[{"$match":{"$and":[

      {"control.version":1}

      ,{"$or":[{"control.closed":{"$exists":false}},\{"control.closed":false}]},{"$and":[{"control.min.evt":{"$lte":

      {"$date":"2025-06-11T05:50:07.000Z"}

      }},{"control.min.evt":{"$gt":

      {"$date":"2025-05-12T05:50:07.000Z"}

      }}]},{"meta":{"ev":183,"uid":3336397}},{"data.evt.999":{"$exists":false}}]}},{"$set":{"object_size":

      {"$bsonSize":"$$ROOT"}

      }},{"$match":{"object_size":

      {"$lt":5120}

      }},{"$unset":"object_size"},{"$limit":1}],"cursor":{"batchSize":101},"hint":{"$hint":"nc_meta_1_evt_1"},"$db":"smt"},"planSummary":"IXSCAN { meta: 1, control.min.evt: 1, control.max.evt: 1 }","planningTimeMicros":208,"keysExamined":10192,"docsExamined":10192,"cursorExhausted":true,"numYields":32,"nreturned":1,"queryHash":"DA02A786","planCacheKey":"7E1927DA","queryFramework":"classic","reslen":4509,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":35}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":35}},"Mutex":{"acquireCount":

      {"r":5}

      }},"flowControl":{"acquireCount":3},"storage":{"data":{"bytesRead":120756256,"timeReadingMicros":18962}},"cpuNanos":189624673,"remote":"10.12.124.237:1988","protocol":"op_msg","durationMillis":1208}}

      Key Observations:

        • The query operates on the system.buckets collection for time-series data.
        • It is triggered without any explicit read traffic from the application.
        • It appears to be internal or system-driven, possibly for bucket maintenance, validation, or reopening; a way to inspect the bucket catalog counters is sketched after this list.
        • The query shape is consistent across shards, but the load shifts periodically from one shard to another.
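
      To check whether bucket reopening or bucket catalog pressure correlates with the spikes, we capture the bucket catalog section of serverStatus on the affected primary before and after an episode and diff the counters. This is a diagnostic sketch only; the counter names inside bucketCatalog differ between server versions, so we dump the whole sub-document rather than assuming specific fields.

        // Run in mongosh on the affected shard primary; repeat before/after a spike.
        const status = db.serverStatus();
        // Time-series bucket catalog stats (present when time-series collections are in use).
        printjson(status.bucketCatalog);
        // Overall aggregate command counters, for correlating query volume.
        printjson(status.metrics.commands.aggregate);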

      Request for Clarification / Assistance:

        1.  What internal mechanism triggers this type of read query at such a frequency, especially under a write-intensive workload?
        2.  Is this expected behavior for time-series collections in MongoDB 7.0.18?
        3.  Could this be related to bucket reopening, TTL cleanup, or metadata compaction?
        4.  Is this a known issue or regression introduced in this version?
        5.  Can this behavior be mitigated or tuned (e.g., via bucket settings, index strategy, or hidden parameters)? Our current settings can be gathered with the sketch after this list.
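
      To help answer question 5 from our side, we can share the effective bucketing-related settings. The following mongosh sketch makes no assumptions about which parameter names exist in 7.0.18; it simply filters getParameter output and reports the collection's time-series options.

        // List every server parameter whose name mentions "timeseries"
        // (avoids guessing at specific parameter names for this version).
        const params = db.adminCommand({ getParameter: "*" });
        Object.keys(params)
          .filter(k => k.toLowerCase().includes("timeseries"))
          .forEach(k => print(`${k}: ${JSON.stringify(params[k])}`));

        // Report the collection's time-series options (granularity, metaField, etc.).
        printjson(db.getSiblingDB("smt").getCollectionInfos({ name: "userEvents" }));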

      This is currently impacting system performance, and we would appreciate any guidance or confirmation on whether a bug report is warranted or further diagnostics are needed.

      Attachments:
        1. CPU.png (56 kB)
        2. Cursors.png (42 kB)
        3. Load.png (24 kB)
        4. Scanned Objects.png (40 kB)
        5. WiredTigerCacheActivity.png (40 kB)
        6. WiredTigerCacheEviction.png (31 kB)

            Assignee: Unassigned
            Reporter: Ajithkumar N
            Votes: 0
            Watchers: 5