Core Server / SERVER-65949

Sharded time series leads to high query rates on the primary shard with CommandOnShardedViewNotSupportedOnMongod errors


Details

    • Type: Bug
    • Status: Backlog
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: 5.0.7
    • Fix Version/s: None
    • Component/s: None
    • Assigned Teams: Query Integration
    • Operating System: ALL

      My setup:

      • Everything is running locally
      • Docker Desktop
      • Mongo 5.0.7
      • 2 shards, with 3 servers in a replica set per shard (a quick topology check through mongos is sketched after this list)
      • 1 mongos
      • 1 config replica set with 3 servers
      • A load-tester application that pushes a lot of time series data while simultaneously doing reads, written using the .NET C# driver, v2.15.0
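
      A hedged sketch (not part of the original load tester) for confirming the two-shard topology through mongos using pymongo; the mongos address is an assumption based on my local Docker setup:

        from pymongo import MongoClient

        # Connect to mongos (address/port assumed from the local Docker setup).
        mongos = MongoClient("mongodb://localhost:27017")

        # listShards must be run against the admin database on mongos; it
        # returns one entry per shard with its replica set connection string.
        for shard in mongos.admin.command("listShards")["shards"]:
            print(shard["_id"], shard["host"])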

      Reproduction steps:

      1. Set up a 2-shard cluster; even a local Docker deployment works.
      2. Create a sharded time series collection and insert a few records.  I added a few million.  The destination bucket of each record doesn't matter, i.e. the meta field can vary across different sensor IDs.  (This step and the read load in step 3 are sketched in the snippet below the reproduction steps.)
      3. Perform a sustained, heavy read load on the collection.
      4. Observe that the primary shard for the time series view processes a lot of "query" ops, while the other shard processes none.  This can be seen in the attached mongostat output, where shard1 on the left processes 1k+ "query" ops but shard2 on the right processes none.
      5. Observe the following log entry on mongos:
        1. "ctx":"conn1516","msg":"Unable to establish remote cursors","attr":{"error":{"code":169,"codeName":"CommandOnShardedViewNotSupportedOnMongod","errmsg":"Resolved views on sharded collections must be executed by mongos","resolvedView":{"ns":"EndpointDataProtoTests.system.buckets.EndpointData:Endpoints-NormalV8","pipeline":[{"$_internalUnpackBucket":{"timeField":"t","metaField":"m","bucketMaxSpanSeconds":3600,"exclude":[]}}],"collation":{"locale":"simple"}}},"nRemotes":0}}

      The attached image is mongostat output captured by NoSqlBooster 7.1.  The left side is a direct connection to the primary of shard1, and the right side is a direct connection to the primary of shard2.
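
      For reference, a minimal pymongo sketch of steps 2 and 3 (my actual load tester uses the .NET driver; the database name EndpointDataProtoTests and the t/m fields come from the log entry above, while the simplified collection name, host/port, and sensorId layout are illustrative assumptions):

        import datetime
        import random
        from pymongo import MongoClient

        client = MongoClient("mongodb://localhost:27017")  # mongos
        db = client["EndpointDataProtoTests"]

        # Step 2: enable sharding and shard a time series collection. Running
        # shardCollection with the timeseries option creates the view and its
        # system.buckets collection if they don't exist yet (MongoDB 5.0.6+).
        client.admin.command("enableSharding", "EndpointDataProtoTests")
        client.admin.command(
            "shardCollection",
            "EndpointDataProtoTests.EndpointData",
            key={"m": 1, "t": 1},
            timeseries={"timeField": "t", "metaField": "m"},
        )

        # Insert measurements with a varying meta field (different sensor IDs).
        coll = db["EndpointData"]
        now = datetime.datetime.now(datetime.timezone.utc)
        coll.insert_many(
            {"t": now, "m": {"sensorId": random.randrange(1000)}, "value": random.random()}
            for _ in range(10_000)
        )

        # Step 3: sustained read load through mongos (Ctrl-C to stop). Per the
        # log entry above, each read appears to hit the primary shard first,
        # fail with CommandOnShardedViewNotSupportedOnMongod, and then get
        # retried by mongos against the resolved system.buckets pipeline.
        while True:
            sensor = random.randrange(1000)
            list(coll.find({"m.sensorId": sensor}).limit(100))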


    Description

      Sharding a time series collection gives higher write throughput, but adding a read load affects the whole cluster: the primary shard's CPU usage spikes.  Reviewing the mongos logs shows several entries stating that Resolved views on sharded collections must be executed by mongos.  When I stop the read load, these messages are no longer logged.
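
      To quantify the imbalance without a GUI, here is a rough pymongo sketch that polls the "query" opcounter on each shard's primary (the same counter mongostat reports); the host ports are assumptions from my local Docker setup:

        import time
        from pymongo import MongoClient

        # Direct connections to each shard's primary (ports are assumptions).
        shard1 = MongoClient("mongodb://localhost:27018", directConnection=True)
        shard2 = MongoClient("mongodb://localhost:27028", directConnection=True)

        def query_count(client):
            # serverStatus exposes cumulative per-node op counters.
            return client.admin.command("serverStatus")["opcounters"]["query"]

        prev = [query_count(shard1), query_count(shard2)]
        while True:
            time.sleep(1)
            cur = [query_count(shard1), query_count(shard2)]
            print(f"shard1 query/s: {cur[0] - prev[0]}  shard2 query/s: {cur[1] - prev[1]}")
            prev = cur

      While the read load runs, shard1 (the primary shard for the collection) climbs to 1k+ queries per second and shard2 stays at zero, matching the attached mongostat output.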

      From my research, it seems this may be related to SERVER-43376 - Operations on non-sharded views in sharded clusters extra round trip

      This is a problem for us, because adding a read load affects the whole cluster's performance.  Our workload does roughly 25 reads for every 100 writes.

      I found the problem while load testing my sharded time series prototype on Atlas.

      Attachments

        1. All shards mongostat.png (271 kB)
        2. data and loadtester2.zip (5.25 MB)
        3. data - attempt2.zip (2.07 MB)
        4. image-2022-06-09-13-59-20-641.png (92 kB)
        5. image-2022-06-09-14-04-39-666.png (102 kB)
        6. image-2022-06-14-10-20-55-809.png (421 kB)
        7. Mongotop.png (173 kB)
        8. Multi-shard Metrics overview 1.png (472 kB)
        9. Multi-shard Metrics overview 2.png (204 kB)
        10. Screenshot 2022-04-26 104816.png (206 kB)
        11. shard1.png (77 kB)
        12. shard1-t2.png (150 kB)
        13. shard1-t2-v2.png (179 kB)
        14. shard2.png (62 kB)
        15. shard2-t2.png (166 kB)
        16. Shard mongo-stat.png (35 kB)


            People

              Assignee: Backlog - Query Integration (backlog-query-integration)
              Reporter: Marnu vd Merwe (marnu.vdmerwe@iotnxt.com)
              Votes: 1
              Watchers: 12
