
Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 5.0.7
Component/s: None
Labels:

Assigned Teams:

Query Integration
Operating System:
ALL
Steps To Reproduce:

My setup:

Everything is running locally

Docker Desktop

Mongo 5.0.7

2 shards, with 3 servers in a replica set per shard

1 mongos

1 config replica set with 3 servers

Load tester application that pushes a lot of time series data, while simultaneously doing reads, written using the .NET C# driver, v. 2.15.0

Reproduction steps:

Set up a 2-shard cluster (Docker is sufficient).

Create a sharded time series collection and insert some records. I added a few million. The destination bucket of each record doesn't make a difference; that is, the meta field can vary (e.g. different sensor IDs).

Perform a sustained, heavy read load on the collection.

Observe that the primary shard for the time series view processes many read queries, while the other shard processes none. This can be seen in the attached mongostat output, where shard1 (left) processes 1k+ "query" ops and shard2 (right) processes none.

Observe the following log entry on mongos:

"ctx":"conn1516","msg":"Unable to establish remote cursors","attr":{"error":{"code":169,"codeName":"CommandOnShardedViewNotSupportedOnMongod","errmsg":"Resolved views on sharded collections must be executed by mongos","resolvedView":{"ns":"EndpointDataProtoTests.system.buckets.EndpointData:Endpoints-NormalV8","pipeline":[{"$_internalUnpackBucket":{"timeField":"t","metaField":"m","bucketMaxSpanSeconds":3600,"exclude":[]}}],"collation":{"locale":"simple"}}},"nRemotes":0}}

The attached image is a mongostat output that was captured by NoSqlBooster 7.1. The left side is a direct connection to the primary of shard1, and the right a direct connection to the primary of shard2
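
For reference, the collection setup in the steps above can be sketched as the following command documents. This is a minimal sketch, not the exact commands used for this report: the database name, collection name, and the "t"/"m" fields are taken from the log entry above, while the shard key on the meta field and the helper function names are assumptions made for illustration.

```python
# Sketch only: builds the command documents for the setup steps without
# connecting to a cluster. Names come from the log entry in this ticket;
# the shard key ({"m": 1}) is an assumption, the ticket does not state it.

DB_NAME = "EndpointDataProtoTests"
COLL_NAME = "EndpointData:Endpoints-NormalV8"


def create_timeseries_command(coll, time_field, meta_field):
    """Build a `create` command for a time series collection."""
    return {
        "create": coll,
        "timeseries": {"timeField": time_field, "metaField": meta_field},
    }


def shard_collection_command(db, coll, meta_field):
    """Build a `shardCollection` command keyed on the meta field (assumed)."""
    return {"shardCollection": f"{db}.{coll}", "key": {meta_field: 1}}


def setup_commands():
    """Return (target database, command) pairs in the order they would run."""
    return [
        ("admin", {"enableSharding": DB_NAME}),
        (DB_NAME, create_timeseries_command(COLL_NAME, "t", "m")),
        ("admin", shard_collection_command(DB_NAME, COLL_NAME, "m")),
    ]


if __name__ == "__main__":
    for target_db, command in setup_commands():
        print(target_db, command)
```

Each pair could be sent through mongos with a driver's generic command helper, for example PyMongo's `client[target_db].command(command)`.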
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Sharding a time series collection leads to higher write throughput, but reads affect the whole cluster, because the primary shard has a spike in CPU usage. When reviewing the mongos logs, several log entries state "Resolved views on sharded collections must be executed by mongos". When I stop the read load, these messages are no longer logged.
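
To check how often this message appears with and without the read load, the mongos log can be scanned for entries with error code 169. The following is a rough sketch, assuming the log is in the structured JSON format (one JSON document per line); the function names are hypothetical and the sample line is a trimmed copy of the entry quoted in the reproduction steps.

```python
import json
from collections import Counter

# Error code reported in the log entry: CommandOnShardedViewNotSupportedOnMongod
SHARDED_VIEW_ERROR_CODE = 169


def sharded_view_namespace(line):
    """Return the resolved view namespace for a code-169 entry, else None."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None  # not a structured log line
    if not isinstance(entry, dict):
        return None
    attr = entry.get("attr")
    error = attr.get("error") if isinstance(attr, dict) else None
    if not isinstance(error, dict) or error.get("code") != SHARDED_VIEW_ERROR_CODE:
        return None
    view = error.get("resolvedView")
    return view.get("ns") if isinstance(view, dict) else None


def count_by_namespace(lines):
    """Count code-169 entries per resolved view namespace."""
    counts = Counter()
    for line in lines:
        ns = sharded_view_namespace(line)
        if ns is not None:
            counts[ns] += 1
    return counts


# Trimmed version of the log entry from this ticket, wrapped as one JSON document.
SAMPLE_LINE = (
    '{"ctx":"conn1516","msg":"Unable to establish remote cursors",'
    '"attr":{"error":{"code":169,'
    '"codeName":"CommandOnShardedViewNotSupportedOnMongod",'
    '"resolvedView":{"ns":"EndpointDataProtoTests.system.buckets.'
    'EndpointData:Endpoints-NormalV8"}},"nRemotes":0}}'
)

if __name__ == "__main__":
    print(count_by_namespace([SAMPLE_LINE, "plain text line"]))
```

Running this over log segments captured with and without the read load would show whether the entries track the reads, as observed above.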

From my research, this may be related to ~~SERVER-43376~~ (Operations on non-sharded views in sharded clusters extra round trip).

This is a problem for us, because adding a read load affects the whole cluster's performance. Our workload has about 25% reads for every 100% of writes.

I found the problem while load testing my sharded time series prototype on Atlas