[SERVER-71158] getShardDistribution does not properly support sharded timeseries collections Created: 08/Nov/22  Updated: 27/Oct/23  Resolved: 10/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dmitry Ryabtsev Assignee: Antonio Fuschetto
Resolution: Works as Designed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on MONGOSH-1447 getShardDistribution does not support... Open
depends on SERVER-76976 Programmatic API to get information o... Backlog
Assigned Teams:
Sharding EMEA
Operating System: ALL
Steps To Reproduce:

1. Create a sharded timeseries collection
2. Invoke getShardDistribution() against the namespace - this will return "Collection ts is not sharded"
3. Invoke getShardDistribution() against the corresponding system.buckets.ts collection - this will yield NaN for docs per chunk and docs (total)

Sprint: Sharding EMEA 2023-04-17, Sharding EMEA 2023-05-01, Sharding EMEA 2023-05-15
Participants:
Case:

 Description   

As of v6.0.2, getShardDistribution() does not recognize a sharded timeseries collection:

[direct: mongos] test> db.weather.getShardDistribution()
MongoshInvalidInputError: [SHAPI-10001] Collection weather is not sharded

When invoked against the buckets collection, the command returns some details, but several fields are undefined or NaN:

[direct: mongos] test> db.system.buckets.weather.getShardDistribution()
Shard shard01 at shard01/localhost:27018
{
  data: '718B',
  docs: undefined,
  chunks: 1,
  'estimated data per chunk': '718B',
  'estimated docs per chunk': NaN
}
---
Totals
{
  data: '718B',
  docs: NaN,
  chunks: 1,
  'Shard shard01': [ '100 % data', 'NaN % docs in cluster', '0B avg obj size on shard' ]
}

This appears to be an oversight as the expectation is that getShardDistribution() should be able to recognize a sharded timeseries collection.
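
A plausible reading of the NaN values in the output above, not confirmed anywhere in this ticket, is that the helper does arithmetic on a document count that is missing for the buckets collection. The sketch below is plain JavaScript and is not the mongosh implementation; the shardStats object and its field names are hypothetical and only mirror the figures shown in the output.

```javascript
// Hypothetical per-shard stats shaped like the output above: a data size and
// a chunk count are present, but the document count is undefined.
const shardStats = { size: 718, count: undefined, chunks: 1 };

// Any arithmetic on undefined produces NaN in JavaScript.
const docsPerChunk = Math.floor(shardStats.count / shardStats.chunks);

// NaN then propagates into every total or percentage derived from it.
const totalDocs = 0 + shardStats.count;
const docsPercent = (shardStats.count / totalDocs) * 100;

console.log({ docsPerChunk, totalDocs, docsPercent });
```

If this is indeed the cause, the fix belongs wherever the count is read, not in the formatting of the result.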



 Comments   
Comment by Antonio Fuschetto [ 10/May/23 ]

The command is implemented in the Mongo Shell and requires a fix at that level (see MONGOSH-1447). In the future, the Sharding team will expose a programmatic API to decouple the implementation details of system collections from these commands' implementations.

Comment by Antonio Fuschetto [ 10/May/23 ]

As we discussed internally, the getShardDistribution command should be fixed by 1) retrieving the metadata of the collection that is actually sharded (the bucket collection in the case of time series) and 2) presenting the shard distribution correctly (refer to MONGOSH-1447 for details).
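
The first step described above, finding the collection that is actually sharded, can be sketched as follows. This is a minimal illustration and not the MONGOSH-1447 fix: resolveShardedNamespace is a hypothetical helper, and the Set stands in for the namespaces recorded in the sharding metadata.

```javascript
// Hypothetical helper: returns the namespace that is actually sharded for a
// collection, or null when neither the collection itself nor its buckets
// collection appears in the sharding metadata.
function resolveShardedNamespace(dbName, collName, shardedNamespaces) {
  const direct = `${dbName}.${collName}`;
  if (shardedNamespaces.has(direct)) return direct;

  // For time series, the sharded namespace is the buckets collection.
  const buckets = `${dbName}.system.buckets.${collName}`;
  if (shardedNamespaces.has(buckets)) return buckets;

  return null;
}

// Assumed stand-in data: only the buckets namespace is registered as sharded.
const sharded = new Set(["test.system.buckets.weather"]);
const resolved = resolveShardedNamespace("test", "weather", sharded);
console.log(resolved);
```

With a lookup like this, db.weather.getShardDistribution() could report on test.system.buckets.weather instead of failing with "Collection weather is not sharded".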

Today this command, like others, is implemented by directly accessing system collections, which implies awareness of server implementation details that could change from one version to another. The idea for the future is to decouple these command implementations from the internal representation of the system collections by exposing a programmatic API. Such an API can be used to retrieve information on sharded collections, ensuring support and consistency regardless of the internal representation of this information (refer to SERVER-76976 for details).

Generated at Thu Feb 08 06:18:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.