|
Over the past two weeks, we have investigated several approaches for building the sharded data distribution.
- The first approach is to create a new command ('getShardedDataDistribution'). The command would obtain the list of all sharded collections by querying 'config.collections'. Then, for each collection, mongos would run a '$collStats' aggregation on each shard to obtain the relevant data as documents. We have temporarily set this approach aside because the other options, which build on the aggregation framework, make it easier to implement the output cursor for these sharded data documents.
- The second approach is to modify the existing '$collStats' aggregation stage, so that no additional command or aggregation stage is needed. This would require relaxing the constraint that '$collStats' must be the first stage of the aggregation pipeline and making it independent of the collection the aggregation runs on. While implementing this approach, we found that we could not change the namespace before executing '$collStats' for each namespace produced by the preceding stage.
- The third and last approach is to create a new aggregation stage. The new stage will look up the sharded collections and, for each of them, execute a new pipeline consisting of a '$collStats' stage and a '$project' stage. With this approach we can create a new expression context with the correct namespace before calling '$collStats'.
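
As a rough illustration of the third approach, the per-collection sub-pipeline could look like the sketch below. It is written in Python purely for readability and is not the server implementation: the function names are hypothetical, the list of sharded namespaces is passed in rather than read from the config database, and the projected fields are an assumption about which parts of the '$collStats' output are of interest.

```python
from typing import Any, Dict, List

Pipeline = List[Dict[str, Any]]


def build_per_collection_pipeline() -> Pipeline:
    """Build the sub-pipeline that would run against one sharded collection."""
    return [
        # $collStats has to be the first stage of the pipeline it appears in.
        {"$collStats": {"storageStats": {}}},
        # Assumption: keep only the namespace, the shard name and two size
        # figures. The real stage may project a different set of fields.
        {
            "$project": {
                "ns": 1,
                "shard": 1,
                "storageStats.count": 1,
                "storageStats.size": 1,
            }
        },
    ]


def plan_sharded_data_distribution(sharded_namespaces: List[str]) -> Dict[str, Pipeline]:
    """Map every sharded namespace to its own, independent sub-pipeline.

    Each sub-pipeline is meant to run with its own namespace, which mirrors
    the idea of creating a new expression context per collection.
    """
    return {ns: build_per_collection_pipeline() for ns in sharded_namespaces}
```

With a driver such as PyMongo, each sub-pipeline could then be passed to 'aggregate' on the matching collection; inside the server, the new stage would execute the equivalent pipeline itself.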
|