|
I was reviewing the design document for an upcoming project "Add and expose metrics to make shard key selection easier", when I noticed we use the term "cardinality" to mean "number of unique values":
https://www.mongodb.com/docs/manual/core/sharding-choose-a-shard-key/#std-label-shard-key-range
This is mostly fine, and somewhat consistent with set theory (as in the wiki page), but are the shard key values a set? I would think they are more like a multi-set or vector, since values repeat.
We are underway in making a new query optimizer which will estimate the "cardinality" of different query plan sub-segments. In that context, the "cardinality" will mean "number of values", not "number of unique values". This is all going to get confusing I think.
I'm open to suggestions here but I would propose the "choose a shard key" page would use "number of distinct values" instead of cardinality.
|