[SERVER-51117] report index information on a per-shard basis via getIndexes or similar command Created: 23/Sep/20 Updated: 27/Oct/23 Resolved: 15/Jun/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Vinicius Grippa | Assignee: | Garaudy Etienne |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Sprint: | Sharding 2022-03-07, Sharding NYC 2022-03-21, Sharding NYC 2022-05-30, Sharding 2022-06-27 |
| Description |
| Comments |
| Comment by Garaudy Etienne [ 16/Jun/23 ] |
|
$indexStats is broadcast to all shards that own at least one chunk and returns their indexes. Sharding the collection on {_id: 1} created only a single chunk, [MinKey, MaxKey], on one of the shards. Had you used {_id: "hashed"} to pre-split and create chunks on both shards, $indexStats would have read from both shards. |
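A minimal sketch of how per-shard results could be read out of $indexStats output, assuming each returned document carries the `name` and `shard` fields that the stage reports when run through a router. The helper name `indexes_by_shard`, the sample documents, and the host values are hypothetical and only illustrate the shape of the data.

```python
from collections import defaultdict


def indexes_by_shard(index_stats_docs):
    """Group $indexStats output documents by their `shard` field."""
    grouped = defaultdict(set)
    for doc in index_stats_docs:
        # `shard` is only expected when the stage runs against a sharded
        # cluster; the fallback label below is an assumption of this sketch.
        grouped[doc.get("shard", "<unsharded>")].add(doc["name"])
    return {shard: sorted(names) for shard, names in grouped.items()}


# Hypothetical sample: only shard01 owns a chunk, so only it reports.
sample = [
    {"name": "_id_", "shard": "shard01", "host": "localhost:27018"},
    {"name": "a_1", "shard": "shard01", "host": "localhost:27018"},
]
print(indexes_by_shard(sample))
```

With a driver such as pymongo, the input could come from `collection.aggregate([{"$indexStats": {}}])` issued through mongos. A shard that owns no chunk would simply be absent from the result, which matches the behavior described above.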
| Comment by Garaudy Etienne [ 23/Sep/22 ] |
|
eric.sedor@mongodb.com The behavior you observed occurs because the other shard doesn't own a chunk: the router knows there is no data on that shard, so $indexStats is never run there. |
| Comment by Eric Sedor [ 10/Jun/21 ] |
|
$indexStats does include a shard field, but in my experience it only includes results from the primary shard for the collection's database. The following reproduction steps, which create differing indexes on two shards, show that $indexStats ultimately reports results only for shard01.
then with mongo --port 27017
then with mongo --port 27018:
then with mongo --port 27019:
then back to mongo --port 27017:
|
| Comment by Garaudy Etienne [ 01/Feb/21 ] |
|
$indexStats has this information. |
| Comment by Garaudy Etienne [ 01/Dec/20 ] |
|
I believe that $indexStats may already provide this information. |
| Comment by Vinicius Grippa [ 24/Sep/20 ] |
|
It would be great to have this on a per-node (or per-shard) basis because, as you can see, the index may or may not exist on every shard (even within replica sets this information is not consistent).
Thanks for that. |
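To illustrate the inconsistency being described, here is a small sketch that compares index names collected separately from each node or shard and reports what each one lacks. It assumes the per-member listings were already gathered by connecting to each member directly; the function name `missing_indexes` and the sample data are hypothetical.

```python
def missing_indexes(indexes_per_member):
    """Map each member to the index names that exist elsewhere but not on it."""
    all_names = set()
    for names in indexes_per_member.values():
        all_names.update(names)
    report = {}
    for member, names in indexes_per_member.items():
        missing = all_names - set(names)
        if missing:
            # Only members that lack at least one index appear in the report.
            report[member] = sorted(missing)
    return report


# Hypothetical listings gathered from two shards.
listings = {
    "shard01": ["_id_", "a_1"],
    "shard02": ["_id_", "b_1"],
}
print(missing_indexes(listings))
```

An empty result would mean every member reports the same set of index names.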
| Comment by Eric Sedor [ 24/Sep/20 ] |
|
Got it. Thank you. I will pass this ticket on as an improvement request. It may even be useful to have this information obtainable from a single command on a per-node basis. As a workaround to calling sh.status() yourself: the primary shard is defined on a per-database basis, and you can obtain it programmatically from the databases collection in the config database. |
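A minimal sketch of that workaround, assuming documents from config.databases use `_id` for the database name and `primary` for the primary shard name. The helper name `primary_shards` and the sample documents are hypothetical.

```python
def primary_shards(config_databases_docs):
    """Map each database name to its primary shard name."""
    return {doc["_id"]: doc["primary"] for doc in config_databases_docs}


# Hypothetical documents shaped like entries in config.databases.
sample_docs = [
    {"_id": "test", "primary": "shard01"},
    {"_id": "reporting", "primary": "shard02"},
]
print(primary_shards(sample_docs))
```

With a driver such as pymongo connected to a mongos, the input could come from `client["config"]["databases"].find()`, which avoids parsing the text output of sh.status().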
| Comment by Vinicius Grippa [ 24/Sep/20 ] |
|
I meant the primary shard for the sharded database, not the primary shard of the replica set. |
| Comment by Eric Sedor [ 24/Sep/20 ] |
|
Thanks vgrippa@gmail.com. When you say "Primary Shard server", are you referring to: The current replica set Primary for a given shard? I have been assuming you meant the latter, but want to be sure. |
| Comment by Vinicius Grippa [ 24/Sep/20 ] |
|
Hi Eric,
Thanks for replying. On terabyte-scale databases, even creating the index in the background adds load and locks (even if the lock is shared). Background index builds also take considerably longer than foreground builds. To avoid this, I perform a rolling index creation (secondary -> stepdown() -> old primary). However, in an environment with many collections, I need to keep querying sh.status to identify the Primary Shard server. It would be beneficial to have the index information available even when the index has not been created on the Primary shard. |
| Comment by Eric Sedor [ 24/Sep/20 ] |
|
vgrippa@gmail.com, in a sharded environment it's expected that you will perform operations like index creation through mongos routers. If it's necessary for you to manage indexes on a per-shard basis, can you describe the use-case that requires you to do so? |