[SERVER-51117] report index information on a per-shard basis via getIndexes or similar command Created: 23/Sep/20  Updated: 27/Oct/23  Resolved: 15/Jun/23

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Vinicius Grippa Assignee: Garaudy Etienne
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-57622 $indexStats only reports primary shar... Closed
Related
related to SERVER-34633 Allow $currentOp to retrieve operatio... Closed
is related to SERVER-45032 Allow $planCacheStats to target every... Closed
Operating System: ALL
Sprint: Sharding 2022-03-07, Sharding NYC 2022-03-21, Sharding NYC 2022-05-30, Sharding 2022-06-27
Participants:

 Description   

In a sharded cluster environment, getIndexes() reports indexes as defined on the database's Primary shard's primary node. It could be useful to have a convenient way of obtaining indexes on a per-shard (or even per-node given getIndexes on a primary reports indexes on that node) basis.

original

When creating an index on a shard that does not have the Primary role, mongoS does not find the index.

For example:

# Shard information
        {  "_id" : "vinnie""primary" : "shard02""partitioned" : true
            "version" : {  "uuid" : UUID("2ff1e7eb-0a75-4bde-ac8f-bff4e93dfc4a"),  "lastMod" : 1 } }
"_id" : "vinnie""primary" : "shard02""partitioned" : true
            "version" : {  "uuid" : UUID("2ff1e7eb-0a75-4bde-ac8f-bff4e93dfc4a"),  "lastMod" : 1 } }                
vinnie.companies
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                shard01 7
                                shard02 8
                                shard03 7

Create the index on shard01:

shard01:PRIMARY> use vinnie;
switched to db vinnie
shard01:PRIMARY> db.companies.createIndex({firm: 1}){ "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1,

The result on mongoS:

mongos> db.companies.getIndexes()mongos> db.companies.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "vinnie.companies" }]

Now, if I create the index on the Primary shard (shard02), mongoS recognizes:

mongos> db.companies.getIndexes()mongos> db.companies.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "vinnie.companies" }, { "v" : 2, "key" : { "firm" : 1 }, "name" : "firm_1", "ns" : "vinnie.companies" }]

 
The proposed solution is to recognize the index independent of the shard role.



 Comments   
Comment by Garaudy Etienne [ 16/Jun/23 ]

$indexStats will broadcast to all shards owning at least one chunk and get their indexes. Sharding the collection on {_id: 1} only created a single chunk [MinKey, MaxKey] on one of the shards. had you used {_id: "hashed"} to pre-split and create chunks on both shards, then $indexStats would have read from both shards.

Comment by Garaudy Etienne [ 23/Sep/22 ]

eric.sedor@mongodb.com The reason for the behavior you observed is that the other shard doesn't own a chunk so $indexStats never gets run on that shard because the router knows there's no data on that shard.

Comment by Eric Sedor [ 10/Jun/21 ]

$indexStats does include a shard field, but in my experience is it only actually includes results from the primary shard for the collection's database.

The following reproduction steps for creating differing indexes on two shards shows that $indexStats only ultimately reports results for shard01.

mlaunch --dir clusters/2shard --sharded 2 --replicaset --nodes 1

then with mongo --port 27017

sh.enableSharding("test")
sh.shardCollection("test.test",{_id:1})
db.test.insert({})
use test
db.test.insert({})
exit

then with mongo --port 27018:

use test
db.test.createIndex({a:1})
exit

then with mongo --port 27019:

use test
db.test.createIndex({b:1})
exit

then back to mongo --port 27017:

mongos> use test
switched to db test
mongos> db.test.aggregate([{$indexStats:{}}]).toArray()
[
	{
		"name" : "_id_",
		"key" : {
			"_id" : 1
		},
		"host" : "nodachi:27018",
		"accesses" : {
			"ops" : NumberLong(0),
			"since" : ISODate("2021-06-10T19:14:20.504Z")
		},
		"shard" : "shard01",
		"spec" : {
			"v" : 2,
			"key" : {
				"_id" : 1
			},
			"name" : "_id_"
		}
	},
	{
		"name" : "a_1",
		"key" : {
			"a" : 1
		},
		"host" : "nodachi:27018",
		"accesses" : {
			"ops" : NumberLong(0),
			"since" : ISODate("2021-06-10T19:15:55.087Z")
		},
		"shard" : "shard01",
		"spec" : {
			"v" : 2,
			"key" : {
				"a" : 1
			},
			"name" : "a_1"
		}
	}
]

Comment by Garaudy Etienne [ 01/Feb/21 ]

$indexStats has this information

Comment by Garaudy Etienne [ 01/Dec/20 ]

I believe that $indexStats may already provide this information

 

Comment by Vinicius Grippa [ 24/Sep/20 ]

On a per-node(or per-shard) basis would be great to have. Because as you can see, the index may exist or not in all shards (even in replica sets this information is not consistent).

 

Thanks for that.

Comment by Eric Sedor [ 24/Sep/20 ]

Got it. Thank you. I will pass this ticket on as an improvement request. It may even be useful to have this information obtainable from a single command on a per-node basis.

As a workaround to calling sh.status() on your own:

The primary shard is defined on a per-database basis, and you can obtain the information programmatically from the config database's databases collection.

Comment by Vinicius Grippa [ 24/Sep/20 ]

I meant the primary shard for the sharded database, not the primary shard of the replica set.

Comment by Eric Sedor [ 24/Sep/20 ]

Thanks vgrippa@gmail.com. When you say "Primary Shard server' are you refering to:

The current replica set Primary for a given shard?
The primary shard for the sharded database?

I have been assuming you meant the latter, but want to be sure.

Comment by Vinicius Grippa [ 24/Sep/20 ]

Hi Eric,

 

Thanks for replying. Terabyte databases even creating the index in the background bring additional load and locks (even if it is a shared lock). Also, background indexes take a considerable amount of time compared to the foreground operation.

To avoid this issue I perform a rolling index creation (secondary -> stepdown() -> old primary).  However, in an environment with a lot of collections, I need to keep querying sh.status to identify the Primary Shard server.

It would be beneficial to have the index information even if the index is not created on the Primary shard.

Comment by Eric Sedor [ 24/Sep/20 ]

vgrippa@gmail.com, in a sharded environment it's expected that you will perform operations like index creation through mongos routers. If it's necessary for you to manage indexes on a per-shard basis, can you describe the use-case that requires you to do so?

Generated at Thu Feb 08 05:24:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.