[SERVER-6342] indexes in sharding configuration Created: 07/Jul/12  Updated: 15/Aug/12  Resolved: 11/Jul/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Azat Khuzhin Assignee: Greg Studer
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

version
2.1.3-pre-
f662d00f1331c85aba533b58b3714611beb9f415


Attachments: File shard_indexes.js    
Issue Links:
Depends
depends on SERVER-6403 sharded indexes are queried via the d... Closed
Operating System: ALL
Participants:

 Description   

I have two shards: "shard0000" and "shard0001".

I moved a collection from the main shard ("shard0000") to "shard0001".
Then I created an index on this collection ("value.count"), but I can see the index only on "shard0001".
If I connect to "shard0000" or to "mongos", the only index listed for this collection is the one on "_id".



 Comments   
Comment by Greg Studer [ 11/Jul/12 ]

Resolved as duplicate for better triaging, thanks for your help tracking this down!

Linked issue has a new test which reproduces this - it's easier to trigger with multiple mongoses as the original mongos will "remember" that the collection started on a particular shard.

There are a few workarounds for now - 1) use coll.stats(), which is more accurate, or 2) create the index using the same mongos as was used to do the collection setup - this should prevent the problem from happening at least initially.
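A minimal sketch of workaround 1, assuming the sharded stats() document has the shape posted elsewhere in this ticket (a "shards" map whose entries carry "indexSizes"). The helper name indexNamesByShard is hypothetical and not part of the shell:

```javascript
// Hypothetical helper: list index names per shard from a sharded
// coll.stats() document. Relies only on the "shards" and "indexSizes"
// fields visible in the output posted in this ticket.
function indexNamesByShard(stats) {
    var result = {};
    var shards = stats.shards || {};
    for (var shardName in shards) {
        if (!Object.prototype.hasOwnProperty.call(shards, shardName)) continue;
        var sizes = shards[shardName].indexSizes || {};
        var names = [];
        for (var indexName in sizes) {
            if (Object.prototype.hasOwnProperty.call(sizes, indexName)) {
                names.push(indexName);
            }
        }
        names.sort();
        result[shardName] = names;
    }
    return result;
}

// Trimmed copy of the stats() output from this ticket.
var sampleStats = {
    sharded: true,
    shards: {
        shard0001: {
            nindexes: 2,
            indexSizes: { "_id_": 6524448, "value_count_-1": 3867248 }
        }
    }
};
var byShard = indexNamesByShard(sampleStats);
```

In the shell this could be called as printjson(indexNamesByShard(db.foobar.stats())) while connected to mongos.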

Comment by Azat Khuzhin [ 11/Jul/12 ]

"flushRouterConfig" didn't help.
Restarting "mongos" didn't help either.

Comment by Greg Studer [ 10/Jul/12 ]

Sorry, you were correct earlier - the code recently changed, index writes go to all shards. Still not able to reproduce on the version you've specified, however. Does the problem go away if you run flushRouterConfig on the mongos before ensureIndex? I wonder if this is a known issue with commands and config refreshing.
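For reference, a sketch of what that check might look like when run against mongos. The helper flushThenEnsureIndex is hypothetical, and the index key {"value.count": -1} is an assumption inferred from the index name "value_count_-1" shown in this ticket:

```javascript
// Hypothetical helper: refresh the mongos routing table, then build the
// index. `conn` is expected to offer getDB(), as the shell's connection
// object (db.getMongo()) does.
function flushThenEnsureIndex(conn, dbName, collName, keys) {
    var flush = conn.getDB("admin").runCommand({ flushRouterConfig: 1 });
    if (!flush.ok) {
        return flush; // surface the failure instead of continuing
    }
    conn.getDB(dbName).getCollection(collName).ensureIndex(keys);
    return flush;
}

// Shell usage, connected to mongos (key spec is an assumption):
//   flushThenEnsureIndex(db.getMongo(), "my_db", "foobar", {"value.count": -1});
```

This only sequences the two operations; it does not by itself show whether the stale routing information is the cause.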

> Why no data will be collected?
> I thought this query would move all existing documents to the other shard, or not?

The find part of a moveChunk is just a single "location", not a query, that identifies which chunk should be moved. If you know the full bounds, you can just specify the lower bound.
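To illustrate the "location" idea, here is a simplified model, not the server's implementation. It assumes a single numeric shard key and uses -Infinity/Infinity as stand-ins for the real minimum and maximum key bounds; chunkContaining and the sample chunk lists are hypothetical:

```javascript
// Simplified model: chunks are half-open ranges [min, max) over a numeric
// shard key. A "find" value simply picks the one chunk whose range
// contains it; it does not select documents.
function chunkContaining(chunks, keyValue) {
    for (var i = 0; i < chunks.length; i++) {
        if (chunks[i].min <= keyValue && keyValue < chunks[i].max) {
            return chunks[i];
        }
    }
    return null;
}

// An unsplit collection: one chunk covering the whole key space.
var unsplit = [{ min: -Infinity, max: Infinity, shard: "shard0000" }];

// A collection split at 100 (hypothetical split point).
var split = [
    { min: -Infinity, max: 100, shard: "shard0000" },
    { min: 100, max: Infinity, shard: "shard0001" }
];
```

With the unsplit list, every key value maps to the same chunk, which is why any constant works there. With the split list, the lower bound 100 identifies the second chunk because the lower bound is inclusive.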

> Also are you sure that "mongod" detect indexes right in this situation, and will not create duplicate indexes?
Mongod will not create duplicate indexes - creating an index that duplicates an existing one on a mongod is a no-op, however the request is made.

Comment by Azat Khuzhin [ 10/Jul/12 ]

About what's happening here:

I was thinking along the same lines.
But I think that "getIndexes()" (and others, like "getIndexKeys()"), when run from "mongos" rather than "mongod", must return all indexes (from all shards).

Some MongoDB drivers do not support the "ensureIndex" command, only "createIndex", so they must check for existing indexes manually.
Also, are you sure that "mongod" detects indexes correctly in this situation and will not create duplicate indexes?

Comment by Azat Khuzhin [ 10/Jul/12 ]

> Not sure I fully understand - but the second parameter here is the "find" parameter in the moveChunk command - it won't create a field it's just used as a selector for the chunk. Since you only have one chunk, -Inf to Inf, any value of _id will match that chunk, and no data will be collected.

Why will no data be collected?
I thought this query would move all existing documents to the other shard, or not?

Comment by Greg Studer [ 10/Jul/12 ]

Oh right, just realized what's happening here -

When you run ensureIndex, it's targeted only to shards which contain chunks. Since all the data for your collection, once moved, is on shard0001, an index is only created on shard0001. This isn't a problem, however, since when a chunk is moved to a shard, it is ensured that all indexes on the old shard are recreated on the new shard.

If you create the index prior to moving the chunk, you'll end up with indexes on both shards.

Comment by Greg Studer [ 10/Jul/12 ]

Will see if getIndexes() is displaying something incorrectly.
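A small sketch for checking this, assuming the getIndexes() and stats() shapes posted in this ticket (an array of index documents with "name", and a top-level "indexSizes" map). The helper indexesMissingFromListing is hypothetical:

```javascript
// Hypothetical helper: return index names that appear in stats().indexSizes
// but are absent from the getIndexes() listing.
function indexesMissingFromListing(indexList, stats) {
    var listed = {};
    for (var i = 0; i < indexList.length; i++) {
        listed[indexList[i].name] = true;
    }
    var missing = [];
    var sizes = stats.indexSizes || {};
    for (var name in sizes) {
        if (Object.prototype.hasOwnProperty.call(sizes, name) && !listed[name]) {
            missing.push(name);
        }
    }
    return missing.sort();
}

// Trimmed copies of the two outputs from this ticket.
var listing = [{ v: 1, key: { _id: 1 }, ns: "my_db.foobar", name: "_id_" }];
var statsDoc = { indexSizes: { "_id_": 6524448, "value_count_-1": 3867248 } };
var missingNames = indexesMissingFromListing(listing, statsDoc);
```

In the shell, the same helper could be fed db.foobar.getIndexes() and db.foobar.stats() from a mongos connection.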

> I don't want to add separate field for this, so because of this I use "exists"
> Will queries on this collection be slower because of this?

Not sure I fully understand - but the second parameter here is the "find" parameter in the moveChunk command - it won't create a field, it's just used as a selector for the chunk. Since you only have one chunk, -Inf to Inf, any value of _id will match that chunk, and no data will be collected.

Comment by Azat Khuzhin [ 10/Jul/12 ]

Also, your script doesn't run 'disableBalancing', but that is insignificant.

I think I got it: your script runs 'db.foobar.stats()', while I run 'db.foobar.getIndexes()'.
These two commands produce different output:

mongos> db.foobar.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "my_db.foobar",
                "name" : "_id_"
        }
]
mongos> db.foobar.stats()
{
        "sharded" : true,
        "ns" : "my_db.foobar",
        "count" : 152971,
        "numExtents" : 7,
        "size" : 12769396,
        "storageSize" : 22507520,
        "totalIndexSize" : 10391696,
        "indexSizes" : {
                "_id_" : 6524448,
                "value_count_-1" : 3867248
        },
        "avgObjSize" : 83.47592680965673,
        "nindexes" : 2,
        "nchunks" : 2,
        "shards" : {
                "shard0001" : {
                        "ns" : "my_db.foobar",
                        "count" : 152971,
                        "size" : 12769396,
                        "avgObjSize" : 83.47592680965673,
                        "storageSize" : 22507520,
                        "numExtents" : 7,
                        "nindexes" : 2,
                        "lastExtentSize" : 11325440,
                        "paddingFactor" : 1,
                        "systemFlags" : 1,
                        "userFlags" : 0,
                        "totalIndexSize" : 10391696,
                        "indexSizes" : {
                                "_id_" : 6524448,
                                "value_count_-1" : 3867248
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}
mongos>

Comment by Azat Khuzhin [ 10/Jul/12 ]

I don't want to add a separate field for this, which is why I use "exists".
Will queries on this collection be slower because of this?

Comment by Greg Studer [ 09/Jul/12 ]

Also, in the script you've posted, I don't think you want to use {_id: {$exists: 1}} - the query operator is interpreted as a portion of the query range. Assuming the collection isn't split, any constant value will work here.
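As a sketch of the suggested change, the helper below builds a moveChunk command document with a plain constant as the "find" value and rejects query operators such as $exists. The helper name moveChunkCommand and the constant 0 are illustrative assumptions:

```javascript
// Hypothetical helper: build a moveChunk command document whose "find"
// is a plain shard-key value, refusing query operators like $exists.
function moveChunkCommand(ns, shardKeyValue, toShard) {
    for (var field in shardKeyValue) {
        var value = shardKeyValue[field];
        if (value !== null && typeof value === "object") {
            for (var op in value) {
                if (op.charAt(0) === "$") {
                    throw new Error("find must be a plain value, got operator " + op);
                }
            }
        }
    }
    return { moveChunk: ns, find: shardKeyValue, to: toShard };
}

// Any constant works while the collection has a single chunk.
var cmd = moveChunkCommand("my_db.foobar", { _id: 0 }, "shard0001");

// Shell usage against mongos:
//   db.adminCommand(cmd);
// or equivalently:
//   sh.moveChunk("my_db.foobar", { _id: 0 }, "shard0001");
```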

Comment by Greg Studer [ 09/Jul/12 ]

I can't seem to reproduce this problem with the git hash you posted - attached a script above that creates a collection, shards and moves it, and creates an index.

Are you issuing the "ensureIndex" command against just one shard?

Comment by Azat Khuzhin [ 08/Jul/12 ]

I moved the collection using this function:

// Create collection
// Move to dstShard
// And mark {noBalance: true}
// 
// but before you must enable sharding for collection db!
function(dbName, collectionName, dstShard) {
	var result;
 
	result = db.getMongo().getDB(dbName).createCollection(collectionName);
	if (!result.ok) {
		printjson(result);
		return;
	}
 
	var fullCollectionName = dbName + '.' + collectionName;
 
	sh.shardCollection(fullCollectionName, {_id: 1});
	result = sh.moveChunk(fullCollectionName, {_id: {$exists: 1}}, dstShard);
	if (!result.ok) {
		printjson(result);
		return;
	}
	sh.disableBalancing(fullCollectionName);
}

I created the index using the "ensureIndex" function.
I was connected to mongos.

Comment by Scott Hernandez (Inactive) [ 08/Jul/12 ]

How did you move the collection? How did you create the index? Was it through mongos, or directly connected to the shard, and if direct which shard?

Generated at Thu Feb 08 03:11:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.