[SERVER-27105] balancing error: Failed with error 'ns not found, should be impossible' Created: 18/Nov/16  Updated: 04/Dec/16  Resolved: 04/Dec/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.9
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Tal Grynbaum Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Operating System: ALL
Steps To Reproduce:

Not sure; start with 2 shards holding lots of data and add another shard.

Participants:

 Description   

I had a MongoDB cluster with 2 shards and a huge collection (14B documents).
I added another shard, and I'm getting the following errors when running sh.status():

active mongoses:
"3.2.9" : 1
balancer:
Currently enabled: yes
Currently running: yes
Balancer lock taken at Mon Oct 31 2016 07:08:03 GMT+0000 (UTC) by mshard1.XXXX:32232:1475250202:-218043174:Balancer:1442401236
Collections with active migrations:
triggerhood.newcube8 started at Mon Oct 31 2016 07:08:04 GMT+0000 (UTC)
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
989 : Success
1 : Failed with error 'aborted', from s1 to s3
1 : Failed with error 'aborted', from s1 to s2
476 : Failed with error 'chunk too big to move', from s2 to s3
3682 : Failed with error 'ns not found, should be impossible', from s1 to s3
504 : Failed with error 'chunk too big to move', from s1 to s3



 Comments   
Comment by Kelsey Schubert [ 04/Dec/16 ]

Hi talg70,

Thank you for confirming that this issue has been resolved.

Kind regards,
Thomas

Comment by Tal Grynbaum [ 04/Dec/16 ]

Hi Thomas

I was able to drop the collections with a drop command from mongos, and no, I haven't experienced the issue again.

Thanks
Tal

Comment by Kelsey Schubert [ 04/Dec/16 ]

Hi talg70,

Have you experienced this issue since following the steps outlined in SERVER-17397 to drop the affected collections? Please let us know.

Thank you,
Thomas

Comment by Kelsey Schubert [ 22/Nov/16 ]

Hi talg70,

Since these collections are not needed, I would recommend dropping them by following the workaround described in SERVER-17397.

Kind regards,
Thomas

Comment by Tal Grynbaum [ 20/Nov/16 ]

Another strange behavior:

running db.testshard.find()
yields no results, but:

mongos> db.chunks.find({ns:"triggerhood.testshard"})
has plenty (105):
{ "_id" : "triggerhood.testshard-_id_MinKey", "lastmod" : Timestamp(20, 0), "lastmodEpoch" : ObjectId("57ecf0d2dbbe07e7246afc5c"), "ns" : "triggerhood.testshard", "min" : { "_id" : { "$minKey" : 1 } }, "max" : { "_id" : NumberLong("-9073439221275772406") }, "shard" : "s2" }
{ "_id" : "triggerhood.testshard-_id_-9073439221275772406", "lastmod" : Timestamp(21, 0), "lastmodEpoch" : ObjectId("57ecf0d2dbbe07e7246afc5c"), "ns" : "triggerhood.testshard", "min" : { "_id" : NumberLong("-9073439221275772406") }, "max" : { "_id" : NumberLong("-8923105572753148489") }, "shard" : "s2" }
...
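The mismatch above — no documents returned by db.testshard.find(), yet 105 chunk entries in config.chunks — suggests the collection was dropped while its chunk metadata survived. A minimal sketch of that consistency check over hypothetical in-memory copies of the config metadata (the documents below are illustrative, not taken from the cluster):

```javascript
// Hypothetical snapshots of config.collections and config.chunks;
// real values would be read through a mongos.
const collections = [
  { _id: "triggerhood.newcube8", dropped: false },
  { _id: "triggerhood.testshard", dropped: true },  // dropped, but chunks remain
];
const chunks = [
  { ns: "triggerhood.newcube8" },
  { ns: "triggerhood.testshard" },
  { ns: "triggerhood.testshard" },
];

// Flag namespaces whose chunk documents outlive their collection entry.
function staleChunkNamespaces(collections, chunks) {
  const live = new Set(
    collections.filter(c => !c.dropped).map(c => c._id)
  );
  const stale = new Set();
  for (const c of chunks) {
    if (!live.has(c.ns)) stale.add(c.ns);
  }
  return [...stale];
}

console.log(staleChunkNamespaces(collections, chunks));
// → [ "triggerhood.testshard" ]
```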

Comment by Tal Grynbaum [ 20/Nov/16 ]

Before attempting a manual moveChunk,
I ran distinct queries on the chunks collection for each of the shards:

mongos> db.chunks.distinct("ns")
[
	"triggerhood.cube8",
	"triggerhood.newDailyUsers",
	"triggerhood.newcube",
	"triggerhood.newcube8",
	"triggerhood.testshard",
	"triggerhood.testshard2"
]
mongos> db.chunks.distinct("ns",{shard:"s3"})
[ "triggerhood.newcube", "triggerhood.newcube8", "triggerhood.testshard2" ]
mongos> db.chunks.distinct("ns",{shard:"s2"})
[
	"triggerhood.newcube",
	"triggerhood.newcube8",
	"triggerhood.newDailyUsers",
	"triggerhood.testshard",
	"triggerhood.testshard2"
]
mongos> db.chunks.distinct("ns",{shard:"s1"})
[
	"triggerhood.newcube",
	"triggerhood.newcube8",
	"triggerhood.newDailyUsers",
	"triggerhood.testshard",
	"triggerhood.cube8",
	"triggerhood.testshard2"
]
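The reasoning behind these queries — a namespace is suspect if it has chunks somewhere but none on the newly added shard s3 — amounts to a set difference. A small sketch using the lists from the distinct output above:

```javascript
// Namespaces per shard, copied from the db.chunks.distinct() output above.
const all = [
  "triggerhood.cube8", "triggerhood.newDailyUsers", "triggerhood.newcube",
  "triggerhood.newcube8", "triggerhood.testshard", "triggerhood.testshard2",
];
const onS3 = ["triggerhood.newcube", "triggerhood.newcube8", "triggerhood.testshard2"];

// Suspects: sharded namespaces the balancer has never managed to place on s3.
const suspects = all.filter(ns => !onS3.includes(ns));
console.log(suspects);
// → [ "triggerhood.cube8", "triggerhood.newDailyUsers", "triggerhood.testshard" ]
```

Consistent with the ticket: newDailyUsers and testshard are exactly the collections whose moveChunk attempts below fail with 'ns not found, should be impossible'.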

So I suspected that the bad collection was newDailyUsers, and indeed, running the following command:

mongos> db.runCommand( { moveChunk :"triggerhood.newDailyUsers", find:{_id: "triggerhood.newDailyUsers-_id_ObjectId('57eb7613dbbe07e7240526eb')"},to:"s3"})
yielded the following result:
{
	"cause" : {
		"ok" : 0,
		"errmsg" : "ns not found, should be impossible"
	},
	"ok" : 0,
	"errmsg" : "move failed"
}

and also on the testshard collection:

db.runCommand( { moveChunk :"triggerhood.testshard", find:{_id: "triggerhood.testshard-_id_-8614707915317272308"},to:"s3"})
{
	"cause" : {
		"ok" : 0,
		"errmsg" : "ns not found, should be impossible"
	},
	"ok" : 0,
	"errmsg" : "move failed"
}

As I don't need these collections, should I drop them on the mongos or on each of the primaries?

Comment by Kelsey Schubert [ 18/Nov/16 ]

Hi talg70,

Thanks for opening this ticket. I've reviewed the previous discussion on google groups and agree that this is likely related to collection drops. Would you please attempt to manually move a chunk for one of the affected collections using the moveChunk command?

Thank you,
Thomas

Generated at Thu Feb 08 04:14:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.