[SERVER-15671] Sharded collection fails on "stats()" with "ns not found", but "find()" works
Created: 15/Oct/14  Updated: 24/Jan/15  Resolved: 23/Jan/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.8
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Oleg Rekutin
Assignee: Unassigned
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Running 2.4.8.

Participants:

 Description   

We have created temporary collections to hold map-reduce output. After several map-reduce jobs failed at the last moment due to a networking error, we wanted to inspect the stats on those collections.

About 30-50% of the collections fail on the "stats()" call but can still be iterated successfully with "find()". I have also confirmed that the collections exist on every shard by connecting to each shard's mongod and running stats() there.

mongos> db.collection_nw1q.stats()
{
	"sharded" : true,
	"ok" : 0,
	"errmsg" : "failed on shard: { ok: 0.0, errmsg: \"ns not found\" }"
}
mongos> db.collection_nw1q.find({}, {_id: 1}).sort({_id: 1}).limit(20)
... actual data returned here, no error ...

The mongos log shows no errors, even at logLevel 1, when the failed stats() call returns.
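To make the asymmetry in the title concrete, here is a simplified model in plain JavaScript. It is an assumption-laden sketch, not mongos's actual code: a query against a collection that does not exist on a shard returns no documents from that shard rather than an error, whereas a sharded stats() call needs a successful collStats reply from every shard and surfaces a failure as the "failed on shard" error shown above. The function names and the shard objects ({ hasCollection, docs }) are hypothetical.

```javascript
// Hypothetical, simplified model of how mongos combines per-shard replies.
// Each shard is { hasCollection: boolean, docs: Array }; these field names
// are illustrative only and are not part of any MongoDB API.
// For simplicity, every shard is assumed to be targeted by both operations.

// find(): a shard without the collection contributes zero documents,
// so the merged result contains whatever the other shards hold.
function modelFind(shards) {
  const results = [];
  for (const shard of shards) {
    if (shard.hasCollection) {
      results.push(...shard.docs);
    }
  }
  return results;
}

// stats(): every shard must answer collStats successfully; the first
// shard that lacks the collection turns the whole call into an error.
function modelStats(shards) {
  let count = 0;
  for (const shard of shards) {
    if (!shard.hasCollection) {
      return {
        sharded: true,
        ok: 0,
        errmsg: 'failed on shard: { ok: 0.0, errmsg: "ns not found" }',
      };
    }
    count += shard.docs.length;
  }
  return { sharded: true, ok: 1, count: count };
}
```

With one shard that never received the collection, modelFind still returns the other shards' documents while modelStats returns ok: 0, which matches the pattern in this report.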



 Comments   
Comment by Ramon Fernandez Marina [ 04/Dec/14 ]

If you manage to reproduce the issue reliably in the absence of networking or other problems, please let us know. We'll need logs from the mongos you're using as well as from the primary node of each shard. Happy bug hunting!

Comment by Oleg Rekutin [ 04/Dec/14 ]

Hi Ramon, when I originally reported this, I recall confirming on every shard that the collection existed. However, I have just checked the primary of every shard for each collection with this issue, and each time one or two shards are missing the collection. So it is possible that I made a mistake when I reported that all shards had this collection.

Even so, a collection should never fail to reach all of the shards. I am going to look for a recent occurrence of this issue so I can check whether there were network or other problems at the time the collection was created or the map-reduce job ran.

Comment by Ramon Fernandez Marina [ 04/Dec/14 ]

Hi oleg@evergage.com, apologies for the late reply. I can reproduce this behavior when a given collection exists on at least one shard but is missing from at least one other. Is collection_nw1q one of the collections used as the output of a mapReduce job? If so, it is possible that the networking error prevented this collection from being written to one of your shards.

If you connect to each of your shards in turn and run

db.collection_nw1q.stats()

you'll get the "ns not found" error on the shards where this collection does not exist, and collection statistics on those where it does.
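The per-shard check described above can be summarized with a small helper. This is a hypothetical sketch in plain JavaScript: it assumes you have already collected the reply from running db.collection_nw1q.stats() on each shard primary into an object keyed by shard name, and it flags the shards whose reply is the "ns not found" error. The function name and the shard names are illustrative, not part of any MongoDB API.

```javascript
// Hypothetical helper: given per-shard stats() replies keyed by shard name,
// return the (sorted) names of shards where the namespace does not exist.
function shardsMissingCollection(repliesByShard) {
  const missing = [];
  for (const [shard, reply] of Object.entries(repliesByShard)) {
    // A failed collStats reply has a falsy ok (0 / 0.0) and an errmsg;
    // "ns not found" means the collection does not exist on that shard.
    if (!reply.ok && reply.errmsg === "ns not found") {
      missing.push(shard);
    }
  }
  return missing.sort();
}
```

In the mongo shell, the shard hosts are listed in the shards collection of the config database, and a direct connection to each one can be opened with new Mongo(host); gathering the replies that way is left out of this sketch.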

I think the network error you mention is the likely culprit, but if this behavior persists in the absence of network errors, please let us know so we can investigate whether there is a bug in mapReduce.

Generated at Thu Feb 08 03:38:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.