[SERVER-49490] mongotop listed unexpected collections Created: 29/Apr/20 Updated: 12/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.6.14 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Stefan Warten | Assignee: | Backlog - Cluster Scalability |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Cluster Scalability |
| Operating System: | ALL |
| Sprint: | Sharding 2020-09-21, Sharding 2020-10-05, Sharding 2020-10-19, Sharding 2020-11-02, Sharding 2020-11-16, Sharding 2020-11-30, Sharding 2020-12-14, Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25, Sharding 2021-02-22, Sharding 2021-03-08, Sharding 2021-03-22, Sharding 2021-04-05, Sharding 2021-04-19, Sharding 2021-05-03 |
| Participants: |
| Description |
|
When running mongotop (or top in the mongo shell) against a mongod of a sharded cluster, it reports usage of certain collections that do not exist in that repset (but do exist in other repsets of the sharded cluster). This is unexpected. I would expect to see only top usage of collections that are on the mongod (or in the repset it belongs to) that mongotop is connected to. The foreign collections appear only intermittently, sometimes just once and sometimes for longer periods (several minutes). The collections in question are not mentioned in the mongod logs and, of course, they really do not exist in the repset. Do I have wrong expectations of how mongotop (and top) work, or is this a bug? |
| Comments |
| Comment by Eric Sedor [ 24/Aug/20 ] |
|
Thanks stefan.warten@researchgate.net, I wanted to confirm that for the collection "refind.publicationFigures" that appears in the top output from repset15, the config database does not show an obvious reason why that collection is visible. There are other examples but this one demonstrates the disagreement between top and config server metadata. The segment of top for that collection:
Then on the config database, I can see that everything appears in order. No chunks are on or seem to have been on repset15. The entire key range for the collection is tagged to "Refind" and repset15 does not have that tag.
I am going to pass this ticket on to an appropriate team for additional consideration. Sincerely, |
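The metadata check described in the comment above (no chunks on the shard, and the collection's tag not assigned to the shard) can be sketched as a small helper. This is only an illustrative sketch, not the procedure actually used on this ticket. It assumes the documents are shaped like the 3.6/4.0 `config.chunks` (`ns`, `shard`), `config.shards` (`_id`, `tags`) and `config.tags` (`ns`, `tag`) collections; the function name `shard_may_own` and all sample values are hypothetical. It does not look at chunk history, so it cannot tell whether a chunk was on the shard in the past.

```python
def shard_may_own(ns, shard_id, chunks, shards, tags):
    """Summarise what config metadata says about namespace `ns` on `shard_id`.

    chunks, shards and tags are lists of dicts assumed to look like documents
    from config.chunks, config.shards and config.tags (3.6/4.0 layout).
    """
    # Does any chunk of this namespace currently live on the shard?
    has_chunk = any(c["ns"] == ns and c["shard"] == shard_id for c in chunks)

    # Tags assigned to the shard itself.
    shard_tags = set()
    for s in shards:
        if s["_id"] == shard_id:
            shard_tags.update(s.get("tags", []))

    # Tags used by any key range of the namespace.
    ns_tags = {t["tag"] for t in tags if t["ns"] == ns}

    return {"has_chunk": has_chunk, "shares_tag": bool(shard_tags & ns_tags)}


# Hypothetical sample metadata; shard names and tags are made up.
chunks = [{"ns": "db1.coll1", "shard": "shardA"}]
shards = [{"_id": "shardA", "tags": ["TagA"]}, {"_id": "shardB", "tags": ["TagB"]}]
tags = [{"ns": "db1.coll1", "tag": "TagA"}]

summary = shard_may_own("db1.coll1", "shardB", chunks, shards, tags)
print(summary)
```

If both values are false for a namespace that still shows up in top on that shard, the top output and the config metadata disagree in the way described here.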
| Comment by Stefan Warten [ 20/Aug/20 ] |
|
Hi Eric, I have uploaded SERVER-49490-2.tar.gz, which contains a dump of the config database from the config servers taken before I ran top on repset15 again, the full output of top, and the latest mongod logs for the node. The output of top again contains several collections that don't match the `show dbs` output. I hope this helps to identify the issue. Regards, SW. |
| Comment by Eric Sedor [ 19/Aug/20 ] |
|
Hi SW, thanks. I am able to see some specific collections in top which don't match the show dbs output. My apologies, though: I should have asked specifically for a dump of the config database from a Config Server, not from repset15. Can you help me get this information again for another case where top shows unexpected results? As an aside: when running top on a shard, it should be safe to ignore cache.chunks.* collections in the config database. Their presence is not necessarily an issue. |
| Comment by Stefan Warten [ 18/Aug/20 ] |
|
Hi Eric, I have uploaded SERVER-49490.tar.gz, which contains a dump of the config database taken before I ran top, the full output of top, and the mongod logs for the node. I guess you are going in the right direction, because the config db has many entries prefixed with cache.chunks for collections that, from my point of view, do not belong there, and they also appear in the top output. I have also included the output of `show dbs` for comparison; it shows only what I really expect to be on that shard.
Regards, SW. |
| Comment by Eric Sedor [ 12/Aug/20 ] |
|
Hi stefan.warten@researchgate.net, One thing to keep in mind as we discuss this is that tags are applied to shards and to chunks within collections. That a sharded collection does not exist on shards its chunks never migrate to is somewhat of a side effect of this. That said, we want to understand more about what you are reporting so we can investigate further. The next time this happens, can you provide:
I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Gratefully, |
| Comment by Stefan Warten [ 07/Aug/20 ] |
|
Just wanted to let you know that we have recently upgraded our MongoDB clusters to version 4.0.13 and the problem still exists. |
| Comment by Tim Fogarty [ 14/Jul/20 ] |
|
Hi stefan.warten@researchgate.net, thanks for all the info and sorry for the delays. I'm going to move this ticket to the SERVER project. The Server team should be able to help you with this better than I can. |
| Comment by Stefan Warten [ 03/Jul/20 ] |
|
Hi Tim, thanks for your answer. We have a sharded cluster with 30 repsets and 7 shard tags. We monitor MongoDB instances with Diamond and mongodb-exporter for Prometheus. Both also collect metrics from top(), and this is how we noticed metrics in the graphs for collections that do not belong there. When I started to debug it, I noticed that mongotop also sometimes shows these collections. Example: Shortened sh.status() shows
mongotop (or top()) on repset15 (that has shard tag 'Foo') showed other.collection1 yesterday for 23 minutes, which is a sharded collection on repsets with shard tag Other and db.collection1.stats() shows (shortened)
mongotop on repset15 (that has shard tag 'Foo') showed stat.collection2 yesterday only once, which is a sharded collection on repsets with shard tag Stats and db.collection2.stats() shows (shortened)
These collections do not belong to repset15 in any way and should not have any chunks on repset15. If I log onto repset15 and try to access these collections directly (not through the mongos), they do not exist, of course. I could also give other examples from each of the other repsets. I haven't seen foreign unsharded collections in mongotop, only sharded collections that do not belong there. I am happy to provide more detailed information if you let me know what you need. |
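The comparison described in the comment above (namespaces reported by top versus collections that actually exist on the node) could be automated roughly as follows. This is a minimal sketch under stated assumptions: it takes the `totals` document of the `top` command output as a plain dict keyed by namespace, the helper name `unexpected_namespaces` is hypothetical, and the sample data is invented. Namespaces under `config.cache.chunks.` are skipped, following the advice elsewhere in this ticket that they can be ignored on a shard.

```python
def unexpected_namespaces(top_totals, local_namespaces,
                          ignore_prefixes=("config.cache.chunks.",)):
    """Return namespaces that top reports but that are not known locally."""
    local = set(local_namespaces)
    unexpected = []
    for ns, counters in top_totals.items():
        # "totals" also carries a "note" string; skip anything that is not
        # a per-namespace counter document.
        if not isinstance(counters, dict):
            continue
        if ns in local or ns.startswith(tuple(ignore_prefixes)):
            continue
        unexpected.append(ns)
    return sorted(unexpected)


# Hypothetical sample shaped like the "totals" field of the top output.
sample_totals = {
    "note": "all times in microseconds",
    "local.oplog.rs": {"total": {"time": 10, "count": 2}},
    "foo.collection0": {"total": {"time": 5, "count": 1}},
    "other.collection1": {"total": {"time": 7, "count": 3}},
    "config.cache.chunks.other.collection1": {"total": {"time": 1, "count": 1}},
}
# Hypothetical list of namespaces that really exist on the node.
sample_local = ["local.oplog.rs", "foo.collection0"]

result = unexpected_namespaces(sample_totals, sample_local)
print(result)
```

With a driver such as PyMongo, the totals would presumably come from running the `top` command against the admin database of the mongod, and the local list from enumerating its databases and collections; neither step is shown here.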
| Comment by Tim Fogarty [ 30/Jun/20 ] |
|
Hi stefan.warten@researchgate.net, sorry for the delay getting back to you here. I've tried to reproduce this but I've been unable to do so. Could you please explain a bit more about your setup? I'm assuming something like the following situation: You have shard A and shard B. Shard A is the primary shard for the unsharded collection foo. When you run top on shard B, you unexpectedly see activity for collection foo. Is this the same as your situation? |