[SERVER-34747] find() and findOne() hangs when invoked from mongos Created: 30/Apr/18 Updated: 16/Nov/21 Resolved: 23/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Sasa Skevin | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Steps To Reproduce: | Hangs: 1) Connect to mongos 2) Select the database 3) Invoke findOne() (it hangs and does not return a result)
Works: 1) Connect to the primary of any of the shards 2) Select the database 3) Invoke findOne() (the result is returned immediately) |
| Description |
|
A MongoDB v3.6.4 deployment on Debian 9.4, using the WiredTiger engine and a large collection sharded across two shards, hangs when a findOne() or find() command is issued without any parameters. The collection has only the default index on the _id key. If findOne() or find() is invoked directly on any of the shards, it returns immediately. If it is invoked through a mongos, it hangs. With db.currentOp() this can be seen after findOne() is invoked:
|
| Comments |
| Comment by Kelsey Schubert [ 23/May/18 ] |
|
Thanks for the update! |
| Comment by Sasa Skevin [ 23/May/18 ] |
|
After two weeks, the cleanup finished, and since then find() and findOne() work as expected. |
| Comment by Sasa Skevin [ 08/May/18 ] |
|
Hi @Kelsey, Well yes, we moved quite a lot of chunks in the last couple of weeks, and it seems that the majority of them became orphans. We initiated orphan cleanup, but at the current cleanup speed it will take about two weeks to finish. I'll get back to you then. Thanks, Sasa
|
| Comment by Kelsey Schubert [ 08/May/18 ] |
|
Hi ssasa, Thanks for reporting this issue. First, I'd like to clarify that MongoDB 3.6.4 is not supported on Debian 9. That said, my suspicion is that there are a number of orphan documents in your sharded cluster. As a result, a findOne() must iterate through some number of documents before it finds one that belongs to the shard on which it resides. This iteration delays the first batch returned from the mongod, which results in the behavior that you describe. Would you please execute the cleanupOrphaned command against your shards, taking care to note the potential performance implications if running on a production system, and let us know if it resolves the issue? Thank you, |
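
For readers following up on this suggestion, here is a minimal sketch of how cleanupOrphaned is typically driven in a loop on 3.6, since a single invocation may stop partway and report a stoppedAtKey to resume from. The helper name cleanupAllOrphans, the runAdminCommand wrapper, and the namespace "mydb.mycoll" are assumptions made for illustration; they do not come from this ticket.

```javascript
// Sketch only: repeatedly issue cleanupOrphaned until the whole key range
// has been covered. `runAdminCommand` is a hypothetical wrapper supplied by
// the caller; in the mongo shell, connected to the PRIMARY of a shard
// (not to mongos), it could be:
//   function (cmd) { return db.adminCommand(cmd); }
function cleanupAllOrphans(runAdminCommand, namespace) {
  var nextKey = {};   // an empty document starts from the lowest key
  var passes = 0;

  while (nextKey != null) {
    var result = runAdminCommand({
      cleanupOrphaned: namespace,     // e.g. "mydb.mycoll" (hypothetical)
      startingFromKey: nextKey
    });
    passes += 1;

    // Stop on failure or timeout and hand the raw reply back to the caller.
    if (result.ok !== 1) {
      return { ok: false, passes: passes, lastResult: result };
    }

    // stoppedAtKey is absent once the end of the key range is reached,
    // which ends the loop.
    nextKey = result.stoppedAtKey;
  }

  return { ok: true, passes: passes };
}
```

The command runner is passed in so that the loop can be exercised without a live cluster. On a production system each pass adds delete load to the shard, so it may be worth running this during a quiet period, one shard at a time.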