[SERVER-39326] mongo shard freezes for any command/ select when normal insert is going on Created: 01/Feb/19 Updated: 08/Mar/19 Resolved: 08/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.4.0 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | ASHISH PRASAD | Assignee: | Eric Sedor |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
We have a setup of 3 shards, each with 3 replicas. We recently migrated from Ubuntu (Ubuntu 16.04.5 LTS) to CentOS (CentOS Linux release 7.6.1810). Our mongo version is MongoDB shell version v3.4.0.
Problem: the shards perform well under a normal workload. However, when we get a batch of updates/inserts, the shards freeze completely and all the diagnostic logs stop. As soon as the updates/inserts complete (which takes a while), the shards respond normally again. Any find queries that were queued up run after that, even though they have no correlation to the database that was being updated. If I look at the mongo log, almost nothing is written while the shard is frozen. As soon as it is freed up, I see the info log entries for all the queries that ran, with each query taking about 10x longer than average.
I believe this is an environmental issue related to running mongo on CentOS (this issue was never seen while running on Ubuntu).
I should also mention that we chose to use the same git version of mongo (same release).
I am not sure which db.setLogLevel() trace will show us where mongo is getting stuck. I have tried turning on various traces, but log tracing is frozen while mongo is doing the batch update, so I don't learn much about what is going on during that time.
I would appreciate any help or guidance on other forms of tracing that could be used to identify the problem.
|
| Comments |
| Comment by Eric Sedor [ 08/Mar/19 ] |
|
Hi, We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket. Regards, |
| Comment by Eric Sedor [ 19/Feb/19 ] |
|
Hello, we still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the information requested above? |
| Comment by Eric Sedor [ 01/Feb/19 ] |
|
Hello, would you please archive (tar or zip) the $dbpath/diagnostic.data directory of an affected node and attach it to this ticket? Please also let us know some specific timestamps of incidents you've observed. We'd be happy to take a look. |
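The archiving step requested above could be sketched as follows. This is a minimal illustration, not an official MongoDB tool: the function name `archive_diagnostic_data` and the default output file name are hypothetical, and the dbpath must be replaced with the actual value from the affected node's configuration.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath, out_file="diagnostic-data.tar.gz"):
    """Pack <dbpath>/diagnostic.data into a gzip-compressed tar archive.

    `dbpath` is the data directory of the affected node; `out_file` is a
    hypothetical output name chosen for this sketch.
    """
    src = Path(dbpath) / "diagnostic.data"
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")

    # Store entries under the relative name "diagnostic.data" so the
    # archive does not expose the full filesystem path of the node.
    with tarfile.open(out_file, "w:gz") as tar:
        tar.add(str(src), arcname="diagnostic.data")

    return Path(out_file)
```

For example, `archive_diagnostic_data("/var/lib/mongo")` would write `diagnostic-data.tar.gz` to the current directory, assuming `/var/lib/mongo` is the node's dbpath (an assumed value here; check the `storage.dbPath` setting or the `--dbpath` option actually in use).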