[SERVER-18988] Terrible Query Performance Created: 16/Jun/15 Updated: 10/Mar/16 Resolved: 07/Mar/16

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, WiredTiger |
| Affects Version/s: | 3.0.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | JinHui | Assignee: | Bruce Lucas (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Operating System: | ALL |
| Participants: | |
| Description |
We are facing a severe query performance issue. There are only 5 million documents in the collection, and we have an index on username; the queries we run are always against username. Both nscanned and nscannedObjects are 1, yet the query takes more than 100 ms.
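For context, here is a minimal sketch of the access pattern described above, assuming the DimUser collection mentioned later in the thread and a placeholder username value (both assumptions); explain("executionStats") is one way to confirm that the index is used and that only one key and one document are examined:

    // Hypothetical example: the DimUser collection name comes from a later comment
    // in this thread, and "someUser" is a placeholder value.
    db.DimUser.createIndex({ username: 1 })

    // executionStats reports totalKeysExamined / totalDocsExamined (both 1 per the
    // report above) together with the execution time, which still exceeds 100 ms.
    db.DimUser.find({ username: "someUser" }).explain("executionStats")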
| Comments |
| Comment by carl dong [ 10/Mar/16 ] |
Sure, here is the log for 3 days.
| Comment by Bruce Lucas (Inactive) [ 09/Mar/16 ] |
Hi Carl, would you also be able to upload the mongod log showing the slow queries, so that we can correlate them with events in the diagnostic data? Thanks,
| Comment by carl dong [ 09/Mar/16 ] |
Hi Ramon, I have tested version 3.2.3. This time I observe that the issue happens every time, 3-4 seconds after the database flush. After changing the flush interval from 60 to 20 seconds, the slow queries are reduced but still occur. Thanks, Carl Dong
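The flush interval mentioned here presumably refers to the syncdelay (checkpoint period) setting; the following is only a sketch of how such a change is commonly made at runtime, assuming setParameter was used rather than a config-file edit:

    // Assumption: "flush interval" means syncdelay, the checkpoint period in seconds.
    // It can also be set at startup with --syncdelay or storage.syncPeriodSecs.
    db.adminCommand({ setParameter: 1, syncdelay: 20 })

    // Read the value back to confirm the change.
    db.adminCommand({ getParameter: 1, syncdelay: 1 })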
| Comment by Ramon Fernandez Marina [ 07/Mar/16 ] |
Unfortunately I have no more information to provide at this point, sorry about that. I was not able to find the root cause with the provided data, nor was I able to reproduce this behavior on my end. Without any additional information that shows evidence of a bug in the server, I'm afraid I have to close this ticket for the time being. As I mentioned above, I'd recommend testing MongoDB 3.2.3. If the behavior persists, please post in this ticket and I'll be happy to reopen it. Regards,
| Comment by JinHui [ 02/Mar/16 ] |
As you can see from Carl's screenshot, all of the slow queries happen in batches, and most of them are update or insert commands; there is no key update for these operations. Any thoughts?
| Comment by carl dong [ 02/Feb/16 ] |
In fact, the issue happens frequently, as you can see in the Ops Manager profiler page.
| Comment by Ramon Fernandez Marina [ 01/Feb/16 ] |
Carl, since I haven't found a clear root cause I can't say for sure whether upgrading will help. Note however that MongoDB 3.2 makes WiredTiger the default storage engine, and a lot of work was put into making things faster and more stable – so if this turns out to be an unfortunate interaction with the storage engine then 3.2.1 (our latest stable release right now) should definitely help. MongoDB 3.2 also makes it easier to collect diagnostic data, as such data is recorded in the background and there's no need to wait until a given problem happens again.
| Comment by carl dong [ 28/Jan/16 ] |
Hi Ramon, we are running MongoDB Enterprise 3.0.7, but the issue still exists, and is even worse. If the diagnostic data in 3.2.0 would help, we will plan an upgrade. Thanks, Carl Dong
| Comment by Ramon Fernandez Marina [ 27/Jan/16 ] |
Hi cainiaoiswo, apologies for the long delay in getting back to you. We were not able to identify any conclusive evidence of what may be causing the slow queries reported; the data you sent us didn't allow us to identify what resource may be being temporarily held and slowing the queries down. I'll take another look at the uploaded data to see if we missed something. Am I right to assume you're still experiencing this issue? Or has a recent upgrade helped speed queries up? Thanks,
| Comment by JinHui [ 21/Oct/15 ] |
Hi Ramon, is there any good news for me?
| Comment by Ramon Fernandez Marina [ 09/Sep/15 ] |
Apologies for the radio silence, cainiaoiswo - yes, this issue is still open and is being investigated. The information collected so far is not yet conclusive; we'll post updates to this ticket as they happen. Regards,
| Comment by JinHui [ 13/Aug/15 ] |
Please let me know if you are still working on this issue.
| Comment by JinHui [ 31/Jul/15 ] |
Hi Bruce, we've upgraded our database to 3.0.5 but the issue still exists. By the way, you should use multiple threads to reproduce the issue: the number of slow queries during peak time is hundreds of times higher than at other times.
| Comment by JinHui [ 17/Jul/15 ] |
Hi Bruce, thanks for your update. I've attached more information in the thread.
| Comment by Bruce Lucas (Inactive) [ 16/Jul/15 ] |
Hi JinHui, thanks for providing that information, and sorry for the delay in responding.

That data shows one brief incident, at 20:20:20, where a dozen writes (insert and update operations) all completed within about 1 ms of each other, the longest of them having taken 640 ms, suggesting that they had been held up accessing some common resource and then all completed quickly when that resource became available. It affects a couple of different collections, and as you have noted the issue often affects more than one database at the same time, suggesting that the common resource is in some lower layer of the system, such as the storage engine or storage itself.

In the data we do find some correlation between that one incident and some internal mongod counters. (I attached a graph that we are using for internal discussions about the issue, showing that correlation.) However, a single brief incident is too little to draw conclusions from. I hate to have to ask you for more information, but would you be able to run the same data collection (serverStatus, iostat, and mongod log) for a longer period of time, maybe a few hours or a day, sufficient to cover a few incidents, so that we can see whether they all follow the same pattern?

Also, let me ask a couple of questions:
Thanks,
| Comment by JinHui [ 15/Jul/15 ] |
The issue still exists.
/* 6 */  { "op" : "insert", "millis" : 560, "ts" : ISODate("2015-07-15T07:53:44.188Z") }
/* 7 */  { "op" : "insert", "millis" : 534, "ts" : ISODate("2015-07-15T07:53:44.188Z") }
/* 8 */  { "op" : "insert", "millis" : 281, "ts" : ISODate("2015-07-15T07:53:44.188Z") }
/* 9 */  { "op" : "insert", "millis" : 694, "ts" : ISODate("2015-07-15T07:53:44.188Z") }
/* 10 */ { "op" : "insert", "millis" : 730, "ts" : ISODate("2015-07-15T07:53:44.188Z") }
/* 11 */ { "op" : "insert", "millis" : 720, "ts" : ISODate("2015-07-15T07:53:44.187Z") }
/* 12 */ { "op" : "insert", "millis" : 428, "ts" : ISODate("2015-07-15T07:53:44.187Z") }
/* 13 */ { "op" : "insert", "millis" : 451, "ts" : ISODate("2015-07-15T07:53:44.187Z") }
/* 14 */ { "op" : "insert", "millis" : 201, "ts" : ISODate("2015-07-15T07:53:44.187Z") }
/* 15 */ { "op" : "insert", "millis" : 216, "ts" : ISODate("2015-07-15T07:53:44.187Z") }
/* 16 */ { "op" : "insert", "millis" : 898, "ts" : ISODate("2015-07-15T07:53:44.187Z") }
/* 17 */ { "op" : "insert", "millis" : 482, "ts" : ISODate("2015-07-15T07:53:44.186Z") }
/* 18 */ { "op" : "insert", "millis" : 919, "ts" : ISODate("2015-07-15T07:53:44.186Z") }
/* 19 */ { "op" : "insert", "millis" : 464, "ts" : ISODate("2015-07-15T07:53:44.186Z") }
/* 20 */ { "op" : "insert", "millis" : 509, "ts" : ISODate("2015-07-15T07:53:44.186Z") }
/* 21 */ { "op" : "insert", "millis" : 234, "ts" : ISODate("2015-07-15T07:53:44.186Z") }
/* 22 */ { "op" : "insert", "millis" : 823, "ts" : ISODate("2015-07-15T07:53:44.186Z") }
/* 23 */ { "op" : "insert", "millis" : 264, "ts" : ISODate("2015-07-15T07:53:44.186Z") }
/* 24 */ { "op" : "insert", "millis" : 766, "ts" : ISODate("2015-07-15T07:53:44.186Z") }
/* 25 */ { "op" : "insert", "millis" : 315, "ts" : ISODate("2015-07-15T07:53:44.186Z") }
/* 26 */ { "op" : "insert", "millis" : 458, "ts" : ISODate("2015-07-15T07:53:44.186Z") }
/* 27 */ { "op" : "insert", "millis" : 677, "ts" : ISODate("2015-07-15T07:53:44.185Z") }
/* 28 */ { "op" : "insert", "millis" : 274, "ts" : ISODate("2015-07-15T07:53:44.185Z") }
/* 29 */ { "op" : "insert", "millis" : 444, "ts" : ISODate("2015-07-15T07:53:44.185Z") }
/* 30 */ { "op" : "insert", "millis" : 660, "ts" : ISODate("2015-07-15T07:53:44.185Z") }
/* 31 */ { "op" : "insert", "millis" : 739, "ts" : ISODate("2015-07-15T07:53:44.185Z") }
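The entries above appear to come from the database profiler; for reference, here is a sketch of the kind of shell query that produces such a listing, assuming the profiler is enabled on the affected database:

    // Record operations slower than 100 ms (profiling level 1).
    db.setProfilingLevel(1, 100)

    // List recent slow operations, most recent first, projecting only the fields
    // echoed in the comment above.
    db.system.profile.find(
        { millis: { $gt: 100 } },
        { op: 1, millis: 1, ts: 1 }
    ).sort({ ts: -1 }).limit(30)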
| Comment by JinHui [ 05/Jul/15 ] |
Still can't reproduce the issue?
| Comment by JinHui [ 30/Jun/15 ] |
Thanks for your updates. Here is some other information that may be helpful. We have two databases on this server. One is for normal business requests and contains the DimUser collection we mention in this thread. The other is for logging and records about 200 documents/second. We find that the slow queries always occur in both databases at the same time.
| Comment by Ramon Fernandez Marina [ 29/Jun/15 ] |
Hi cainiaoiswo, we're still investigating this issue. The last batch of information you sent does show slow queries in the range of hundreds of milliseconds, but I haven't been able to find the cause; I've also tried reproducing this on my end, unsuccessfully. I'll let you know if I need any further information – thanks for your patience and understanding. Regards,
| Comment by JinHui [ 29/Jun/15 ] |
Do you need any other logs or information from my side?
| Comment by JinHui [ 24/Jun/15 ] |
Hi Ramon, here is the log we caught last night.
| Comment by Ramon Fernandez Marina [ 22/Jun/15 ] |
cainiaoiswo, thanks for uploading the requested information. While I do see the long queries in the logs, I haven't been able to find the "smoking gun", so we'll need more information to track down the root cause of the issue. I'd like to ask you to run the following commands on the node where you're seeing the slow queries (I presume it is your primary):

These will monitor your mongod instance and log information to the files ss.log and iostat.log for as long as they run. The hope is to record system information at the same time you're experiencing slow queries, so please continue to run these commands until you see some slow queries, after which you can interrupt the monitoring commands and upload the ss.log and iostat.log files so we can analyze them. Thanks,
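The actual commands Ramon listed were not preserved in this export; below is only a sketch of the typical pattern such a request involves: a shell loop polling serverStatus into ss.log, run alongside an OS-level collector such as iostat redirected to iostat.log. The script filename is hypothetical.

    // Sketch only: the exact commands from this comment were not preserved here.
    // Typical pattern: save this as poll_ss.js (hypothetical name) and run it with
    // "mongo --quiet poll_ss.js > ss.log" while iostat writes to iostat.log.
    while (true) {
        print("===", new Date());       // timestamp separator between samples
        printjson(db.serverStatus());   // server counters, tickets, cache stats
        sleep(1000);                    // one sample per second
    }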
| Comment by JinHui [ 18/Jun/15 ] |
Is there any update on my issue?
| Comment by JinHui [ 17/Jun/15 ] |
Hi Ramon, here is the level 5 log. The png file appears to show a slow query plan: the query plan was decided at 2015-06-18T00:34:01.000 and the query kept running for 285 ms.
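The "level 5 log" presumably refers to raising the server's log verbosity; here is a minimal sketch of how that is typically done at runtime (the exact method used here isn't stated in the ticket):

    // Raise log verbosity to level 5 (very verbose; revert when finished).
    db.adminCommand({ setParameter: 1, logLevel: 5 })

    // Restore the default verbosity afterwards.
    db.adminCommand({ setParameter: 1, logLevel: 0 })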
| Comment by JinHui [ 17/Jun/15 ] |
Here is the server log. As you can see in the file, from line 1000 onward lots of queries take more than a thousand milliseconds.
| Comment by JinHui [ 17/Jun/15 ] |
The problem is getting worse. Lots of queries take more than 1000 ms. It seems our production environment is going to crash.
| Comment by JinHui [ 17/Jun/15 ] |
Hi Ramon, here is some log info you need.
| Comment by JinHui [ 17/Jun/15 ] |
Most of the time the query time is OK, taking only a few ms.
| Comment by Ramon Fernandez Marina [ 17/Jun/15 ] |
cainiaoiswo, in addition to Dan's question, we'll need some more information to troubleshoot this issue.
Thanks,
| Comment by Daniel Pasette (Inactive) [ 17/Jun/15 ] |
Even without knowing what else is happening on this server, that is an unusually long time for the query and document size. If you run the same query repeatedly, does the time to execute stay the same?
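One way to check this from the shell is to time the same lookup repeatedly; the collection, field, and value below are assumptions based on the ticket description, not the reporter's actual query:

    // Run the same indexed lookup 20 times and print each elapsed time in ms, to
    // see whether the latency is stable or spikes intermittently.
    for (var i = 0; i < 20; i++) {
        var start = new Date();
        db.DimUser.find({ username: "someUser" }).toArray();  // placeholder value
        print("run " + i + ": " + (new Date() - start) + " ms");
    }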
| Comment by JinHui [ 17/Jun/15 ] |
avgObjSize is 256 bytes.
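For reference, avgObjSize is reported by the collection stats helper; a small sketch, assuming the DimUser collection discussed in this thread:

    // Collection statistics include the average document size in bytes.
    var s = db.DimUser.stats();
    print("avgObjSize: " + s.avgObjSize + " bytes, count: " + s.count);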
| Comment by Daniel Pasette (Inactive) [ 17/Jun/15 ] |
Can you tell me how large the document you are querying for is?
| Comment by JinHui [ 17/Jun/15 ] |
Our platform is CentOS 6.5, running MongoDB 3.0.3 with WiredTiger. Please let me know if you need any other information.