[SERVER-6965] mongo process going unresponsive Created: 07/Sep/12 Updated: 08/Mar/13 Resolved: 19/Dec/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Blocker - P1 |
| Reporter: | Abhishek Kumar | Assignee: | Randolph Tan |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
AWS xLarge instance backed by RAID 10 EBS volumes |
||
| Participants: |
| Description |
|
Replica set with 5 mongo boxes (slaveOk = disabled). Suddenly the mongod process goes unresponsive. The box just keeps accepting connections from other mongod servers and from applications, so the number of open sockets keeps increasing. While it is in this state I cannot log in to the mongo console, and our regular stop script cannot stop the process either.

This has happened 5-6 times in the last week, on different servers (sometimes primaries, sometimes secondaries). Each time I had to force-kill the process and restart it.

This is the primary log, where we can clearly see that the read queries suddenly stopped being logged. At the same time, mongo-java-client reported a timeout exception: http://pastebin.com/CB7hEced

The queries themselves are not slow; they take barely 100 ms. There are also no spikes in CPU or load (only in the total number of open sockets).

What could explain this behavior? |
| Comments |
| Comment by Randolph Tan [ 15/Oct/12 ] |
|
Sorry for the delay. It's really hard to tell what is going on without other data, either from MMS or from mongostat, covering the hung state and the moments just before it. Without currentOp it is also hard to tell which operation is responsible for hanging the server. I would caution against using eval (and other features that rely on the JavaScript engine) because it can easily create bottlenecks. This is because multiple readers in different threads can hold the read lock at the same time, which is not the case for the global JavaScript interpreter lock in the server. |
| Comment by Abhishek Kumar [ 12/Oct/12 ] |
|
Hey, any updates on this issue? |
| Comment by Abhishek Kumar [ 12/Sep/12 ] |
|
It's a stored function in the system.js collection, which is called from the application with nolock = true. Also, can you update the doc at http://www.mongodb.org/display/DOCS/Server-side+Code+Execution regarding the nolock option? It's not clear what this option means. Does it ensure that no write lock is taken? |
| Comment by Randolph Tan [ 12/Sep/12 ] |
|
Sorry, I forgot to ask for some clarification on how you use the stored function. I'm also assuming that the "// some if-else operation" section does not access the database, correct? |
| Comment by Abhishek Kumar [ 12/Sep/12 ] |
|
Then in the $eval I am only doing reads, which don't involve $where, $eval, map reduce or writes. I still don't understand the reason for the deadlock. |
| Comment by Randolph Tan [ 12/Sep/12 ] |
|
You can do reads, as long as they don't use server-side JS features like $where, $eval or map reduce, but you cannot do writes. Thanks for bringing this to our attention; we'll update our docs to include a warning against using $eval. |
| Comment by Abhishek Kumar [ 11/Sep/12 ] |
|
So, inside the eval we should never do any read or write operation? If this is the case, the docs at http://www.mongodb.org/display/DOCS/Server-side+Code+Execution do not state this clearly. |
| Comment by Randolph Tan [ 10/Sep/12 ] |
|
The sample code you posted looks fine, since the function is inactive and will not be executed in the server implicitly. The problem with JS features like $eval, $where or mapreduce is that the function body must not need a write lock or call one of the JS features itself. These JS features require the global JavaScript lock, so doing either can cause a deadlock. |
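The deadlock described in the comment above can be pictured with a toy model. This is only a sketch: `GlobalLock` and `runUnderJsLock` are hypothetical names, the lock is assumed to be non-reentrant, and none of this is the server's actual implementation. It reports "would block" at the point where a real nested call would wait forever on a lock its own caller holds.

```javascript
// Toy model only: one non-reentrant lock standing in for the global
// JavaScript lock. Hypothetical names; not the server's implementation.
function GlobalLock() {
  this.owner = null;
}

// Returns true if the lock was acquired, false if the caller would have to wait.
GlobalLock.prototype.tryAcquire = function (who) {
  if (this.owner !== null) return false;
  this.owner = who;
  return true;
};

GlobalLock.prototype.release = function (who) {
  if (this.owner === who) this.owner = null;
};

// Runs `body` while holding the lock. `body` receives a function that models
// a nested call to another JS feature that needs the same lock.
function runUnderJsLock(lock, who, body) {
  if (!lock.tryAcquire(who)) {
    return "would block: lock held by " + lock.owner;
  }
  try {
    return body(function (innerWho) {
      return runUnderJsLock(lock, innerWho, function () { return "ok"; });
    });
  } finally {
    lock.release(who);
  }
}

var jsLock = new GlobalLock();

// A function body that does not call back into a JS feature completes.
var plain = runUnderJsLock(jsLock, "eval-1", function () { return "ok"; });

// A function body that calls another JS feature asks for the lock it already holds.
var nested = runUnderJsLock(jsLock, "eval-1", function (callJsFeature) {
  return callJsFeature("where-inside-eval-1");
});

console.log(plain);
console.log(nested);
```

In the model the nested call simply returns a message; in a real system the inner call would wait for the outer one to release the lock while the outer one waits for the inner call to return.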
| Comment by Abhishek Kumar [ 08/Sep/12 ] |
|
It's a system.js function which internally does two find operations on two different collections. Each find operates on a selected set of documents. $eval function: http://pastebin.com/9wHFMQZX Also, this sometimes happened on a secondary, which didn't receive any read operations. |
| Comment by Randolph Tan [ 08/Sep/12 ] |
|
Depending on how you wrote $eval, you can deadlock the server. |
| Comment by Abhishek Kumar [ 08/Sep/12 ] |
|
As I said, I was not able to connect to mongod via the mongo console, so I can't give you the db.currentOp() output. The mongod process simply goes unresponsive. The servers are not in MMS.

I see the following warning all over my mongod logs. The operation after which I see this warning is mostly:

}, { att.r: { $type: 2 }} ] }, update: { $set: { att.r: new Date(1346828661274) } } } ntoreturn:1 reslen:737 21ms

", args: [ [ "k@domain.com", "-6@chat.facebook.com~fb", "x@gmail.com~gt" ], new Date(1341644733695) ], nolock: true } ntoreturn:1 reslen:4703 2353ms

In the last 1.5 days I got some 225 such warning messages. But from what I read on the forum, it's not much to worry about.

There is a $eval process, as mentioned above. But again, this can't be the reason, because I sometimes faced the same issue on a secondary, which doesn't receive any reads (slaveOk = disabled). |
| Comment by Randolph Tan [ 07/Sep/12 ] |
|
It would be helpful if you could provide the output of db.currentOp(). This operation does not require a lock, so you can possibly get a response from it, unless the server is overloaded. Do you also have these servers in MMS? Do you use server-side JavaScript features like $where, $eval or mapreduce? |
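When db.currentOp() output can be captured, a small filter helps narrow it down to the operations most likely to matter. The sketch below is hypothetical: `suspiciousOps` and the sample document are made up for illustration, and the field names (`inprog`, `opid`, `active`, `secs_running`, `waitingForLock`) are assumed to match the currentOp output of the server version in use.

```javascript
// Hypothetical helper: picks out operations worth a closer look from a
// db.currentOp()-style result. Keeps active operations that have been
// running for at least minSecs, plus anything waiting for a lock.
function suspiciousOps(currentOpResult, minSecs) {
  return (currentOpResult.inprog || []).filter(function (op) {
    var longRunning = op.active === true && (op.secs_running || 0) >= minSecs;
    return longRunning || op.waitingForLock === true;
  }).map(function (op) {
    return {
      opid: op.opid,
      op: op.op,
      ns: op.ns,
      secs_running: op.secs_running || 0,
      waitingForLock: op.waitingForLock === true
    };
  });
}

// Made-up sample shaped like currentOp output; not data from this ticket.
var sample = {
  inprog: [
    { opid: 101, active: true, op: "query", ns: "app.users", secs_running: 12, waitingForLock: false },
    { opid: 102, active: true, op: "update", ns: "app.users", secs_running: 0, waitingForLock: true },
    { opid: 103, active: true, op: "query", ns: "app.events", secs_running: 1, waitingForLock: false },
    { opid: 104, active: false, op: "query", ns: "app.events", waitingForLock: false }
  ]
};

var flagged = suspiciousOps(sample, 5);
console.log(JSON.stringify(flagged, null, 2));
```

In the mongo shell the same function could be applied as `printjson(suspiciousOps(db.currentOp(), 5))`, assuming the shell can still connect to the server.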