[SERVER-17499] Using eval command to run getMore on aggregation cursor trips fatal assertion Created: 06/Mar/15 Updated: 18/Sep/15 Resolved: 09/Mar/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.0 |
| Fix Version/s: | 3.0.1, 3.1.0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Patrig Droumaguet [X] | Assignee: | J Rassi |
| Resolution: | Done | Votes: | 0 |
| Labels: | 3.0, ET, upgrading | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Ubuntu 14.04.2 | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | Linux | ||||
| Backport Completed: | |||||
| Participants: | |||||
| Description |
|
Hi,

We had a replica set of three dedicated servers on Ubuntu 14.04. Everything was working fine on 2.6.8. We stopped one of them to reinstall Ubuntu (clean slate) and install MongoDB 3.0 with WiredTiger. The initialization went OK and replication was working well (the primary was still on 2.6.8). Using rs.slaveOk(), we could see that the data was up to date.

We then tried to promote it to primary by calling rs.freeze(600) on the other secondary and rs.stepDown() on the primary. After less than 10 seconds, mongod crashed. After starting mongod again, the server worked again as a secondary.

Since it is stated that each server should be on 2.6.8 or preferably 3.0, we tried to update the second secondary server, but this time without reinstalling everything (we only updated the Ubuntu sources list and ran sudo apt-get update / sudo apt-get upgrade). The server was updated and restarted as a secondary without any issue. It was possible to run queries from the shell without problems.

Then we called rs.freeze(600) on the 3.0 WiredTiger secondary and rs.stepDown() on the primary, which was still on 2.6.8. Very quickly the server crashed and the previous primary became primary again. The log was identical to the crash log on the first server.

After restarting the second 3.0 server as a secondary, we hoped that it might work if all the servers in the replica set were on 3.0. So we called rs.freeze(600) on the first 3.0 secondary server (the one using WiredTiger) and shut down the primary by calling sudo service mongod stop in the Ubuntu shell. The 3.0 server using MMAPv1 became primary and crashed after a few seconds. We then started the 2.6.8 server, and after a very short rollback it became primary again.

Here is an example of what we could see in the log on the 3.0 servers after the crash:
We can see in the log that some queries were executed just before the crash, for example:
So now we have a primary on 2.6.8 and two secondaries on 3.0, which cannot become primary. |
| Comments |
| Comment by Githook User [ 09/Mar/15 ] | |||||||||||||||||
|
Author: Jason Rassi (jrassi) <rassi@10gen.com> Message: | |||||||||||||||||
| Comment by Githook User [ 09/Mar/15 ] | |||||||||||||||||
|
Author: Jason Rassi (jrassi) <rassi@10gen.com> Message: | |||||||||||||||||
| Comment by Githook User [ 09/Mar/15 ] | |||||||||||||||||
|
Author: Jason Rassi (jrassi) <rassi@10gen.com> Message: | |||||||||||||||||
| Comment by Githook User [ 09/Mar/15 ] | |||||||||||||||||
|
Author: Jason Rassi (jrassi) <rassi@10gen.com> Message: | |||||||||||||||||
| Comment by J Rassi [ 06/Mar/15 ] | |||||||||||||||||
|
Thanks for reporting this issue. We can confirm that the root cause is that the server hits a fatal assertion when the eval command is used to run a getMore on an aggregation cursor. As a workaround, you should indeed change your application code to use the MongoCollection::aggregate() helper as follows:
Please continue to watch this ticket for updates on when a fix will be available. ~ Jason Rassi | |||||||||||||||||
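The workaround described in the comment above can be sketched in mongo shell JavaScript. This is only an illustration under stated assumptions, not the snippet from the original comment: the `profiles` collection, the pipeline stages, and the variable names are hypothetical, and the PHP helper named above is not used here. It contrasts an aggregation wrapped in the eval command with the same pipeline sent as a plain aggregate command.

```javascript
// Hypothetical pipeline and collection name ("profiles"); neither
// comes from the ticket.
var pipeline = [
    { $match: { active: true } },
    { $group: { _id: "$country", total: { $sum: 1 } } }
];

// eval-based form: the aggregation runs inside a server-side
// JavaScript function. This is the pattern the ticket is about.
// The function is only defined here, never called.
var evalCommand = {
    eval: function (p) { return db.profiles.aggregate(p).toArray(); },
    args: [pipeline]
};

// Direct form: the aggregate command is sent as-is, so no
// server-side JavaScript is involved.
var aggregateCommand = {
    aggregate: "profiles",
    pipeline: pipeline,
    cursor: {}
};

// Only contact a server when running inside the mongo shell,
// where the global `db` exists.
if (typeof db !== "undefined") {
    printjson(db.runCommand(aggregateCommand));
}
```

The sketch uses `var` rather than `let`/`const` so that it should also load in older shells. Only the direct form is actually sent; the eval form is shown for comparison.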
| Comment by Patrig Droumaguet [X] [ 06/Mar/15 ] | |||||||||||||||||
|
Hi David, I still have to figure out how to extract only the relevant part of the log, but the crash appears to be connected to some $eval commands. Shortly before the crash, I can read this in the log:
I should mention that I had been using MongoDB::execute() to call aggregations because I was using a framework which previously couldn't support PHP driver versions after 1.3.5 (when Mongo was changed to MongoClient), and the aggregate() collection method didn't exist then (if I'm not mistaken). Recently (and to prepare for MongoDB 3) I upgraded my code and the framework to be able to use the 1.6+ driver. On MongoDB 2.6.8, the execute() method was still working fine for aggregations. I may need to change all my aggregation calls. Thanks for your help! | |||||||||||||||||
| Comment by David Storch [ 06/Mar/15 ] | |||||||||||||||||
|
Thanks Sicabol. It looks like the crash is happening inside the eval command. Would you be able to try reproducing the problem while running with logLevel 2? You can increase the logLevel using the shell's logLevel parameter, e.g.:
The verbose logs prior to and including the time of the crash would be very useful. Best, | |||||||||||||||||
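One way to raise the verbosity from the mongo shell is the `setParameter` admin command with the `logLevel` parameter. The following is a minimal sketch rather than the snippet from the original comment. The variable names are assumptions, and the call is guarded so that it only runs inside the shell.

```javascript
// Command document that raises server log verbosity to level 2,
// the level requested in the comment above.
var setVerbose = { setParameter: 1, logLevel: 2 };

// Command document that restores the default verbosity (0) once
// the verbose logs have been captured.
var setDefault = { setParameter: 1, logLevel: 0 };

// Only contact a server when running inside the mongo shell,
// where the global `db` exists.
if (typeof db !== "undefined") {
    printjson(db.adminCommand(setVerbose));
}
```

After reproducing the crash and collecting the logs, sending `setDefault` the same way returns the server to its normal verbosity.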
| Comment by Patrig Droumaguet [X] [ 06/Mar/15 ] | |||||||||||||||||
|
OK, now that's very interesting! In PHP, I tried running a MongoDB::execute() command on my database (a find().explain()) while forcing the read preference to SECONDARY, and then tried doing the same thing by detailing each step. The first attempt shows that execute() doesn't use a secondary server but the primary; the second attempt used a secondary server. Maybe it's related, as I call several MongoDB::execute()…
| |||||||||||||||||
| Comment by Patrig Droumaguet [X] [ 06/Mar/15 ] | |||||||||||||||||
|
Hi, The application using the database is developed in PHP. We use the 1.6.4 driver. Some aggregations are called with the MongoDB::execute() method, which (AFAIK) is equivalent to MongoDB::command() with an eval. We use neither MapReduce nor $where. Here is an example of a command. $profiles_ids_str is an array of ObjectIds (as strings, as in "ObjectId('xxxx')").
| |||||||||||||||||
| Comment by David Storch [ 06/Mar/15 ] | |||||||||||||||||
|
Hi Sicabol, Thanks for reporting this issue. We are trying to reproduce on our end, but in the meantime I would like to gather some additional information. Are you running anything that would create a JavaScript context on the server? This includes:
Thanks, |