[SERVER-15798] Helpers::removeRange does not check if node is primary Created: 24/Oct/14 Updated: 24/Dec/14 Resolved: 22/Dec/14
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Sharding |
| Affects Version/s: | 2.6.4, 2.8.0-rc0 |
| Fix Version/s: | 2.8.0-rc4 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Alex Piggott | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | 4 servers hosting 2 replica sets, running Amazon Linux (~CentOS 6), all running 2.6.4: [root@ip-10-60-18-179 mongo]# rpm -qa | grep mongo |
| Description |
|
Original title: rs.stepDown during migration causes fassert in logOp
Original description:
|
| Comments |
| Comment by Githook User [ 22/Dec/14 ] |
|
Author: Randolph Tan (renctan) <randolph@10gen.com>
Message:
| Comment by Randolph Tan [ 12/Nov/14 ] |
|
Note: Other write ops like insert, delete and update perform this check after grabbing the exclusive lock.
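The ordering in that note matters: if primary status is checked before the lock is taken, a step-down can slip in between the check and the write. A minimal Python sketch of the check-after-lock pattern follows. `Node`, `NotPrimaryError`, `step_down` and `remove_range` are hypothetical stand-ins for illustration, not MongoDB's actual implementation, and the sketch assumes that a step-down takes the same lock.

```python
import threading


class NotPrimaryError(Exception):
    """Raised when a write is attempted on a node that is not primary."""


class Node:
    """Hypothetical stand-in for a replica set member (not MongoDB code)."""

    def __init__(self):
        self.lock = threading.Lock()  # plays the role of the exclusive lock
        self.is_primary = True
        self.docs = {1: "a", 2: "b", 3: "c"}

    def step_down(self):
        # Assumption of this sketch: stepping down takes the same lock,
        # so it cannot interleave with a write that already holds it.
        with self.lock:
            self.is_primary = False

    def remove_range(self, low, high):
        """Delete keys in [low, high), checking primary status under the lock."""
        with self.lock:
            # The check runs only once the lock is held, so the answer
            # cannot change before the deletes are applied.
            if not self.is_primary:
                raise NotPrimaryError("not primary; refusing to remove range")
            doomed = [k for k in self.docs if low <= k < high]
            for k in doomed:
                del self.docs[k]
            return len(doomed)
```

In this sketch, a `remove_range` call issued after `step_down` raises `NotPrimaryError` and leaves the data untouched instead of attempting a write on a non-primary node.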
| Comment by Alex Piggott [ 12/Nov/14 ] |
|
Thanks! I might leave that one running overnight...
| Comment by Ramon Fernandez Marina [ 12/Nov/14 ] |
|
Hi apiggott@ikanow.com, sorry to hear you experienced another crash. You can find the documents with large values using a query like the following:
Replace the value of url above with the appropriate key path in your documents. I'm trying to reproduce the issue on this end, and a developer is looking at the code as well.
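For illustration only (this is not the query from the comment above), a client-side check along these lines could flag candidate documents once they have been fetched. It assumes the documents are plain dictionaries, that the indexed field holds a string, and that comparing the UTF-8 length of the value against the 1024-byte index entry limit enforced by MongoDB 2.6 is a close enough approximation. The real limit applies to the whole index entry, including overhead, which is why the threshold is a parameter. The function name and the `url` field are hypothetical.

```python
def docs_with_long_values(docs, key_path, limit=1024):
    """Return the documents whose string value at key_path exceeds limit bytes.

    key_path is a dotted path such as "url" or "meta.url". Documents that
    lack the path, or hold a non-string value there, are skipped.
    """
    flagged = []
    for doc in docs:
        value = doc
        for part in key_path.split("."):
            if not isinstance(value, dict) or part not in value:
                value = None
                break
            value = value[part]
        # Measure encoded bytes, not characters: non-ASCII text is longer.
        if isinstance(value, str) and len(value.encode("utf-8")) > limit:
            flagged.append(doc)
    return flagged
```

Passing a lower `limit` than 1024 leaves headroom for the index entry overhead that this approximation ignores.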
| Comment by Alex Piggott [ 12/Nov/14 ] |
|
Incidentally, is there a slick way I can identify which records in my DB have this too-long problem (apart from looking in the logs each time it crashes - it crashed again today! - and deleting the specific one it cared about), e.g. by forcing it to do a table scan lookup against the URL somehow?
| Comment by Alex Piggott [ 28/Oct/14 ] |
|
(Incidentally, isn't failIndexKeyTooLong the opposite of what I want, since it will allow documents with long index keys to be inserted going forward? My problem is that I have existing docs with long index keys because 2.4 and earlier allowed the insertion of such fields. At least now no new problematic documents are getting inserted; I just need to fix the ones that are already in the DB.) [EDIT: oh, or do you mean that the chunk migration won't fail any more if failIndexKeyTooLong is false? That would make sense! Though I think discarding any future "corrupt" docs is the better default.]
| Comment by Alex Piggott [ 28/Oct/14 ] |
|
Thanks for looking into it. To clarify, I don't care about the first error at all; I just included it in case it was the root cause of the fassert. Preventing the insertion of long fields is certainly my problem to fix!
However, even if I am trying to insert keys that are too long, that shouldn't crash the DB, so the purpose of this issue was to let you know that it was happening (or something along those lines: my guess would be that any migration can cause the problem, and it was just more likely to occur given the repeated attempts due to the long index failure).
Unfortunately the database on which this occurred is operational, so I can't mess about trying to make it fail. Presumably the steps would be:
Though presumably that's what you've been trying without luck? Since this is the first time the DB has crashed in about 2 years, and each of 2 shards has been stepping down every week (and I think I've seen "index too long" errors since forever), I'm guessing it might be quite a low probability event...
Did you ping the dev who fixed
| Comment by Ramon Fernandez Marina [ 28/Oct/14 ] |
|
The first error I see in the log snippet you sent is related to a migration failure:
Please see the list of compatibility changes in MongoDB 2.6 for more details about this issue. In your particular case:
You may need to set failIndexKeyTooLong to false and/or fix the offending documents first:
Then there's the fatal assertion message:
I'm trying to reproduce this error, unsuccessfully so far, so if you have a reproducing script it would be of great help. In the meantime, can you please review the issue described above with long keys and let us know if rs.stepDown() still triggers the assertion?
| Comment by Alex Piggott [ 24/Oct/14 ] |
|
(When I say "exactly the same issue", I mean "the same problem except during migration not map/reduce"! Migration wasn't mentioned in