[SERVER-14186] rs.stepDown during mapReduce causes fassert in logOp Created: 06/Jun/14 Updated: 11/Jul/16 Resolved: 27/Aug/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce, Replication |
| Affects Version/s: | 2.6.1 |
| Fix Version/s: | 2.6.2, 2.7.4 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Kaloian Manassiev |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: |
| Participants: |
| Description |
| Comments |
| Comment by Githook User [ 16/Jul/14 ] |
|
Author: {u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'} Message: We need to check master-ness under a write lock before doing a logOp() or else (PARTIAL cherry pick from commit f65ef0b5c272d94f500ea615e36023b45cdf088e) Conflicts: |
| Comment by Eric Milkie [ 15/Jul/14 ] |
|
Forward port from 2.6 is non-trivial, as the common place where commands take locks has been pushed down into the individual commands. I suspect the right solution here might be to do individual checking in each command's body after it grabs its read or write lock? |
| Comment by Githook User [ 06/Jun/14 ] |
|
Author: {u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'} Message: We need to check master-ness under a write lock before doing a logOp() or else |
| Comment by Eric Milkie [ 06/Jun/14 ] |
|
For 2.6, we should amend execCommand() to call isMaster() only after obtaining a read or write lock. This will prevent the 6 or so commands that call logOp from running after primary demotion. For master, we need to examine each of the logOpping commands and do an isMaster check after each gains a write lock. This is temporary until we have a common Recovery object to do this sort of checking. |
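The ordering proposed for 2.6 (take the lock first, then check primaryness) can be sketched with a toy model. This is not MongoDB code: `Node`, `exec_command`, `step_down`, and the `before_lock` hook are hypothetical names, and the model assumes that stepDown changes state under the same lock the command takes.

```python
import threading


class Node:
    """Toy stand-in for a replica set member (hypothetical, not MongoDB code)."""

    def __init__(self):
        self.db_lock = threading.Lock()  # stands in for the database lock
        self.is_primary = True
        self.oplog = []

    def step_down(self):
        # Assumption: demotion happens while holding the same lock.
        with self.db_lock:
            self.is_primary = False

    def log_op(self, entry):
        # Models the fatal assertion: logging an op while not primary.
        if not self.is_primary:
            raise AssertionError("logOp called on a non-primary")
        self.oplog.append(entry)


def exec_command(node, entry, before_lock=lambda: None):
    """Check primaryness only after the lock is held, then log the op."""
    before_lock()  # a stepDown from another connection may land here
    with node.db_lock:
        if not node.is_primary:
            return False  # rejected as "not master"; log_op is never reached
        node.log_op(entry)
        return True
```

Because the check and the `log_op` call happen under one lock acquisition, a demotion either completes before the check (and the command is rejected) or waits until the op has been logged.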
| Comment by Eric Milkie [ 06/Jun/14 ] |
|
Note that prior to The error itself occurs because, for commands, we check primaryness outside a dblock, so there is no synchronization between the command and another connection doing a stepDown. |
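The unsynchronized check described above can be reproduced in a small deterministic model. Everything here is an illustrative assumption rather than server code: `Node`, `run_command_unsafe`, and the `between` hook (which stands in for another connection running stepDown in the window between the check and the lock) are hypothetical names.

```python
import threading


class Node:
    """Toy stand-in for a replica set member (hypothetical, not MongoDB code)."""

    def __init__(self):
        self.db_lock = threading.Lock()  # stands in for the dblock
        self.is_primary = True
        self.oplog = []

    def step_down(self):
        # Assumption: demotion is done while holding the dblock.
        with self.db_lock:
            self.is_primary = False

    def log_op(self, entry):
        # Models the fatal assertion: logging an op while not primary.
        if not self.is_primary:
            raise AssertionError("logOp called on a non-primary")
        self.oplog.append(entry)


def run_command_unsafe(node, entry, between=lambda: None):
    """Check primaryness outside the lock, as in the buggy ordering."""
    if not node.is_primary:  # check happens with no lock held
        return False
    between()  # window in which another connection can step the node down
    with node.db_lock:
        node.log_op(entry)  # may now run on a node that is no longer primary
    return True
```

Passing `between=node.step_down` forces the bad interleaving: the check sees a primary, the demotion lands before the lock is taken, and `log_op` trips its assertion.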
| Comment by J Rassi [ 06/Jun/14 ] |
|
The server must have yielded the database write lock between the drop of the temp collection and the logOp for the drop. Bruce, can you re-test with 2.6.2-rc0? That has a fix for |