[SERVER-20973] Initial sync during index drop can cause loss on new member Created: 16/Oct/15 Updated: 29/Oct/15 Resolved: 16/Oct/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 2.6.11, 3.0.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | James Wahlin | Assignee: | Eric Milkie |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
| Issue Links: |
| Operating System: | ALL | ||||||||||||
| Steps To Reproduce: | 1) Build MongoDB 3.0.7 with the attached "mongod3-0-7.patch" applied. This patch forces mongod queries to yield frequently and adds a 100ms sleep during each yield, making this issue easier to reproduce.
|
||||||||||||
| Participants: | |||||||||||||
| Description |
|
During the initial sync process, mongod clones data from its sync source via a getMore() operation. This getMore can end early for a given collection on cursor invalidation, having returned only a partial data set. The result is a new replica member with an incomplete data set. To trigger this issue the following must occur: This does not appear to be an issue under 3.2.0-rc0. |
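The failure mode described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not MongoDB code: `ToySource`, `clone_collection`, `CursorKilled`, and all of their parameters are hypothetical names invented for illustration. It assumes that an invalidated cursor may end without reporting an error, and it shows that a cloner which treats "no more results" as "collection fully copied" cannot then tell an early end from a normal one.

```python
class CursorKilled(Exception):
    """Raised when the toy source reports that the cursor was invalidated."""


class ToySource:
    """Hypothetical stand-in for a sync source; not MongoDB code."""

    def __init__(self, docs, batch_size, kill_after_batches, report_error):
        self.docs = list(docs)
        self.batch_size = batch_size
        # Number of batches served before the cursor is invalidated
        # (None means the cursor is never invalidated).
        self.kill_after_batches = kill_after_batches
        # Assumption: whether invalidation surfaces as an error at all.
        self.report_error = report_error
        self.pos = 0
        self.batches_sent = 0

    def get_more(self):
        """Return the next batch; an empty batch means 'cursor exhausted'."""
        if self.batches_sent == self.kill_after_batches:
            # The cursor was invalidated, e.g. by an index drop during a yield.
            if self.report_error:
                raise CursorKilled("cursor invalidated before exhaustion")
            # Silent early end: indistinguishable from normal exhaustion.
            return []
        batch = self.docs[self.pos:self.pos + self.batch_size]
        self.pos += len(batch)
        self.batches_sent += 1
        return batch


def clone_collection(source):
    """Naive cloner: copies batches until it receives an empty one."""
    cloned = []
    while True:
        batch = source.get_more()
        if not batch:
            return cloned
        cloned.extend(batch)


if __name__ == "__main__":
    docs = range(10)
    complete = clone_collection(ToySource(docs, 3, None, False))
    partial = clone_collection(ToySource(docs, 3, 2, False))
    print(len(complete), len(partial))
```

In this sketch the silent variant returns a shorter list with no indication that anything went wrong, while the `report_error=True` variant raises `CursorKilled`, which a caller could use to restart or fail the clone. Whether the real cloner acts on such an error is a separate question that the comments below discuss for the 2.6 branch.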
| Comments |
| Comment by David Storch [ 29/Oct/15 ] |
|
james.wahlin, on the 2.6 branch, DEAD Runners will not return an error to the client: It should be the same as 3.0 in this respect. |
| Comment by James Wahlin [ 29/Oct/15 ] |
|
schwerin - sorry, missed your comment earlier. This does impact 2.6 but, as Eric mentioned, may have a different root cause. From a first pass it looks like 2.6 will return an error on a dead PlanExecutor, so it may indeed be that the error is reported but ignored by the cloner. I need to do some work on the repro script for 2.6 but will confirm once I have validated this theory. (CC: milkie) |
| Comment by Eric Milkie [ 16/Oct/15 ] |
|
I was incorrect about 2.6; the issue is a bit different there, though. I believe the cloner wouldn't detect query errors even if we started returning them when the cursor is closed prematurely. |
| Comment by Andy Schwerin [ 16/Oct/15 ] |
|
The report claims that the 2.6 series is affected. james.wahlin, can you confirm? |
| Comment by Eric Milkie [ 16/Oct/15 ] |
|
This is confirmed to be a duplicate of |