[SERVER-21105] Active find/getmore commands segfault when repl PV changes from 1->0
Created: 23/Oct/15 Updated: 27/Oct/15 Resolved: 26/Oct/15
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Replication |
| Affects Version/s: | 3.2.0-rc0 |
| Fix Version/s: | 3.2.0-rc1 |
| Type: | Bug |
| Priority: | Critical - P2 |
| Reporter: | Timothy Olsen (Inactive) |
| Assignee: | Scott Hernandez (Inactive) |
| Resolution: | Done |
| Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
Description
Reconfiguring a MongoDB 3.2 replica set with protocolVersion = 0 crashes the primary on Mac OS X. This does not happen on Linux, and it does not happen with a 1-member replica set; it does happen with a 3-member replica set. Logs for all 3 members are attached. This was with 3.2.0-rc1-pre, commit dbbc9a2e3d8c4d7fe1748fa980ba7d01b9489dbe.
Comments
Comment by Githook User [ 26/Oct/15 ]
Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>
Message:
Comment by Scott Hernandez (Inactive) [ 24/Oct/15 ]
This is triggered when a getmore (issued by a secondary doing replication) is active and the protocol version changes out from under it. The client sends a term that was valid when the operation started but is no longer valid by the time it finishes, and the replication source (the primary in this case) tries to update its term with it even though the protocol version is now 0.
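To make the race described in this comment concrete, here is a minimal Python sketch. It is a hypothetical model, not MongoDB's server code: `ReplicationSourceSketch`, `reconfig`, `update_term_unguarded` and `update_term_guarded` are invented names, a Python `TypeError` stands in for the segfault, and the guarded variant shows only one plausible way to avoid the crash (re-checking the protocol version where the client's term is used). It is not the fix that was actually committed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicationSourceSketch:
    """Hypothetical model of a replication source node (not MongoDB code)."""
    protocol_version: int = 1
    # In this model, a term only exists while protocolVersion is 1.
    term: Optional[int] = 5

    def reconfig(self, protocol_version: int) -> None:
        # Downgrading to PV0 discards the term in this model.
        self.protocol_version = protocol_version
        if protocol_version == 0:
            self.term = None

    def update_term_unguarded(self, client_term: int) -> None:
        # Mirrors the reported bug: trusts the term the client captured when
        # the getmore began, even if the protocol version changed since then.
        # Comparing against a missing term raises TypeError here.
        if client_term > self.term:
            self.term = client_term

    def update_term_guarded(self, client_term: Optional[int]) -> bool:
        # Re-check the protocol version at the point of use; under PV0 the
        # client's term is stale and is ignored. Returns True only if the
        # node's term was advanced.
        if self.protocol_version == 0 or client_term is None:
            return False
        if client_term > self.term:
            self.term = client_term
            return True
        return False


if __name__ == "__main__":
    node = ReplicationSourceSketch()
    client_term = node.term            # captured when the getmore starts
    node.reconfig(protocol_version=0)  # reconfig to PV0 lands mid-getmore
    print("guarded path updated term:", node.update_term_guarded(client_term))
    try:
        node.update_term_unguarded(client_term)
    except TypeError as exc:
        print("unguarded path crashed:", exc)
```

The design point is that the term a client captured at the start of a long-running getmore is treated as advisory: the node re-reads its own protocol version before acting on it, so a reconfig that lands mid-operation cannot leave it comparing against a term that no longer exists.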
Comment by Scott Hernandez (Inactive) [ 23/Oct/15 ]
From the logs, this looks like a problem on all platforms, so I'll take a look at it this weekend, write a general repro jstest, and see whether it triggers on Linux/Win64.
Comment by Timothy Olsen (Inactive) [ 23/Oct/15 ]
It has reproduced every time I've tried it.
Comment by Timothy Olsen (Inactive) [ 23/Oct/15 ]
Here is my shell session:
I had no other connections open to the replica set or any of its members at the time.
Comment by Scott Hernandez (Inactive) [ 23/Oct/15 ]
Can you include the (shell) script that reproduced this? It looks like a getmore caused the crash, possibly while the reconfig was executing. Could any other client traffic or operations have been running at the same time?