[SERVER-32677] Segmentation fault converting ReplicaSet to Replicated Shard Cluster Created: 12/Jan/18 Updated: 30/Oct/23 Resolved: 20/Apr/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.1 |
| Fix Version/s: | 3.6.4, 3.7.6 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gianluca De Cicco | Assignee: | Blake Oler |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | neweng | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | docker container mongo:3.6.1 (Debian jessie) https://github.com/docker-library/mongo/blob/657b1a53a9680b972a6344f3d958a17775dd8719/3.6/Dockerfile |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Start 2 data nodes with the attached config.
Connect to one data node and initialize the replica set.
Connect to the replica set "rs/mongo_replica1:27017,mongo_replica2:27017" and add the arbiter.
Now, following https://docs.mongodb.com/manual/tutorial/convert-replica-set-to-replicated-shard-cluster/#restart-the-replica-set-as-a-shard, stop the secondary and restart it with --shardsvr.
Connect to the primary and run rs.stepDown().
Restart the old primary with --shardsvr.
Everything reconnects. After some minutes of idling (around 5; it is cyclic) one data node receives SIGSEGV, and in cascade the other data node (but not the arbiter) receives the same SIGSEGV. |
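Assuming the hostnames and replica-set name from the steps above (the exact mongod configuration is in the attachment; the arbiter hostname `mongo_arbiter` is an assumption), the conversion sequence looks roughly like this:

```shell
# Start the two data-bearing nodes without --shardsvr (attached config elided)
mongod --replSet rs --port 27017 ...            # on mongo_replica1 and mongo_replica2

# Initialize the replica set from one data node, then add the arbiter
mongo --host mongo_replica1:27017 --eval 'rs.initiate(...)'
mongo --host "rs/mongo_replica1:27017,mongo_replica2:27017" --eval 'rs.addArb("mongo_arbiter:27017")'

# Conversion to a shard: restart the secondary with --shardsvr,
# step down the primary, then restart the old primary the same way
mongod --replSet rs --shardsvr --port 27017 ... # on the stopped secondary
mongo --host mongo_replica1:27017 --eval 'rs.stepDown()'
mongod --replSet rs --shardsvr --port 27017 ... # on the old primary
```

With the affected 3.6.1 binaries, both data nodes then crash with SIGSEGV after roughly five minutes of idle time.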
| Sprint: | Sharding 2018-02-12, Sharding 2018-02-26, Sharding 2018-03-12, Sharding 2018-03-26, Sharding 2018-04-23 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
The starting point is 3 nodes: 2 data-bearing nodes and 1 arbiter. All nodes were started without the flag --shardsvr. Once the replica set is initialized (initialization + addArb) it cannot be converted to a replicated shard cluster: after the nodes are restarted with the flag --shardsvr, they eventually access a bad memory segment (Invalid access at address: 0x18) and receive signal SIGSEGV. If restarted again they continue to receive the same signal after some time in idle (roughly 5 minutes). |
| Comments |
| Comment by Githook User [ 01/May/18 ] |
|
Author: Blake Oler &lt;blake.oler@mongodb.com&gt; (BlakeIsBlake)
Message: (cherry picked from commit 60cb34ea7351d25b0eb6bee947d21ada09cf438b) |
| Comment by Githook User [ 01/May/18 ] |
|
Author: Blake Oler &lt;blake.oler@mongodb.com&gt; (BlakeIsBlake)
Message: Revert " This reverts commit 424111b2a3f4c30b7e637f4eadda6a18df9bf065. |
| Comment by Kaloian Manassiev [ 01/May/18 ] |
|
On the 3.6 branch: https://github.com/mongodb/mongo/commit/9e4b78f198fa6a0bca75fd1012d8437d98a3c825 |
| Comment by Githook User [ 20/Apr/18 ] |
|
Author: Blake Oler &lt;blake.oler@mongodb.com&gt; (BlakeIsBlake)
Message: |
| Comment by Gregory McKeon (Inactive) [ 22/Mar/18 ] |
|
Clarifying the confusing state of this ticket: The fix for the 3.6 branch has been completed, and is in the 3.6.4 release. The fix in master is currently blocked, and this ticket will remain unresolved until the fix in master is pushed. |
| Comment by Githook User [ 28/Feb/18 ] |
|
Author: Blake Oler &lt;blake.oler@mongodb.com&gt; (BlakeIsBlake)
Message: |
| Comment by Githook User [ 16/Feb/18 ] |
|
Author: Blake Oler &lt;blake.oler@mongodb.com&gt; (BlakeIsBlake)
Message: Revert " This reverts commit cad0d35091f98b5c2bb37765861841844bd9e16d. |
| Comment by Githook User [ 06/Feb/18 ] |
|
Author: Blake Oler &lt;blake.oler@mongodb.com&gt; (BlakeIsBlake)
Message: |
| Comment by Kaloian Manassiev [ 24/Jan/18 ] |
|
The problem happens when the logical session cache happens to run before the sharding initialization has completed. I think the fix should be to add a check that ShardingState::get(opCtx)->enabled() before attempting to reference the catalog cache here. As part of this fix we should see if there are other places which may be accessing sharding infrastructure, which is initialized late. |
| Comment by Mark Agarunov [ 12/Jan/18 ] |
|
Hello gdecicco, Thank you for the report. I've set the fixVersion to "Needs Triage" so this can be scheduled against our currently planned work. Updates will be posted on this ticket as they happen. Thanks, |
| Comment by Gianluca De Cicco [ 12/Jan/18 ] |
|
I cannot edit the description, there is a typo: |
| Comment by Gianluca De Cicco [ 12/Jan/18 ] |
|
Addendum: if the flag --shardsvr is removed, the replica set stops receiving SIGSEGV |