[CDRIVER-695] _mongoc_cluster_node_destroy segfaults in certain scenarios Created: 02/Jun/15 Updated: 05/Aug/15 Resolved: 21/Jun/15 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | libmongoc |
| Affects Version/s: | 1.1.6 |
| Fix Version/s: | 1.1.8 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Anil Kumar | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Program terminated with signal 11, Segmentation fault. |
| Comments |
| Comment by Githook User [ 05/Aug/15 ] | |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>
Message: Undo two bad changes introduced while fixing | |
| Comment by Githook User [ 05/Aug/15 ] | |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>
Message: Avoid scenarios like:
1. Connect to 2-node replica set.
2. _cluster_reconnect_replica_set enters first loop, calls ismaster on primary
3. nodes_len is set to 2 and the nodes list is realloc'ed, but the second node
4. _mongoc_cluster_reconnect_replica_set enters second loop.
5. Auth fails, "goto CLEANUP".
6. Now nodes_len is 2 but the second node is still uninitialized.
7. Later, _mongoc_cluster_node_destroy iterates over both nodes.
8. Destroying second, uninitialized node calls stream->close, which is a random | |
| Comment by Githook User [ 05/Aug/15 ] | |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>
Message: Hope to make a crash in _mongoc_cluster_node_destroy easier to diagnose. | |
| Comment by Githook User [ 26/Jun/15 ] | |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>
Message: Undo two bad changes introduced while fixing | |
| Comment by Githook User [ 21/Jun/15 ] | |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>
Message: Avoid scenarios like:
1. Connect to 2-node replica set.
2. _cluster_reconnect_replica_set enters first loop, calls ismaster on primary
3. nodes_len is set to 2 and the nodes list is realloc'ed, but the second node
4. _mongoc_cluster_reconnect_replica_set enters second loop.
5. Auth fails, "goto CLEANUP".
6. Now nodes_len is 2 but the second node is still uninitialized.
7. Later, _mongoc_cluster_node_destroy iterates over both nodes.
8. Destroying second, uninitialized node calls stream->close, which is a random | |
| Comment by A. Jesse Jiryu Davis [ 17/Jun/15 ] | |
|
The bug is in _cluster_reconnect_replica_set, which has two loops. The first loop tries nodes until it finds a primary. The second loop iterates over the primary's peer list, connecting to and authenticating with each peer, including the primary itself.
The crash comes when we:
1. Connect to a 2-node replica set.
The fix is to manage nodes_len properly: don't increment it to N unless N nodes have actually been initialized. Additionally, zero out all nodes right after reallocating the nodes list, so that every pointer in the new slots starts out NULL. | |
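The two parts of the fix described above can be sketched in isolation. This is a minimal illustration, not the driver's actual code: every `sketch_*` name, the `nodes_cap` field, and the `fail_at` parameter are hypothetical stand-ins, and the sketch assumes the cluster starts out empty. It zeroes freshly reallocated slots and only advances `nodes_len` once a node is really initialized, so a failure partway through leaves nothing uninitialized for the destroy path to touch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's internal types. */
typedef struct sketch_stream {
   void (*close) (struct sketch_stream *stream);
   int closed;
} sketch_stream_t;

typedef struct {
   sketch_stream_t *stream; /* NULL until the node is connected */
} sketch_node_t;

typedef struct {
   sketch_node_t *nodes;
   size_t nodes_len; /* counts initialized nodes only */
   size_t nodes_cap; /* counts allocated slots */
} sketch_cluster_t;

void
sketch_close (sketch_stream_t *stream)
{
   stream->closed = 1;
}

/* Grow the node array and zero the new slots, since realloc leaves
 * them with indeterminate contents. */
int
sketch_cluster_reserve (sketch_cluster_t *cluster, size_t n)
{
   sketch_node_t *grown;

   if (n <= cluster->nodes_cap) {
      return 1;
   }
   grown = realloc (cluster->nodes, n * sizeof *grown);
   if (!grown) {
      return 0;
   }
   memset (grown + cluster->nodes_cap, 0,
           (n - cluster->nodes_cap) * sizeof *grown);
   cluster->nodes = grown;
   cluster->nodes_cap = n;
   return 1;
}

/* Simulate the second loop: peer number fail_at fails authentication.
 * Assumes the cluster is empty on entry. */
int
sketch_reconnect (sketch_cluster_t *cluster,
                  sketch_stream_t *streams,
                  size_t n_peers,
                  size_t fail_at)
{
   size_t i;

   if (!sketch_cluster_reserve (cluster, n_peers)) {
      return 0;
   }
   for (i = 0; i < n_peers; i++) {
      if (i == fail_at) {
         return 0; /* early exit: nodes_len still counts only live nodes */
      }
      cluster->nodes[i].stream = &streams[i];
      cluster->nodes_len = i + 1; /* advance only after initialization */
   }
   return 1;
}

/* Close every initialized node; the NULL check is a second line of defense. */
void
sketch_cluster_destroy (sketch_cluster_t *cluster)
{
   size_t i;

   for (i = 0; i < cluster->nodes_len; i++) {
      sketch_stream_t *stream = cluster->nodes[i].stream;
      if (stream) {
         stream->close (stream);
      }
   }
   free (cluster->nodes);
   memset (cluster, 0, sizeof *cluster);
}
```

With two peers and a failure on the second, `nodes_len` ends at 1 and the second slot holds a NULL stream, so the destroy loop never calls through a garbage function pointer. Keeping capacity separate from length is one way to express "allocated but not yet initialized"; the actual fix may track this differently.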
| Comment by A. Jesse Jiryu Davis [ 15/Jun/15 ] | |
|
Repro script using MockupDB: https://gist.github.com/ajdavis/745af939e0eb3e2c8cac MockupDB is here: | |
| Comment by Githook User [ 09/Jun/15 ] | |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>
Message: Hope to make a crash in _mongoc_cluster_node_destroy easier to diagnose. | |
| Comment by A. Jesse Jiryu Davis [ 03/Jun/15 ] | |
|
The reporter's URI is something like the form:
|