[CDRIVER-695] _mongoc_cluster_node_destroy segfaults in certain scenarios Created: 02/Jun/15  Updated: 05/Aug/15  Resolved: 21/Jun/15

Status: Closed
Project: C Driver
Component/s: libmongoc
Affects Version/s: 1.1.6
Fix Version/s: 1.1.8

Type: Bug Priority: Critical - P2
Reporter: Anil Kumar Assignee: A. Jesse Jiryu Davis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to CDRIVER-721 Crash destroying replset client after... Closed

 Description   

Program terminated with signal 11, Segmentation fault.
#0 0x00000000ffffffff in ?? ()
(gdb) bt
#0 0x00000000ffffffff in ?? ()
#1 0x00007fcc757c43e2 in _mongoc_cluster_node_destroy ()
#2 0x00007fcc757c6f19 in _mongoc_cluster_destroy ()
#3 0x00007fcc757c31a6 in mongoc_client_destroy ()
#4 0x00007fcc757c396e in mongoc_client_pool_push ()



 Comments   
Comment by Githook User [ 05/Aug/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-721 mongoc_client_destroy crash after connection fails

Undo two bad changes introduced while fixing CDRIVER-695, and add
another safety check in _mongoc_cluster_node_destroy.
Branch: 1.2.0-dev
https://github.com/mongodb/mongo-c-driver/commit/bea221041eb8886f8d851a76b3d80ac9a6443eee

Comment by Githook User [ 05/Aug/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-695 crash destroying node after auth err

Avoid scenarios like:

1. Connect to 2-node replica set.

2. _cluster_reconnect_replica_set enters first loop, calls ismaster on primary
and finds two peers.

3. nodes_len is set to 2 and the nodes list is realloc'ed, but the second node
is uninitialized.

4. _mongoc_cluster_reconnect_replica_set enters second loop.

5. Auth fails, "goto CLEANUP".

6. Now nodes_len is 2 but the second node is still uninitialized.

7. Later, _mongoc_cluster_node_destroy iterates over both nodes.

8. Destroying second, uninitialized node calls stream->close, which is a random
location, segfaults.
Branch: 1.2.0-dev
https://github.com/mongodb/mongo-c-driver/commit/19d2da28257ea3ae24cf3f832d16487b5628314c

Comment by Githook User [ 05/Aug/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-695 checked errors in cluster logic

Hope to make a crash in _mongoc_cluster_node_destroy easier to diagnose.
Branch: 1.2.0-dev
https://github.com/mongodb/mongo-c-driver/commit/c35aea088cfd43b5b62b11dddd8bc050c0ea47d2

Comment by Githook User [ 26/Jun/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-721 mongoc_client_destroy crash after connection fails

Undo two bad changes introduced while fixing CDRIVER-695, and add
another safety check in _mongoc_cluster_node_destroy.
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/bea221041eb8886f8d851a76b3d80ac9a6443eee

Comment by Githook User [ 26/Jun/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-721 mongoc_client_destroy crash after connection fails

Undo two bad changes introduced while fixing CDRIVER-695, and add
another safety check in _mongoc_cluster_node_destroy.
Branch: CDRIVER-721-crash-unavail-rs
https://github.com/mongodb/mongo-c-driver/commit/32cd79d9278dc365fa1cc8746294cd305cb78b29

Comment by Githook User [ 21/Jun/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-695 crash destroying node after auth err

Avoid scenarios like:

1. Connect to 2-node replica set.

2. _cluster_reconnect_replica_set enters first loop, calls ismaster on primary
and finds two peers.

3. nodes_len is set to 2 and the nodes list is realloc'ed, but the second node
is uninitialized.

4. _mongoc_cluster_reconnect_replica_set enters second loop.

5. Auth fails, "goto CLEANUP".

6. Now nodes_len is 2 but the second node is still uninitialized.

7. Later, _mongoc_cluster_node_destroy iterates over both nodes.

8. Destroying second, uninitialized node calls stream->close, which is a random
location, segfaults.
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/19d2da28257ea3ae24cf3f832d16487b5628314c

Comment by A. Jesse Jiryu Davis [ 17/Jun/15 ]

The bug is in _mongoc_cluster_reconnect_replica_set, which has two loops. The first loop tries nodes until it finds a primary. The second loop iterates over the primary's peer list, connecting to and authenticating with each peer, including the primary itself.

The crash comes when we:

1. Connect to a 2-node replica set.
2. _mongoc_cluster_reconnect_replica_set enters its first loop, calls ismaster on the primary, and finds two peers.
3. nodes_len is set to 2 and the nodes list is realloc'ed, but the second node struct is left uninitialized.
4. _mongoc_cluster_reconnect_replica_set enters its second loop.
5. Auth fails on the first node (the primary), so the driver breaks out of the loop with "goto CLEANUP".
6. Now nodes_len is 2, but the second node is still uninitialized!
7. Later, _mongoc_cluster_destroy iterates over the nodes list, calling _mongoc_cluster_node_destroy on each node.
8. Since nodes_len is 2, the second, uninitialized node is destroyed too.
9. If that node's stream pointer happens to hold a non-NULL garbage value, _mongoc_cluster_node_destroy calls stream->close through it, jumps to an arbitrary address (0x00000000ffffffff in the backtrace above), and segfaults.

The fix is to manage nodes_len properly: don't increment it to N unless N nodes have actually been initialized.

Additionally, zero out all nodes right after reallocing the nodes list, so that every pointer in the node structs starts out NULL.

Comment by A. Jesse Jiryu Davis [ 15/Jun/15 ]

Repro script using MockupDB:

https://gist.github.com/ajdavis/745af939e0eb3e2c8cac

MockupDB is here:

http://mockupdb.readthedocs.org/

Comment by Githook User [ 09/Jun/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-695 checked errors in cluster logic

Hope to make a crash in _mongoc_cluster_node_destroy easier to diagnose.
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/c35aea088cfd43b5b62b11dddd8bc050c0ea47d2

Comment by A. Jesse Jiryu Davis [ 03/Jun/15 ]

The reporter's URI has roughly this form:

mongodb://user:pass@host1,host2,host3/admin?replicaSet=rs&maxpoolsize=100&minpoolsize=50&ssl=true&connecttimeoutms=5000&socketTimeoutMS=5000

Generated at Wed Feb 07 21:10:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.