[CDRIVER-789] segfault in test_rs_seeds_connect
Created: 10/Aug/15  Updated: 19/Oct/16  Resolved: 01/Sep/15

Status: Closed
Project: C Driver
Component/s: libmongoc
Affects Version/s: 1.2-beta0
Fix Version/s: 1.2-beta1

Type: Bug
Priority: Critical - P2
Reporter: A. Jesse Jiryu Davis
Assignee: A. Jesse Jiryu Davis
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

test_rs_seeds_connect/reconnect and test_mongos_seeds_connect/reconnect, single and pooled versions, all crash with a backtrace like:

#1  mongoc_stream_failed (stream=0x100210082) at mongo-c-driver/src/mongoc/mongoc-stream.c:83
#2  mongoc_topology_scanner_ismaster_handler (async_status=MONGOC_ASYNC_CMD_ERROR, ismaster_response=0x0, rtt_msec=1137, data=0x100703fb0, error=0x100800c38) at mongo-c-driver/src/mongoc/mongoc-topology-scanner.c:193
#3  mongoc_async_cmd_run (acmd=0x100800c00) at mongo-c-driver/src/mongoc/mongoc-async-cmd.c:113
#4  mongoc_async_run (async=0x100702740, timeout_msec=29999) at mongo-c-driver/src/mongoc/mongoc-async.c:142
#5  mongoc_topology_scanner_work (ts=0x100703b90, timeout_msec=30000) at mongo-c-driver/src/mongoc/mongoc-topology-scanner.c:452
#6  _mongoc_topology_run_scanner (topology=0x100702c20, work_msec=30000) at mongo-c-driver/src/mongoc/mongoc-topology.c:323
#7  _mongoc_topology_do_blocking_scan (topology=0x100702c20) at mongo-c-driver/src/mongoc/mongoc-topology.c:349
#8  mongoc_topology_select (topology=0x100702c20, optype=MONGOC_SS_READ, read_prefs=0x1007069d0, local_threshold_ms=15, error=0x102808230) at mongo-c-driver/src/mongoc/mongoc-topology.c:397
#9  _mongoc_cluster_select_by_optype (cluster=0x1007047e8, optype=MONGOC_SS_READ, read_prefs=0x1007069d0, error=0x102808230) at mongo-c-driver/src/mongoc/mongoc-cluster.c:1454
#10 mongoc_cluster_select (cluster=0x1007047e8, rpcs=0x7fff5fbfe540, rpcs_len=1, read_prefs=0x1007069d0, error=0x102808230) at mongo-c-driver/src/mongoc/mongoc-cluster.c:1623
#11 mongoc_cluster_sendv (cluster=0x1007047e8, rpcs=0x7fff5fbfe540, rpcs_len=1, write_concern=0x1007056d0, read_prefs=0x1007069d0, error=0x102808230) at mongo-c-driver/src/mongoc/mongoc-cluster.c:2309
#12 _mongoc_client_sendv (client=0x1007047d0, rpcs=0x7fff5fbfe540, rpcs_len=1, server_id=0, write_concern=0x0, read_prefs=0x1007069d0, error=0x102808230) at mongo-c-driver/src/mongoc/mongoc-client.c:448
#13 _mongoc_cursor_query (cursor=0x102808000) at mongo-c-driver/src/mongoc/mongoc-cursor.c:500
#14 _mongoc_cursor_next (cursor=0x102808000, bson=0x7fff5fbfe658) at mongo-c-driver/src/mongoc/mongoc-cursor.c:819
#15 mongoc_cursor_next (cursor=0x102808000, bson=0x7fff5fbfe658) at mongo-c-driver/src/mongoc/mongoc-cursor.c:747
#16 _mongoc_client_command_simple_with_hint (client=0x1007047d0, db_name=0x10009754f "test", command=0x100706950, read_prefs=0x100706850, reply=0x7fff5fbfeb00, hint=0, error=0x7fff5fbfe900) at mongo-c-driver/src/mongoc/mongoc-client.c:1257
#17 mongoc_client_command_simple (client=0x1007047d0, db_name=0x10009754f "test", command=0x100706950, read_prefs=0x100706850, reply=0x7fff5fbfeb00, error=0x7fff5fbfe900) at mongo-c-driver/src/mongoc/mongoc-client.c:1231
#18 test_seed_list (rs=true, connection_option=CONNECT, pooled=false) at mongo-c-driver/tests/test-mongoc-client.c:639
#19 test_rs_seeds_connect_single () at mongo-c-driver/tests/test-mongoc-client.c:725
#20 TestSuite_AddHelper (cb_=0x10001d070 <test_rs_seeds_connect_single>) at mongo-c-driver/tests/TestSuite.c:301
#21 TestSuite_RunTest (suite=0x7fff5fbff530, test=0x1005065d0, mutex=0x7fff5fbff4a8, count=0x7fff5fbff404) at mongo-c-driver/tests/TestSuite.c:429
#22 TestSuite_RunNamed (suite=0x7fff5fbff530, testname=0x100505770 "/Client/*") at mongo-c-driver/tests/TestSuite.c:741
#23 TestSuite_Run (suite=0x7fff5fbff530) at mongo-c-driver/tests/TestSuite.c:769
#24 main (argc=5, argv=0x7fff5fbff590) at mongo-c-driver/tests/test-libmongoc.c:931

When the server hangs up on an ismaster call, it seems that mongoc_stream_failed is called with a stream that's uninitialized or already destroyed.



 Comments   
Comment by Githook User [ 11/Jan/16 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-789 remove topology scanner's "seen" list
Branch: 1.3.0-dev
https://github.com/mongodb/mongo-c-driver/commit/3c3019bef2a7fe062094cbe814a2c90945cd0f8f

Comment by Githook User [ 11/Jan/16 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-789 crash on network err from removed node

If the scanner knows about A, a primary, and B, a secondary:

  • B is removed from the replica set and shut down
  • The scanner begins, launching async commands to check A and B
  • A responds first and tells the scanner to remove B
  • mongoc_topology_reconcile removes B's mongoc_topology_scanner_node_t and
    destroys it, stream and all
  • But the mongoc_async_cmd_t for B is still active in the scanner, with the
    same stream.
  • B's mongoc_async_cmd_t fails with a network error, and passes the error into
    the command callback.
  • The command callback sees the error and destroys the same stream again,
    crashing.

The solution is to not destroy removed nodes in mongoc_topology_reconcile, but
close their streams, and mark them "retired" for the remainder of the
scan. At the end of the scan destroy retired nodes.
Branch: 1.3.0-dev
https://github.com/mongodb/mongo-c-driver/commit/9b294e0fa2496c3c6c59552bf4601dbd0581a549

Comment by Githook User [ 11/Jan/16 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-789 re-enable topology tests
Branch: 1.3.0-dev
https://github.com/mongodb/mongo-c-driver/commit/5b147145b3b787460bd30f224fda46a3cac60b38

Comment by Githook User [ 01/Sep/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-789 remove topology scanner's "seen" list
Branch: 1.2.0-dev
https://github.com/mongodb/mongo-c-driver/commit/3c3019bef2a7fe062094cbe814a2c90945cd0f8f

Comment by Githook User [ 01/Sep/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-789 crash on network err from removed node

If the scanner knows about A, a primary, and B, a secondary:

  • B is removed from the replica set and shut down
  • The scanner begins, launching async commands to check A and B
  • A responds first and tells the scanner to remove B
  • mongoc_topology_reconcile removes B's mongoc_topology_scanner_node_t and
    destroys it, stream and all
  • But the mongoc_async_cmd_t for B is still active in the scanner, with the
    same stream.
  • B's mongoc_async_cmd_t fails with a network error, and passes the error into
    the command callback.
  • The command callback sees the error and destroys the same stream again,
    crashing.

The solution is to not destroy removed nodes in mongoc_topology_reconcile, but
close their streams, and mark them "retired" for the remainder of the
scan. At the end of the scan destroy retired nodes.
Branch: 1.2.0-dev
https://github.com/mongodb/mongo-c-driver/commit/9b294e0fa2496c3c6c59552bf4601dbd0581a549

Comment by Githook User [ 01/Sep/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-789 re-enable topology tests
Branch: 1.2.0-dev
https://github.com/mongodb/mongo-c-driver/commit/5b147145b3b787460bd30f224fda46a3cac60b38

Comment by A. Jesse Jiryu Davis [ 12/Aug/15 ]

If the scanner knows about A, a primary, and B, a secondary:

  • B is removed from the replica set and shut down
  • The scanner begins, launching async commands to check A and B
  • A responds first and tells the scanner to remove B
  • mongoc_topology_reconcile removes B's mongoc_topology_scanner_node_t and destroys it, stream and all
  • But the mongoc_async_cmd_t for B is still active in the scanner, with the same stream.
  • B's mongoc_async_cmd_t fails with a network error, and passes the error into the command callback.
  • The command callback sees the error and destroys the same stream again, crashing.

The solution is for mongoc_topology_reconcile not to destroy removed nodes, but instead to close their streams and mark them "retired" for the remainder of the scan. Retired nodes are destroyed at the end of the scan.
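
The "retired" approach can be sketched in isolation. This is a minimal model, not the driver's code: node_t, scanner_t and every function below are hypothetical names, and an int allocation stands in for the stream. The sketch also assumes the error callback reaches the stream through the node, so that closing is idempotent; the ticket does not say how the real callback does this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are not libmongoc types. */
typedef struct node {
   int *stream;        /* placeholder for an owned stream; NULL once closed */
   bool retired;       /* removed from the topology, freed at end of scan */
   struct node *next;
} node_t;

typedef struct {
   node_t *nodes;
   bool in_scan;
   int streams_closed; /* counts real closes, to show none happens twice */
} scanner_t;

static node_t *
scanner_add (scanner_t *s)
{
   node_t *n = calloc (1, sizeof *n);
   assert (n);
   n->stream = malloc (sizeof *n->stream);
   assert (n->stream);
   n->next = s->nodes;
   s->nodes = n;
   return n;
}

/* Idempotent: a second call on the same node is a no-op. */
static void
node_close_stream (scanner_t *s, node_t *n)
{
   if (n->stream) {
      free (n->stream);
      n->stream = NULL;
      s->streams_closed++;
   }
}

/* Unlink the node from the scanner's list and release it. */
static void
scanner_free_node (scanner_t *s, node_t *n)
{
   node_t **p = &s->nodes;
   while (*p && *p != n) {
      p = &(*p)->next;
   }
   if (*p) {
      *p = n->next;
   }
   node_close_stream (s, n);
   free (n);
}

/* Called when reconciliation drops a node from the topology. */
static void
scanner_remove (scanner_t *s, node_t *n)
{
   if (s->in_scan) {
      /* The node's async command may still fire, so keep the node alive. */
      node_close_stream (s, n);
      n->retired = true;
   } else {
      scanner_free_node (s, n);
   }
}

/* Error callback of a node's in-flight command. */
static void
on_command_error (scanner_t *s, node_t *n)
{
   node_close_stream (s, n);
}

/* End of scan: now it is safe to destroy retired nodes. */
static void
scanner_finish (scanner_t *s)
{
   node_t *n = s->nodes;
   s->in_scan = false;
   while (n) {
      node_t *next = n->next;
      if (n->retired) {
         scanner_free_node (s, n);
      }
      n = next;
   }
}

/* Replays the sequence above; returns how many times B's stream was closed. */
static int
replay_removed_node_scan (void)
{
   scanner_t s = {NULL, false, 0};
   node_t *a = scanner_add (&s);
   node_t *b = scanner_add (&s);
   int closed_for_b;

   s.in_scan = true;
   scanner_remove (&s, b);   /* A's reply says B left the set */
   on_command_error (&s, b); /* B's command then fails: no second close */
   closed_for_b = s.streams_closed;
   scanner_finish (&s);      /* retired B is freed here */
   assert (s.nodes == a && a->next == NULL);
   scanner_free_node (&s, a);
   return closed_for_b;
}
```

Calling replay_removed_node_scan () walks through the bulleted sequence: B is retired while its command is still in flight, the later network error finds the stream already closed and does nothing, and B's node is freed only once the scan has finished.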

Comment by Githook User [ 10/Aug/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-789 disable two more tests for now
Branch: 1.2.0-dev
https://github.com/mongodb/mongo-c-driver/commit/4d07a4650492d9f14ea604c68f7c6690e24de80a

Comment by Githook User [ 10/Aug/15 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: forward-port CDRIVER-721 tests to 1.2.0-dev

Discovered CDRIVER-789 from these tests.
Branch: 1.2.0-dev
https://github.com/mongodb/mongo-c-driver/commit/9e58464e494bd70f40c7df193a2da5f03445b1f8

Generated at Wed Feb 07 21:10:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.