[CDRIVER-2793] Crash observed when destroying client
Created: 10/Aug/18  Updated: 30/Aug/18  Resolved: 30/Aug/18

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: 1.12.0
Fix Version/s: None

Type: Bug
Priority: Critical - P2
Reporter: Arun Muralidharan
Assignee: A. Jesse Jiryu Davis
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Ubuntu 16, 64-bit



 Description   

We have an application that writes records to a remote MongoDB. When we observe a connection loss to the remote, we destroy the mongo client and create a new one only once the remote connection is re-established.
Sometimes, on connection loss, I see the crash below:

Thread 1 (Thread 0x7fbd181bfac0 (LWP 10006)):
#0  0x00007fbcda37761a in mongoc_stream_get_root_stream () from /usr/lib/libmongoc-1.0.so.0
#1  0x00007fbcda37772f in mongoc_stream_poll () from /usr/lib/libmongoc-1.0.so.0
#2  0x00007fbcda32f626 in mongoc_async_run () from /usr/lib/libmongoc-1.0.so.0
#3  0x00007fbcda38029d in mongoc_topology_scanner_work () from /usr/lib/libmongoc-1.0.so.0
#4  0x00007fbcda379c07 in mongoc_topology_scan_once () from /usr/lib/libmongoc-1.0.so.0
#5  0x00007fbcda379c93 in _mongoc_topology_do_blocking_scan () from /usr/lib/libmongoc-1.0.so.0
#6  0x00007fbcda37a21d in mongoc_topology_select_server_id () from /usr/lib/libmongoc-1.0.so.0
#7  0x00007fbcda33aae4 in _mongoc_client_end_sessions () from /usr/lib/libmongoc-1.0.so.0
#8  0x00007fbcda337353 in mongoc_client_destroy () from /usr/lib/libmongoc-1.0.so.0
#9  0x00007fbcdb3d3bfb in MongoClientOps::destroy_client (this=0x2b456f8) at /localdisk/jenkins-j5-ssh1/spot_debug__ub16/rel_6.1.1c/systemtest/modules/core/rwvx/rwlog/rwlogd/sinkapi/plugin/rwlogd_eventsdb_sink/mongo_client_ops.cpp:294
#10 0x00007fbcdb3c67c6 in EventsDBHASink::stop_sending_events (this=0x2b455f0) at /localdisk/jenkins-j5-ssh1/spot_debug__ub16/rel_6.1.1c/systemtest/modules/core/rwvx/rwlog/rwlogd/sinkapi/plugin/rwlogd_eventsdb_sink/rwlogd_eventsdb_ha_sink.cpp:124

The code for destroying the client is:

void MongoClientOps::destroy_client()
{
  RWMEMLOG (get_memlog_ptr(), RWMEMLOG_MEM2, "Destroy client connection");
 
  if (bulk_) {
    // Destroys mongo bulk in the destructor
    delete bulk_;
    bulk_ = nullptr;
  }
  //collection should be destroyed with the client
  if (collection_) {
    mongoc_collection_destroy(collection_);
    collection_ = nullptr;
  }
  if (database_) {
    mongoc_database_destroy(database_);
    database_ = nullptr;
  }
  if (client_) {
    mongoc_client_destroy(client_);
    client_ = nullptr;
  }
  return;
}

Is there something wrong with my destroy sequence?
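
For reference, the teardown order used above (collection, then database, then client) can also be expressed with RAII so that it cannot be reordered by accident. The following is only a minimal sketch under stated assumptions: FakeHandle, FakeDestroy, ClientResources, and destroy_order_demo are hypothetical stand-ins invented for illustration and are not libmongoc types or functions. In real code the deleter would call the matching driver destroy function. The sketch illustrates ordering only and does not address the crash itself, which was not reproduced.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a driver handle; NOT a libmongoc type.
struct FakeHandle {
  std::string name;
};

// Records the order in which handles are released (illustration only).
static std::vector<std::string> g_destroy_log;

// Hypothetical stand-in for a C-style destroy function.
struct FakeDestroy {
  void operator()(FakeHandle* handle) const {
    g_destroy_log.push_back(handle->name);
    delete handle;
  }
};

using HandlePtr = std::unique_ptr<FakeHandle, FakeDestroy>;

// C++ destroys members in reverse declaration order, so declaring the
// client first means it is released last.
class ClientResources {
 public:
  ClientResources()
      : client_(new FakeHandle{"client"}),
        database_(new FakeHandle{"database"}),
        collection_(new FakeHandle{"collection"}) {}

  // Explicit teardown in the same order as destroy_client() in the ticket.
  // Resetting an already-empty unique_ptr does not invoke the deleter.
  void reset() {
    collection_.reset();
    database_.reset();
    client_.reset();
  }

 private:
  HandlePtr client_;
  HandlePtr database_;
  HandlePtr collection_;
};

// Builds the resources, lets them go out of scope, and returns the
// recorded release order.
std::vector<std::string> destroy_order_demo() {
  g_destroy_log.clear();
  {
    ClientResources resources;
  }  // destructor runs here
  return g_destroy_log;
}
```

Calling destroy_order_demo() returns the names in release order, with the client last. The explicit reset() method gives the same order and leaves the later destructor call with nothing to do.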



 Comments   
Comment by A. Jesse Jiryu Davis [ 30/Aug/18 ]

Hi, let us know if you can provide the debug-build stack trace; otherwise we'll close the ticket for now.

Comment by A. Jesse Jiryu Davis [ 10/Aug/18 ]

Hi, we don't know of any bugs related to mongoc_client_destroy. Your shutdown sequence looks correct to me. If you rebuild libmongoc with debug symbols enabled (run cmake with -DCMAKE_BUILD_TYPE=Debug) and reproduce the bug again, you should get a more informative stack trace.

Comment by Arun Muralidharan [ 10/Aug/18 ]

So, the stack trace is not consistent. Here is a new backtrace that I got.

Thread 1 (Thread 0x7f059dc27ac0 (LWP 26395)):
#0  0x00007f053c21c9d0 in ?? ()
#1  0x00007f055bd71b1a in mongoc_async_cmd_run () from /usr/lib/libmongoc-1.0.so.0
#2  0x00007f055bd7177b in mongoc_async_run () from /usr/lib/libmongoc-1.0.so.0
#3  0x00007f055bdc229d in mongoc_topology_scanner_work () from /usr/lib/libmongoc-1.0.so.0
#4  0x00007f055bdbbc07 in mongoc_topology_scan_once () from /usr/lib/libmongoc-1.0.so.0
#5  0x00007f055bdbbc93 in _mongoc_topology_do_blocking_scan () from /usr/lib/libmongoc-1.0.so.0
#6  0x00007f055bdbc21d in mongoc_topology_select_server_id () from /usr/lib/libmongoc-1.0.so.0
#7  0x00007f055bd7cae4 in _mongoc_client_end_sessions () from /usr/lib/libmongoc-1.0.so.0
#8  0x00007f055bd79353 in mongoc_client_destroy () from /usr/lib/libmongoc-1.0.so.0
#9  0x00007f0564fe2bfb in MongoClientOps::destroy_client (this=0x25ac1b8) at /localdisk/jenkins-j8-ssh1/spot_debug__ub16/rel_6.1.1c/systemtest/modules/core/rwvx/rwlog/rwlogd/sinkapi/plugin/rwlogd_eventsdb_sink/mongo_client_ops.cpp:294
#10 0x00007f0564fd57c6 in EventsDBHASink::stop_sending_events (this=0x25ac0b0) at /localdisk/jenkins-j8-ssh1/spot_debug__ub16/rel_6.1.1c/systemtest/modules/core/rwvx/rwlog/rwlogd/sinkapi/plugin/rwlogd_eventsdb_sink/rwlogd_eventsdb_ha_sink.cpp:124

It looks like some memory corruption is happening. Another thing I forgot to mention is that the client destroy is not called as soon as the MongoDB on the remote goes down. There might be some delay before client destroy is called.

Comment by Arun Muralidharan [ 10/Aug/18 ]

Connection URI is mongodb://10.64.205.22:8006/?ssl=true&connectTimeoutMS=1000

Generated at Wed Feb 07 21:16:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.