[CDRIVER-361] mongoc_collection_update hangs Created: 30/Apr/14 Updated: 06/Jun/14 Resolved: 06/Jun/14 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | None |
| Affects Version/s: | 0.92.2, 0.94.0, 0.94.2 |
| Fix Version/s: | 0.96.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eric Kapke | Assignee: | Christian Hergert |
| Resolution: | Done | Votes: | 0 |
| Labels: | driver | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Solaris, hosted by Joyent. |
||
| Description |
|
I have about a dozen queries that the application runs. Built and tested on OS X 10.9.2; all worked fine. Same mongo server, but when I compile and run on Solaris (SunOS staging-lock1 5.11 joyent_20131218T184706Z i86pc i386 i86pc Solaris), one of the updates hangs forever. The call stack is:

(gdb) thread 13

The code in question is doing a simple upsert with the _id and one field with binary data in it.

int
if(message && message->messagePtr && message->header.messageSize > 0) {
    TTRACE_DATA_STORE("Setting last message status for message %s (%d bytes)", message_id, message->header.messageSize);
    // Update the messages table
    bson_init(&cond);
    bson_t update;
    mongoc_collection_t *collection = tdata_store_message_collection(self);/
    size_t length = 0;
    size_t cond_length = 0;
    TTRACE_DATA_STORE("Updating collection...");
    // Update or insert the message ---> HANGS
    rc = mongoc_collection_update(collection,
    if(!rc) {
        TTRACE_DATA_STORE("Could not set message message status - %s", error.message);
        rc = RETURN_ERROR;
    }else {
        TTRACE_DATA_STORE("Updated collection...");
        rc = RETURN_SUCCESS;
    }rc = RETURN_SUCCESS;
    bson_destroy(&update);
    TTRACE_DATA_STORE("Set last status %d", rc);
    return rc;

And the two relevant trace lines are:

[04-30-2014 16:32:02.147803] [TRACE] [DATA_STORE] Setting last message status for message 0bc6d80c-eb65-41f7-8b50-b657f5552043 (179 bytes) [tdata_store.c:tdata_store_set_message_status:213]
UPDATE: { "$set" : { "status" : { "$type" : "00", "$binary" : "lAHOlyhXOM4ZEV0CgqNzaWfaAEcwRQIhAMfrLai3DaOnq1B/cpHsCVmpX5dpxlHUOMuK+wEJu95NAiBwsSBbq4+ceFe85CpuyeVQwFPDlHYdlP6LZqbIN6vHLqNtc2faAFGUAc443kMHzhkRXQKDo2JhdGSobGNrU3RhdGUEpWxzdE9wg7FuZXRDb25lY3Rpb25TdGF0ZQOrbG9ja2VkU3RhdGUEpGRhdGXLiE1f3UG5EHE=" }} } [tdata_store.c:tdata_store_set_message_status:239] |
| Comments |
| Comment by Christian Hergert [ 06/Jun/14 ] |
|
0.96.2 has been released with the various fixes. I'm going to mark this issue as fixed. If you continue to have issues on 0.96.2, please do not hesitate to reopen the issue. – Christian |
| Comment by Christian Hergert [ 04/Jun/14 ] |
|
I found another issue that could be related to this. It will go into 0.96.2. https://github.com/mongodb/mongo-c-driver/commit/310609bf11c70fd50e503635363bb3525a4bd949 |
| Comment by Christian Hergert [ 30/May/14 ] |
|
Any chance I can have you run this test again using mongo-c-driver 0.96.0? There were some late fixes in a few areas that might get us moving in the right direction here. I'd like to dive into this during the next cycle if it's still necessary. Also, you might consider running ./configure with --enable-tracing. |
| Comment by Christian Hergert [ 14/May/14 ] |
|
Do you think you could add a test case for this to tests/test-mongoc-client.c and submit a pull request? |
| Comment by Eric Kapke [ 14/May/14 ] |
|
That's correct - I was purposely trying to use it from a single thread. The main thread just initialized it, before the i/o thread was even created, and then never touches the mongoc_client again. I probably overlooked the fact that the init happened on the main thread. It did work on OS X the way I had it, just not on Solaris. |
| Comment by Christian Hergert [ 14/May/14 ] |
|
Absolutely. The MongoDB driver is not thread-safe; it is thread-unaware. That means that you are responsible for providing the thread-safety. However, there is one place where the C driver currently deals with threads, and that is mongoc_client_pool_t. That can be used in a thread-safe manner in situations such as web servers where you need to check a client out of the pool to handle a request. It uses pthread_mutex_t/pthread_cond_t internally (on UNIX-like systems). Now, if you created the client in one thread and passed it to another thread, meanwhile doing nothing with it in the main thread, that is something we should look at. |
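
To illustrate the check-out/check-in pattern described above, here is a minimal sketch built only on pthread_mutex_t and pthread_cond_t. It is not the driver's implementation of mongoc_client_pool_t: handle_t, pool_t, pool_pop, pool_push, and run_workers are hypothetical names, and handle_t merely stands in for a client object that must not be shared between threads while in use.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE      2    /* hypothetical: fewer handles than workers */
#define NUM_WORKERS    4
#define OPS_PER_WORKER 100

/* Hypothetical stand-in for a client handle that is not thread-safe. */
typedef struct {
    int  in_use;  /* 1 while checked out by a thread */
    long ops;     /* operations performed through this handle */
} handle_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  available;  /* signalled when a handle is returned */
    handle_t        handles[POOL_SIZE];
    int             free_count;
} pool_t;

void pool_init(pool_t *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->available, NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        p->handles[i].in_use = 0;
        p->handles[i].ops = 0;
    }
    p->free_count = POOL_SIZE;
}

/* Check a handle out; blocks until one is free. */
handle_t *pool_pop(pool_t *p)
{
    handle_t *h = NULL;

    pthread_mutex_lock(&p->lock);
    while (p->free_count == 0)
        pthread_cond_wait(&p->available, &p->lock);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!p->handles[i].in_use) {
            h = &p->handles[i];
            h->in_use = 1;
            p->free_count--;
            break;
        }
    }
    pthread_mutex_unlock(&p->lock);

    assert(h != NULL);  /* free_count > 0 guarantees a free slot */
    return h;
}

/* Return a handle to the pool and wake one waiting thread. */
void pool_push(pool_t *p, handle_t *h)
{
    pthread_mutex_lock(&p->lock);
    h->in_use = 0;
    p->free_count++;
    pthread_cond_signal(&p->available);
    pthread_mutex_unlock(&p->lock);
}

void pool_destroy(pool_t *p)
{
    pthread_cond_destroy(&p->available);
    pthread_mutex_destroy(&p->lock);
}

static void *worker(void *arg)
{
    pool_t *p = arg;

    for (int i = 0; i < OPS_PER_WORKER; i++) {
        handle_t *h = pool_pop(p);
        h->ops++;  /* no lock needed: this thread owns h until pool_push */
        pool_push(p, h);
    }
    return NULL;
}

/* Run all workers against the pool; returns the total operation count. */
long run_workers(pool_t *p)
{
    pthread_t threads[NUM_WORKERS];
    long total = 0;

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&threads[i], NULL, worker, p);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);
    for (int i = 0; i < POOL_SIZE; i++)
        total += p->handles[i].ops;
    return total;
}
```

The point of the design is that a handle is only ever touched by the thread that currently has it checked out, so the handle itself needs no locking; only the pool's bookkeeping does.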
| Comment by Eric Kapke [ 14/May/14 ] |
|
I think the issue is not related to the query itself, but rather a threading issue. 1. I made a tool that runs the same query (single thread), and it runs just fine. Perhaps it's just user error? |
| Comment by Christian Hergert [ 01/May/14 ] |
|
Hi Eric, thanks for reporting this. Can you let me know which version of the driver was used for the stack trace above? Based on your specified versions, I assume you tried all of them? Do you have an idea of the size of the upsert in bytes? Also, if you recompile with --enable-debug-symbols, then we can get the number of iovecs that were being written in mongoc_socket_try_sendv() in the backtrace. That is interesting to me because I've seen issues with scatter/gather IO on Solaris in the past with some previous iterations of the socket layer. |
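
For readers unfamiliar with scatter/gather IO, the following is a minimal sketch of a gathered write that copes with partial writes, using only POSIX writev(). It is not the driver's mongoc_socket_try_sendv(); sendv_all is a hypothetical helper, and it assumes a blocking descriptor and an iovcnt within the platform's IOV_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of mongo-c-driver).
 * Writes every byte described by iov[0..iovcnt) to fd, retrying after
 * partial writes. The iov array is modified in place as data is consumed.
 * Returns the total number of bytes written, or -1 on error (errno set).
 */
ssize_t sendv_all(int fd, struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    assert(iov != NULL || iovcnt == 0);

    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data: retry */
            return -1;
        }
        total += n;

        /* Skip the entries that were written completely. */
        size_t left = (size_t)n;
        while (iovcnt > 0 && left >= iov->iov_len) {
            left -= iov->iov_len;
            iov++;
            iovcnt--;
        }

        /* Advance into an entry that was only partly written. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + left;
            iov->iov_len -= left;
        }
    }
    return total;
}
```

A writer that treats a short writev() return as "everything was sent", or that resumes from the wrong offset, can leave the peer waiting for bytes that never arrive, which is one way a request can appear to hang.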