[CDRIVER-361] mongoc_collection_update hangs Created: 30/Apr/14  Updated: 06/Jun/14  Resolved: 06/Jun/14

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: 0.92.2, 0.94.0, 0.94.2
Fix Version/s: 0.96.2

Type: Bug Priority: Major - P3
Reporter: Eric Kapke Assignee: Christian Hergert
Resolution: Done Votes: 0
Labels: driver
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Solaris, hosted by Joyent.



 Description   

I have a about a dozen queries that the application runs. Built, and tested on OS X 10.9.2. All worked fine. Same mongo server, but compile and run on Solaris: SunOS staging-lock1 5.11 joyent_20131218T184706Z i86pc i386 i86pc Solaris, and one of the updates hangs forever.

The call stack is:

(gdb) thread 13
[Switching to thread 13 (Thread 5 (LWP 5))]
#0 0xfffffd7fff225e1a in __so_sendmsg () from /lib/64/libc.so.1
(gdb) bt
#0 0xfffffd7fff225e1a in __so_sendmsg () from /lib/64/libc.so.1
#1 0xfffffd7fff211659 in _so_sendmsg () from /lib/64/libc.so.1
#2 0xfffffd7ffea4069e in __xnet_sendmsg () from /lib/64/libsocket.so.1
#3 0xfffffd7fe7c5fcf6 in _mongoc_socket_try_sendv () from /usr/local/lib/libmongoc-1.0.so.0
#4 0xfffffd7fe7c5fd9b in mongoc_socket_sendv () from /usr/local/lib/libmongoc-1.0.so.0
#5 0xfffffd7fe7c60840 in _mongoc_stream_socket_writev () from /usr/local/lib/libmongoc-1.0.so.0
#6 0xfffffd7fe7c541b5 in _mongoc_cluster_try_sendv () from /usr/local/lib/libmongoc-1.0.so.0
#7 0xfffffd7fe7c62830 in _mongoc_write_command_update_legacy () from /usr/local/lib/libmongoc-1.0.so.0
#8 0xfffffd7fe7c621c6 in _mongoc_write_command_execute () from /usr/local/lib/libmongoc-1.0.so.0
#9 0xfffffd7fe7c55b14 in mongoc_collection_update () from /usr/local/lib/libmongoc-1.0.so.0
#10 0x000000000040cc3f in tdata_store_set_message_status (self=0x4d4f48, message=0x4d7ca8) at tdata_store.c:242
#11 0x0000000000413639 in tmessage_endpoint_send_message_status_to_client (self=0x4c2e20, message=0x4d7ca8, message=0x4d16d0) at tmessage_endpoint.c:101
#12 0x0000000000413db1 in tmessage_endpoint_message_received (status=TMESSAGE_TRANSPORT_MESSAGE, sock_fd=25, connection=0x4d16d0, message=0x4d7ca8, context=0x4c2e20)
at tmessage_endpoint.c:219
#13 0x0000000000414942 in tmessage_transport_fire_message_available (self=0x4eceb0, message=0x4d7ca8, connection=0x4d16d0) at tmessage_transport.c:221
#14 0x0000000000415002 in tmessage_transport_socket_callback_data (tsocket=0x4f2e50, connection=0x4d16d0, context=0x4eceb0) at tmessage_transport.c:405
#15 0x000000000041511c in tmessage_transport_socket_callback (tsocket=0x4f2e50, client_fd=25, connection=0x4d16d0, callback_type=TSOCKET_DATA, context=0x4eceb0)
at tmessage_transport.c:453
#16 0x000000000040db0f in _TSocketServer_fireDataAvailableEvent (self=0x4f2e50, connection=0x4d16d0) at TSocket.c:205
#17 0x000000000040e07f in libev_read_cb (loop=0x4f3c50, watcher=0x4d16d0, revents=1) at TSocket.c:354
#18 0xfffffd7fe98f568d in ev_invoke_pending () from /opt/local/lib/ev/libev.so.4
#19 0xfffffd7fe98f84b1 in ev_run () from /opt/local/lib/ev/libev.so.4
#20 0x000000000040d6f4 in ev_loop (loop=0x4f3c50, flags=0) at ./include/libev/ev.h:826
#21 0x000000000040ea37 in _TSocketServer_Worker (ptr=0x4f2e50) at TSocket.c:614
#22 0xfffffd7fff21f46a in _thrp_setup () from /lib/64/libc.so.1
#23 0xfffffd7fff21f780 in ?? () from /lib/64/libc.so.1
#24 0x0000000000000000 in ?? ()

The code in question is doing a simple upsert with the _id and one field with binary data in it.

int
tdata_store_set_message_status(tdata_store_t *self, TPacketLocal_t *message)
{
int rc = RETURN_UNINITIALIZED;

if(message && message->messagePtr && message->header.messageSize > 0) {
char message_id[UUID_STR_LEN];
uuid_unparse(message->header.messageId, message_id);

TTRACE_DATA_STORE("Setting last message status for message %s (%d bytes)", message_id, message->header.messageSize);

// Update the messages table
bson_t cond;

bson_init(&cond);
BSON_APPEND_UUID(&cond, TDATA_STORE_ID_FIELD, message->header.messageId);

bson_t update;
bson_init(&update);
bson_t child;
bson_append_document_begin(&update, "$set", -1, &child);
BSON_APPEND_BINARY(&child, TDATA_STORE_message_STATUS_FIELD, BSON_SUBTYPE_BINARY, message->messagePtr, message->header.messageSize);
bson_append_document_end(&update, &child);

mongoc_collection_t *collection = tdata_store_message_collection(self);/
bson_error_t error;

size_t length = 0;
char * query_text = bson_as_json(&update, &length);

size_t cond_length = 0;
char * cond_text = bson_as_json(&cond, &cond_length);

TTRACE_DATA_STORE("Updating collection...");
TTRACE_DATA_STORE("QUERY: %s UPDATE: %s", cond_text, query_text);

// Update or insert the message

---> HANGS rc = mongoc_collection_update(collection,
MONGOC_UPDATE_UPSERT,
&cond,
&update,
NULL,
&error);

if(!rc)

{ TTRACE_DATA_STORE("Could not set message message status - %s", error.message); rc = RETURN_ERROR; }

else

{ TTRACE_DATA_STORE("Updated collection..."); rc = RETURN_SUCCESS; }

rc = RETURN_SUCCESS;

bson_destroy(&update);
bson_destroy(&child);
bson_destroy(&cond);

TTRACE_DATA_STORE("Set last status %d", rc);
} else

{ TTRACE_DATA_STORE("No message specified to set"); }

return rc;
}

And the two relevant trace lines are:

[04-30-2014 16:32:02.147803] [TRACE] [DATA_STORE] Setting last message status for message 0bc6d80c-eb65-41f7-8b50-b657f5552043 (179 bytes) [tdata_store.c:tdata_store_set_message_status:213]
[04-30-2014 16:32:02.147889] [TRACE] [DATA_STORE] Updating collection... [tdata_store.c:tdata_store_set_message_status:238]
[04-30-2014 16:32:02.147909] [TRACE] [DATA_STORE] QUERY:

{ "_id" : "0BC6D80C-EB65-41F7-8B50-B657F5552043" }

UPDATE: { "$set" : { "status" :

{ "$type" : "00", "$binary" : "lAHOlyhXOM4ZEV0CgqNzaWfaAEcwRQIhAMfrLai3DaOnq1B/cpHsCVmpX5dpxlHUOMuK+wEJu95NAiBwsSBbq4+ceFe85CpuyeVQwFPDlHYdlP6LZqbIN6vHLqNtc2faAFGUAc443kMHzhkRXQKDo2JhdGSobGNrU3RhdGUEpWxzdE9wg7FuZXRDb25lY3Rpb25TdGF0ZQOrbG9ja2VkU3RhdGUEpGRhdGXLiE1f3UG5EHE=" }

} } [tdata_store.c:tdata_store_set_message_status:239]



 Comments   
Comment by Christian Hergert [ 06/Jun/14 ]

0.96.2 has been released with the various fixes. I'm going to mark this issue as fixed. If you continue to have issues on 0.96.2, please do not hesitate to reopen the issue.

– Christian

Comment by Christian Hergert [ 04/Jun/14 ]

I found another issue that could be related to this. It will go into 0.96.2.

https://github.com/mongodb/mongo-c-driver/commit/310609bf11c70fd50e503635363bb3525a4bd949

Comment by Christian Hergert [ 30/May/14 ]

Any chance I can have you run this test again using mongo-c-driver 0.96.0? There were some late fixes in a few areas that might get us moving in the right direction here. I'd like to dive into this during the next cycle if it's still necessary.

Also, you might consider running ./configure with --enable-tracing.

Comment by Christian Hergert [ 14/May/14 ]

Do you think you could add a test case for this to tests/test-mongoc-client.c and submit a pull request?

Comment by Eric Kapke [ 14/May/14 ]

That's correct - I was purposely trying to use it from a single thread. The main thread just initialized it - before the i/o thread was even created, then never touches the mongoc_client again. I probably overlooked the fact that the init happened on the main thread. It did work on OS X the way I had it, but just not on solaris.

Comment by Christian Hergert [ 14/May/14 ]

Absolutely. The MongoDB driver is not thread-safe, it is thread-unaware. That means that you are responsible for providing the thread-safety.

However, there is one place were the C driver currently deals with threads, and that is mongoc_client_pool_t. That can be used in a thread-safe manner in situations such as web servers where you need to check a client out of the pool to handle a request. That uses pthread_mutex_t/pthread_cond_t internally (on UNIX-like systems).

Now, if you created the client in one thread and passed it to another thread, meanwhile doing nothing with it in the main thread, that is something we should look at.

Comment by Eric Kapke [ 14/May/14 ]

I think the issue is not related to the query itself, rather a thread issue.

1. I made a tool that runs the same query (single thread) runs just fine.
2. I refactored my code to use a queue with a dedicated thread where I the mongo client is initialized, and all mongo operations are run from. Works just fine.
3. In the stack trace above, the only difference seems to be the mongoc_client was initialized on the main thread, while db operations are run from a different I/O thread.

Perhaps its just user error?

Comment by Christian Hergert [ 01/May/14 ]

Hi Eric, thanks for reporting this.

Can you let me know what the version of the driver was used in the stack trace above? Based on your specified versions I assume you tried all of them?

Do you have an idea of the size of upsert in bytes?

Also, if you recompile with --enable-debug-symbols then we can get the number of iovec's that were being written in mongoc_socket_try_sendv() in the backtrace. That is interesting to me because I've seen issues with scatter/gather IO on Solaris in the past with some previous iterations of the socket layer.

Generated at Wed Feb 07 21:09:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.