[CDRIVER-4168] mongoc driver keeps reconnecting to the server ? Created: 20/Sep/21  Updated: 10/Jan/22  Resolved: 10/Jan/22

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Unknown
Reporter: G F Assignee: Kevin Albertson
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File example-update.c     Text File example-update1.c     Text File example-update_socket_leak.c    

 Description   

with the latest mongo C lib 1.19.0 and MongoDB Community 5.0.2 I find a LOT of sockets in TIMED_WAIT (the client is on Windows 7)

there appears to be constant, needless socket allocation to a stable MongoDB server

c:\temp\mongo-c-driver-1.19.0\mongo-c-driver-1.19.0\src\libmongoc\src\mongoc\mongoc-cluster.c

mongoc_cluster_fetch_stream_pooled

cluster_node = _mongoc_cluster_add_node (cluster, server_id, error);

 

I obtain different clients from a single mongoc_client_pool_t

in the connection URI I set these as high as possible, but the driver doesn't seem to consider them; it seems to allocate sockets regardless:

heartbeatfrequencyms=1000000

socketCheckIntervalMS=1000000

what can I do to stop it leaking sockets?
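
For context, the setup looks roughly like this (a minimal sketch, not the actual program; the host, database and collection names are placeholders):

#include <mongoc/mongoc.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
   mongoc_init ();

   bson_error_t error;
   /* URI carrying the two options mentioned above; the host is a placeholder */
   mongoc_uri_t *uri = mongoc_uri_new_with_error (
      "mongodb://localhost:27017/?heartbeatfrequencyms=1000000"
      "&socketCheckIntervalMS=1000000",
      &error);
   if (!uri) {
      fprintf (stderr, "bad URI: %s\n", error.message);
      return EXIT_FAILURE;
   }

   /* one pool for the whole process */
   mongoc_client_pool_t *pool = mongoc_client_pool_new (uri);

   /* each thread pops its own client and pushes it back when finished */
   mongoc_client_t *client = mongoc_client_pool_pop (pool);
   mongoc_collection_t *coll =
      mongoc_client_get_collection (client, "db", "coll");

   /* ... finds / bulk writes on coll ... */

   mongoc_collection_destroy (coll);
   mongoc_client_pool_push (pool, client);

   mongoc_client_pool_destroy (pool);
   mongoc_uri_destroy (uri);
   mongoc_cleanup ();
   return EXIT_SUCCESS;
}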

 

 



 Comments   
Comment by Kevin Albertson [ 10/Jan/22 ]

Closing as "Cannot Reproduce" since I am unable to reproduce the bug of opening and closing sockets as described in the ticket description.

Comment by G F [ 30/Dec/21 ]

Forgive me for a "philosophical" idle speculation. In this server monitoring and discovery (SDAM) spec https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring-monitoring.rst

in completely abstract terms, there's nothing specifying that a collection on the client/server side shouldn't be both readable (find) and writeable (bulk execute) (obviously I can only think of the client side; I looked at the server side only "cursorily" and it's mind-boggling with C++ futures, no idea).

This doesn't actually help much, but my internal reasoning, if it's of any use, is to possibly multiplex both reads and writes on a single client/collection, say via the MongoDB wire protocol with requestID/responseTo, since the TCP socket is bidirectional?

How does record locking work on the server side in case of one client reading and another writing? Any clue whatsoever?

Comment by Kevin Albertson [ 29/Dec/21 ]

could it be thinkable in your future modifications to allow this concurrently from a single client?

CDRIVER-3525 is a proposal to rework how connections are shared on clients to make it consistent with other language drivers. There are related open questions in that description: "Do we only make a mongoc_client_t thread safe? Or a mongoc_database_t/mongoc_collection_t as well." I suggest watching that ticket for updates.

in order to, say, reduce record contention server side if both querying and writing to the same record/doc, i.e. instead of from different clients I'd guess there would be less contention from a single client.

CDRIVER-4002 addresses several related issues with contention on shared structures within mongoc_client_pool_t. If you are seeing high contention, consider upgrading to 1.20.0.

Comment by G F [ 29/Dec/21 ]

Concurrent find and bulk execute work with separate pooled clients; could it be thinkable in your future modifications to allow this concurrently from a single client? The keep-alive threads seem to modify a shared "cluster" structure? This would be in order to, say, reduce record contention server side if both querying and writing to the same record/doc, i.e. instead of from different clients I'd guess there would be less contention from a single client.

Comment by Kevin Albertson [ 29/Dec/21 ]

Is it thinkable that mongoc_collection_find() leaves the cursor unexhausted, whereas mongoc_collection_find_with_opts() exhausts it?

mongoc_collection_find and mongoc_collection_find_with_opts share much of the code path with _mongoc_cursor_find_new. If there is a difference, my hypothesis is that there is a difference in options passed between the two calls.
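
For illustration, a minimal sketch of the two calls side by side (this assumes an existing mongoc_collection_t *coll; the filter is arbitrary). Any behavior difference would most likely come from the flags or opts actually passed:

bson_t *filter = BCON_NEW ("x", BCON_INT32 (1));

/* deprecated API: flags, skip, limit and batch_size are positional */
mongoc_cursor_t *c1 = mongoc_collection_find (
   coll, MONGOC_QUERY_NONE, 0 /* skip */, 0 /* limit */, 0 /* batch_size */,
   filter, NULL /* fields */, NULL /* read_prefs */);

/* replacement API: the same query, with no extra options */
mongoc_cursor_t *c2 =
   mongoc_collection_find_with_opts (coll, filter, NULL /* opts */, NULL);

mongoc_cursor_destroy (c1);
mongoc_cursor_destroy (c2);
bson_destroy (filter);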

Also, another piece of information/clue: can these finds be run on a collection which is executing bulk writes in parallel in another thread? It doesn't seem to be possible.

As long as the other thread is using a collection obtained from a separate client, it should be possible. Do you observe an error or missing results?
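
A minimal sketch of that pattern, assuming the pool from the ticket description (pthreads and the "db"/"coll" names are only for illustration; error handling omitted):

#include <mongoc/mongoc.h>
#include <pthread.h>

static mongoc_client_pool_t *pool; /* created once from the URI in main () */

static void *
worker (void *arg)
{
   /* each thread gets its own client; mongoc_client_t is not thread safe,
      but the pool is */
   mongoc_client_t *client = mongoc_client_pool_pop (pool);
   mongoc_collection_t *coll =
      mongoc_client_get_collection (client, "db", "coll");

   /* ... run the find (reader thread) or the bulk execute (writer thread)
      on coll here ... */

   mongoc_collection_destroy (coll);
   mongoc_client_pool_push (pool, client);
   return NULL;
}

/* in main (), after creating the pool:
      pthread_t reader, writer;
      pthread_create (&reader, NULL, worker, NULL);
      pthread_create (&writer, NULL, worker, NULL);
      pthread_join (reader, NULL);
      pthread_join (writer, NULL);   */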

Comment by G F [ 29/Dec/21 ]

Hi, forgive my stupidity: is it thinkable that mongoc_collection_find() leaves the cursor unexhausted, whereas mongoc_collection_find_with_opts() exhausts it?

Also, another piece of information/clue: can these finds be run on a collection which is executing bulk writes in parallel in another thread? It doesn't seem to be possible.

Comment by Kevin Albertson [ 29/Dec/21 ]

Thank you for including the reproducible example, g.fanini@gmail.com.

I built and ran example-update_socket_leak.c with C driver 1.19.0 and a 5.0.2 community MongoDB server. I did not observe leaking sockets.

Using netstat -q | Select-String -Pattern "27017" I only see 3 client sockets established while the example is running. Two sockets are created for monitoring, and one for application operations. After the process finishes, I see the 3 client sockets in the TIMED_WAIT state.

In case it is helpful: my initial hypothesis was that this was related to the use of MONGOC_QUERY_EXHAUST. If an exhaust cursor is not fully iterated, the socket for the exhaust cursor is closed since it is no longer usable. That is not the case in example-update_socket_leak.c: it repeatedly calls mongoc_cursor_next until no results are returned. Consider removing the MONGOC_QUERY_EXHAUST flag to see if the problem persists.
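
For illustration, a minimal sketch of both variants (assuming an existing coll and filter; this is not code from the attached example):

/* variant 1: exhaust cursor, iterated until no results remain; if such a
   cursor is abandoned early, its socket must be closed */
mongoc_cursor_t *cursor = mongoc_collection_find (
   coll, MONGOC_QUERY_EXHAUST, 0, 0, 0, filter, NULL, NULL);
const bson_t *doc;
while (mongoc_cursor_next (cursor, &doc)) {
   /* process doc */
}
mongoc_cursor_destroy (cursor);

/* variant 2: the same query without MONGOC_QUERY_EXHAUST, so stopping early
   does not force the connection to be discarded */
cursor = mongoc_collection_find (
   coll, MONGOC_QUERY_NONE, 0, 0, 0, filter, NULL, NULL);
while (mongoc_cursor_next (cursor, &doc)) {
   /* process doc */
}
mongoc_cursor_destroy (cursor);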

Comment by G F [ 21/Sep/21 ]

this is the call stack of what I suspect might be runaway reconnections during an apparently "peaceful" query cursor iteration:

mongoc-1.0.dll!mongoc_socket_new(int domain=0x00000002, int type=0x00000001, int protocol=0x00000000)  Line 1032
mongoc-1.0.dll!mongoc_client_connect_tcp(int connecttimeoutms=0x00002710, const _mongoc_host_list_t * host=0x019d32b0, _bson_error_t * error=0x01990378)  Line 688 + 0x11 bytes
mongoc-1.0.dll!mongoc_client_connect(char buffered='', char use_ssl=0x00, void * ssl_opts_void=0x01994230, const _mongoc_uri_t * uri=0x00002710, const _mongoc_host_list_t * host=0x019d32b0, _bson_error_t * error=0x01990378)  Line 822 + 0xc bytes
mongoc-1.0.dll!mongoc_client_default_stream_initiator(const _mongoc_uri_t * uri=0x019a09d8, const _mongoc_host_list_t * host=0x019d32b0, void * user_data=0x01994100, _bson_error_t * error=0x01990378)  Line 914 + 0x23 bytes
mongoc-1.0.dll!_mongoc_client_create_stream(_mongoc_client_t * client=0x019941e0, const _mongoc_host_list_t * host=0x019d32b0, _bson_error_t * error=0x01990378)  Line 948 + 0x17 bytes
mongoc-1.0.dll!_mongoc_cluster_add_node(_mongoc_cluster_t * cluster=0x00000000, unsigned int server_id=0x00000001, void * error=0x62c79d83)  Line 2150 + 0xf0 bytes
mongoc-1.0.dll!mongoc_cluster_fetch_stream_pooled(_mongoc_cluster_t * cluster=0x019941e8, unsigned int server_id=0x00000001, char reconnect_ok='', _bson_error_t * error=0x01990378)  Line 2657 + 0x10 bytes
mongoc-1.0.dll!_mongoc_cluster_stream_for_server(_mongoc_cluster_t * cluster=0x019941e8, unsigned int server_id=0x00000001, char reconnect_ok='', const _mongoc_client_session_t * cs=0x00000000, _bson_t * reply=0x1202f380, _bson_error_t * error=0x01990378)  Line 2268 + 0xb bytes
mongoc-1.0.dll!_mongoc_cluster_stream_for_optype(_mongoc_cluster_t * cluster=0x019941e8, mongoc_ss_optype_t optype=MONGOC_SS_READ, const _mongoc_read_prefs_t * read_prefs=0x0199a650, _mongoc_client_session_t * cs=0x00000000, _bson_t * reply=0x1202f380, _bson_error_t * error=0x01990378)  Line 2846 + 0x14 bytes
mongoc-1.0.dll!mongoc_cluster_stream_for_reads(_mongoc_cluster_t * cluster=0x019941e8, const _mongoc_read_prefs_t * read_prefs=0x0199a650, _mongoc_client_session_t * cs=0x00000000, _bson_t * reply=0x1202f380, _bson_error_t * error=0x01990378)  Line 2884 + 0x19 bytes
mongoc-1.0.dll!_mongoc_cursor_fetch_stream(_mongoc_cursor_t * cursor=0x01990250)  Line 669 + 0x28 bytes
mongoc-1.0.dll!_prime(_mongoc_cursor_t * cursor=0x01990250)  Line 40 + 0x6 bytes
mongoc-1.0.dll!mongoc_cursor_next(_mongoc_cursor_t * cursor=0x01990250, const _bson_t * * bson=0x1202f568)  Line 1287 + 0x2a bytes
Na192.exe!MongoDbGetRecord(AXRCB_t * lprcb=0x01d42894, char * dst=0x1202fb10, int * mongolen=0x1202f76c, _mongoc_collection_t * collection=0x0198ed88)  Line 2349 + 0xc bytes

Comment by G F [ 21/Sep/21 ]

I attached one of the driver's examples, example-update.c, modified with a #define to use a pool. With a single thread I cannot reproduce the socket leak that I seem to be getting elsewhere with multiple threads (I make sure of that with a breakpoint in mongoc_socket_new()); multiple threads is what I will try next.

I attach a very silly two-thread test, example-update1.c, that also doesn't seem to leak. I studied more carefully where I'm getting the leaks: there doesn't seem to be anything involving allocations like a new pool from a URI; it's all with a fixed single pool and fixed threads, each with a client and an identical collection from that pool.

Is it thinkable there might be some circumstances which lead the (one or more?) monitoring threads of the C library to somehow reconnect? Is there any way to find out what may be tripping a reconnection? The MongoDB is not a replica set, it's standalone, which I understand may cause reconnections with it not being "primary"?

As far as I can tell, it seems to happen while iterating the cursor obtained from the (deprecated) mongoc_collection_find(); it doesn't do it with mongoc_collection_find_with_opts(), which solves the leak empirically in my program. I enclosed example-update_socket_leak.c; I suspect it has nothing to do with multiple threads. However, oddly enough, that example won't reproduce the leaking sockets!?

Comment by G F [ 21/Sep/21 ]

Hi, I'm not trying to be silly. On the client side I am swamped by about 8k sockets in TIMED_WAIT on Windows, but it varies. On the MongoDB Community 5.0.2 side I have a noticeable slowdown, accompanied by these logs:

{"t":\{"$date":"2021-09-21T05:26:58.706+02:00"}

,"s":"I", "c":"NETWORK", "id":22943, "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.157.131.139:58778","uuid":"a8affb81-1d60-455f-b134-ddebe4377f3f","connectionId":210768,"connectionCount":32}}

{"t":\{"$date":"2021-09-21T05:26:58.720+02:00"}

,"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn210768","msg":"client metadata","attr":{"remote":"10.157.131.139:58778","client":"conn210768","doc":{"driver":

{"name":"mongoc","version":"1.19.0"}

,"os":{"type":"Windows","name":"Windows","version":"6.1 (7601)","architecture":"x86"},"platform":"cfg=0x0004170063 CC=MSVC 1500 CFLAGS=\"/DWIN32 /D_WINDOWS /W3\" LDFLAGS=\"/machine:X86\""}}}

{"t":\{"$date":"2021-09-21T05:26:58.721+02:00"}

,"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn210768","msg":"Connection ended","attr":{"remote":"10.157.131.139:58778","uuid":"a8affb81-1d60-455f-b134-ddebe4377f3f","connectionId":210768,"connectionCount":31}}

{"t":\{"$date":"2021-09-21T05:26:58.722+02:00"}

,"s":"I", "c":"NETWORK", "id":22943, "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.157.131.139:58779","uuid":"7ea1c344-aacb-4f99-a487-8ae669030b75","connectionId":210769,"connectionCount":32}}

{"t":\{"$date":"2021-09-21T05:26:58.735+02:00"}

,"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn210769","msg":"client metadata","attr":{"remote":"10.157.131.139:58779","client":"conn210769","doc":{"driver":

{"name":"mongoc","version":"1.19.0"}

,"os":{"type":"Windows","name":"Windows","version":"6.1 (7601)","architecture":"x86"},"platform":"cfg=0x0004170063 CC=MSVC 1500 CFLAGS=\"/DWIN32 /D_WINDOWS /W3\" LDFLAGS=\"/machine:X86\""}}}

{"t":\{"$date":"2021-09-21T05:26:58.736+02:00"}

,"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn210769","msg":"Connection ended","attr":{"remote":"10.157.131.139:58779","uuid":"7ea1c344-aacb-4f99-a487-8ae669030b75","connectionId":210769,"connectionCount":31}}

Comment by Kevin Albertson [ 21/Sep/21 ]

Hi g.fanini@gmail.com, thank you for your report, we will look into this soon!
