[CXX-1209] Client from pool throws timeout exception Created: 29/Jan/17 Updated: 27/Oct/23 Resolved: 31/Jan/17 |
|
| Status: | Closed |
| Project: | C++ Driver |
| Component/s: | None |
| Affects Version/s: | 3.1.1 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Denis Bip | Assignee: | Unassigned |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Linux, Debian 8, C Driver 1.5.3 |
||
| Issue Links: |
|
||||
| Case: | (copied to CRM) | ||||
| Description |
|
Hi again! A client from mongocxx::pool sometimes throws "Failed to send "insert" command with database "...": socket error or timeout: generic server error". The execution time of the whole code block is 0 ms, i.e. the exception is thrown immediately. It seems that mongocxx::pool returns a connection that has already timed out? |
| Comments |
| Comment by David Golden [ 31/Jan/17 ] | |||||||||||||||||||||||||||||||
|
TCP keepalive is below the application layer in the network stack, so the problem could be in a router/firewall/load-balancer. You'll need to talk to your system/network admins to diagnose further. I'm going to close this, as I think you've identified the problem. | |||||||||||||||||||||||||||||||
| Comment by Denis Bip [ 30/Jan/17 ] | |||||||||||||||||||||||||||||||
|
So, yes, it is. The problem was in the keepAliveTime TCP socket settings. Does it mean that the client driver side does not respond to the ACK packet sent by the server? Or may the server not send an ACK packet at all and simply close the idle connection? | |||||||||||||||||||||||||||||||
| Comment by Denis Bip [ 30/Jan/17 ] | |||||||||||||||||||||||||||||||
|
As an idea, I think the problem may be the short keepAlive time tuned on the server machine. It is set to 30 sec of idle time, after which a control packet is sent to the client every 10 sec, up to 5 times. | |||||||||||||||||||||||||||||||
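For readers unfamiliar with these knobs, the sketch below shows how the three values from the comment above (30 sec idle, 10 sec between probes, 5 probes) map onto the Linux per-socket keepalive options. It is only an illustration: the thread does not say how the server was configured (it may well have been system-wide sysctl settings), and the names keepalive_settings, seconds_until_dead and enable_keepalive are hypothetical helpers, not part of mongocxx or libmongoc.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper type; the values used in this thread were 30 / 10 / 5.
struct keepalive_settings {
    int idle_seconds;      // idle time before the first keepalive probe
    int interval_seconds;  // time between consecutive probes
    int probe_count;       // unanswered probes before the kernel drops the connection
};

// Approximate time for the kernel to give up on an idle peer that never answers.
inline int seconds_until_dead(const keepalive_settings& s) {
    return s.idle_seconds + s.interval_seconds * s.probe_count;
}

// Enables TCP keepalive on an already created socket (Linux-specific options).
// Returns true only if every setsockopt call succeeded.
inline bool enable_keepalive(int fd, const keepalive_settings& s) {
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &s.idle_seconds, sizeof(int)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &s.interval_seconds, sizeof(int)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &s.probe_count, sizeof(int)) == 0;
}
```

With the values quoted in the comment, seconds_until_dead gives roughly 30 + 10 * 5 = 80 seconds before a silent peer is declared dead. Any device between client and server that drops or ignores the probes can therefore end a pooled connection while the application still believes it is usable.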
| Comment by Denis Bip [ 30/Jan/17 ] | |||||||||||||||||||||||||||||||
|
I don't think it's a network problem, because all other databases and services work very well (processing 500-2000 requests per second). I logged successful and failed inserts (bulk_write with 4-10 rows each). Failures begin after 5 minutes of running. | |||||||||||||||||||||||||||||||
| Comment by David Golden [ 30/Jan/17 ] | |||||||||||||||||||||||||||||||
|
How often is "sometimes"? There is an inherent race condition where just because the last operation with a client succeeded doesn't mean that the next one will, so failures of this sort are always possible, but without failovers or a flaky network, you shouldn't be seeing those regularly. | |||||||||||||||||||||||||||||||
| Comment by Denis Bip [ 29/Jan/17 ] | |||||||||||||||||||||||||||||||
|
Server versions are: primary - 3.2.9, secondaries (3 servers) and arbiters (3 servers) - 3.4.1. I'm updating the servers to version 3.4.1 but have not finished yet. The primary server log contains only connection/disconnection and long query information, no errors.
Multithreaded access (~30 threads):
Replica set settings:
| |||||||||||||||||||||||||||||||
| Comment by David Golden [ 29/Jan/17 ] | |||||||||||||||||||||||||||||||
|
Hello! For this, we really do need an SSCCE that lets us reproduce it. Please also note what server versions you have and give us a sanitized version of your replica set configuration. |