Core Server / SERVER-26722

router blocks and throws ExceededTimeLimit


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.2.10
    • Fix Version/s: None
    • Component/s: Querying, Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Sharding 2016-11-21, Sharding 2016-12-12

      Description

      We updated one of our sharded clusters from v3.0.12 to v3.2.10. Since then, the cluster has not been operational because many operations get blocked by the router. The corresponding log messages look like this:

      2016-10-20T11:00:22.902+0200 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-3-0] Failed to connect to s559:27017 - ExceededTimeLimit: Operation timed out
      2016-10-20T11:00:22.918+0200 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-3-0] Failed to connect to mongo-007.ipx:27017 - ExceededTimeLimit: Operation timed out
      2016-10-20T11:00:22.920+0200 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-3-0] Failed to connect to mongo-024.ipx:27017 - ExceededTimeLimit: Operation timed out
      2016-10-20T11:00:22.921+0200 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-3-0] Failed to connect to mongo-007.ipx:27017 - ExceededTimeLimit: Operation timed out
      2016-10-20T11:00:22.921+0200 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-3-0] Failed to connect to mongo-024.ipx:27017 - ExceededTimeLimit: Operation timed out
      

      We can reproduce the issue at any time simply by executing a findOne through the router repeatedly:

      for (x = 0; x < 1000; x++) {
          db.offer.find({ "_id": NumberLong("5672494983") }).forEach(function(u) { printjson(u); });
          print(x);
      }
      

      It blocks after only a few findOnes.
      If we execute the same code directly on the shard where the document is located, there is no blocking at all.

      We found that the MongoDB router at v3.0.12 does not have this problem. This is why we downgraded all our routers to v3.0.12, even though the rest of the cluster (the mongod processes) is still running v3.2.10.

      Please see the attached log file from the router, as well as the three monitoring screenshots of the router's TCP sockets. As the screenshots show, tcp_tw (the count of sockets in TCP TIME_WAIT) is much higher under v3.2.10 than under v3.0.12.
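      The tcp_tw metric above can be sampled without any monitoring agent by reading the kernel's socket table. A minimal sketch, not part of the original report, assuming a Linux host where /proc/net/tcp is available and using the kernel's hex state code 06 for TIME_WAIT:

      ```python
      import os

      def count_time_wait(proc_path="/proc/net/tcp"):
          """Count TCP sockets currently in TIME_WAIT (Linux procfs only)."""
          if not os.path.exists(proc_path):
              return 0  # not on Linux, or procfs unavailable
          count = 0
          with open(proc_path) as f:
              next(f)  # skip the header line
              for line in f:
                  fields = line.split()
                  # fields[3] holds the connection state as a two-digit hex code;
                  # "06" is TIME_WAIT in the kernel's TCP state enumeration
                  if len(fields) > 3 and fields[3] == "06":
                      count += 1
          return count

      if __name__ == "__main__":
          print(count_time_wait())
      ```

      Polling this in a loop while running the findOne repro would show whether TIME_WAIT sockets accumulate on the router, matching the screenshots.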

        Attachments

        1. fr-11_tcpwait.jpg
          173 kB
          kay.agahd
        2. fr-11_tcpwaitOnly.jpg
          169 kB
          kay.agahd
        3. offerstore-en-router-03_afterDowngradeToV3.0.10.png
          201 kB
          kay.agahd
        4. offerstore-en-router-03_afterUpgradeToV3.2.10.png
          201 kB
          kay.agahd
        5. offerstore-en-router-03_beforeUpgrade.png
          202 kB
          kay.agahd
        6. offerstore-en-router-03_testWithV3.2.10.png
          232 kB
          kay.agahd
        7. offerstore-en-router-03.ipx.2016-10-20.log.tgz
          9.46 MB
          kay.agahd
        8. offerstore-en-router-03.ipx.2016-10-26.log.tgz
          3.74 MB
          kay.agahd


              People

              Votes: 2
              Watchers: 17
