[SERVER-21293] Loading config data when CSRS member is blackholed gets mongos stuck Created: 04/Nov/15  Updated: 05/Feb/16  Resolved: 17/Nov/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.0-rc2
Fix Version/s: 3.2.0-rc4

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Kaloian Manassiev
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File blackhole_first_config_server_from_mongos.js     Text File stacks.log     Text File test_output.log    
Issue Links:
Duplicate
is duplicated by SERVER-21400 Insert takes 90 seconds when first co... Closed
Related
is related to SERVER-22486 query to the router never done when t... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

python buildscripts/resmoke.py --executor sharding blackhole_first_config_server_from_mongos.js

Sprint: Sharding C (11/20/15)
Participants:

 Description   

When the following sequence of events happen, mongos seems to retry reading the chunk data infinitely from the first config server:
1. Initialize the sharded cluster
2. Enable sharding on the "test" database
3. Create a sharded collection called "server16691" with shard key {_id: 1}
4. Configure the mongobridge corresponding to the first config server to discard messages from the mongos
5. Configure the mongobridges corresponding to the other config servers to delay messages from the mongos
6. Run the "flushRouterConfig" command to clear the routing table of the mongos
7. Try to run db.server16691.find({}).itcount() to trigger read operations of the chunk data

[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:35:57.238-0500 b20014| 2015-11-04T14:35:57.229-0500 I -        [thread2] Received "discardMessagesFrom" command with arguments { discardMessagesFrom: 1.0, $forBridge: true, host: "hanamizu:20020", loss: 1.0 } from <unknown>
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:35:57.240-0500 b20014| 2015-11-04T14:35:57.231-0500 I -        [thread12] Received "find" command with arguments { find: "databases", filter: { _id: "test" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1446665757000|20, t: 1 } }, limit: 1, maxTimeMS: 30000 } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:35:57.240-0500 b20014| 2015-11-04T14:35:57.231-0500 I -        [thread12] Discarding "find" command with arguments { find: "databases", filter: { _id: "test" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1446665757000|20, t: 1 } }, limit: 1, maxTimeMS: 30000 } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:36:07.007-0500 b20014| 2015-11-04T14:36:07.007-0500 I -        [thread24] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:36:07.008-0500 b20014| 2015-11-04T14:36:07.007-0500 I -        [thread24] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:36:13.883-0500 b20014| 2015-11-04T14:36:13.883-0500 I -        [thread11] Received "ismaster" command with arguments { ismaster: 1 } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:36:13.883-0500 b20014| 2015-11-04T14:36:13.883-0500 I -        [thread11] Discarding "ismaster" command with arguments { ismaster: 1 } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:36:26.807-0500 b20014| 2015-11-04T14:36:26.807-0500 I -        [thread25] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:36:26.808-0500 b20014| 2015-11-04T14:36:26.807-0500 I -        [thread25] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:36:41.806-0500 b20014| 2015-11-04T14:36:41.806-0500 I -        [thread26] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:36:41.807-0500 b20014| 2015-11-04T14:36:41.806-0500 I -        [thread26] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:36:57.008-0500 b20014| 2015-11-04T14:36:57.007-0500 I -        [thread27] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:36:57.008-0500 b20014| 2015-11-04T14:36:57.007-0500 I -        [thread27] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
...
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:38:57.608-0500 b20014| 2015-11-04T14:38:57.607-0500 I -        [thread38] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:38:57.608-0500 b20014| 2015-11-04T14:38:57.607-0500 I -        [thread38] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:39:12.808-0500 b20014| 2015-11-04T14:39:12.808-0500 I -        [thread39] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:39:12.808-0500 b20014| 2015-11-04T14:39:12.808-0500 I -        [thread39] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:39:22.808-0500 b20014| 2015-11-04T14:39:22.807-0500 I -        [thread40] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:39:22.808-0500 b20014| 2015-11-04T14:39:22.807-0500 I -        [thread40] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:39:28.008-0500 b20014| 2015-11-04T14:39:28.007-0500 I -        [thread41] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:39:28.008-0500 b20014| 2015-11-04T14:39:28.007-0500 I -        [thread41] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:39:43.207-0500 b20014| 2015-11-04T14:39:43.207-0500 I -        [thread42] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:39:43.208-0500 b20014| 2015-11-04T14:39:43.207-0500 I -        [thread42] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:39:58.008-0500 b20014| 2015-11-04T14:39:58.007-0500 I -        [thread43] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:39:58.008-0500 b20014| 2015-11-04T14:39:58.007-0500 I -        [thread43] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:40:13.006-0500 b20014| 2015-11-04T14:40:13.006-0500 I -        [thread44] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T14:40:13.006-0500 b20014| 2015-11-04T14:40:13.006-0500 I -        [thread44] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020



 Comments   
Comment by Githook User [ 17/Nov/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-21293 Use more appropriate name for a sharding test
Branch: master
https://github.com/mongodb/mongo/commit/a7a896f961f49c7839b96e15e6e3b27c66c2611e

Comment by Githook User [ 17/Nov/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-21293 Add network timeout to the query fetcher

The query fecther currently does not pass timeout and because of this if
the config server is black-holed a command may never complete.
Branch: master
https://github.com/mongodb/mongo/commit/6c5d292463c0230104a4ca14716d8e82ebbcc2aa

Comment by Max Hirschhorn [ 04/Nov/15 ]

When a message is "discarded", the mongobridge process reads the message from the socket (sending a TCP acknowledgement), but does not reply with a message on the socket (nor does it forward the message onto the destination). This causes a socket timeout to be triggered on the process that sent the message.

Additionally, the test finishes when running against legacy config servers. The mongos will try and send 4 messages to the first config server, each of which trigger a socket timeout after 30 seconds:

[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:16:23.605-0500 s20020| 2015-11-04T15:16:23.605-0500 I NETWORK  [conn2] query on config.databases: { _id: "test" } failed to: hanamizu:20015 (127.0.1.1) failed no data
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:16:53.705-0500 s20020| 2015-11-04T15:16:53.705-0500 I NETWORK  [conn2] query on config.databases: { _id: "test" } failed to: hanamizu:20015 (127.0.1.1) failed exception: socket exception [CONNECT_ERROR] for network error while attempting to run command 'isMaster' on host 'hanamizu:20015'
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:23.805-0500 s20020| 2015-11-04T15:17:23.805-0500 I NETWORK  [conn2] query on config.collections: { _id: /^test\./ } failed to: hanamizu:20015 (127.0.1.1) failed exception: socket exception [CONNECT_ERROR] for network error while attempting to run command 'isMaster' on host 'hanamizu:20015'
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:53.906-0500 s20020| 2015-11-04T15:17:53.905-0500 I NETWORK  [conn2] query on config.chunks: { query: { ns: "test.server16691", lastmod: { $gte: Timestamp 0|0 } }, orderby: { lastmod: 1 } } failed to: hanamizu:20015 (127.0.1.1) failed exception: socket exception [CONNECT_ERROR] for network error while attempting to run command 'isMaster' on host 'hanamizu:20015'
...
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.006-0500 s20020| 2015-11-04T15:17:54.006-0500 I SHARDING [conn2] ChunkManager: time to load chunks for test.server16691: 30100ms sequenceNumber: 4 version: 1|0||563a6779d48492779e07839c based on: (empty)
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.007-0500 b20011| 2015-11-04T15:17:54.007-0500 I -        [thread4] Received "find" command with arguments { find: "server16691", shardVersion: [ Timestamp 1000|0, ObjectId('563a6779d48492779e07839c') ] } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.008-0500 Read operation 1 of 10 took 120402 milliseconds
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.009-0500 b20021| 2015-11-04T15:17:54.008-0500 I -        [thread1] Received "find" command with arguments { find: "server16691", filter: {} } from <unknown>
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.009-0500 b20011| 2015-11-04T15:17:54.009-0500 I -        [thread4] Received "find" command with arguments { find: "server16691", shardVersion: [ Timestamp 1000|0, ObjectId('563a6779d48492779e07839c') ] } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.010-0500 Read operation 2 of 10 took 2 milliseconds
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.010-0500 b20021| 2015-11-04T15:17:54.010-0500 I -        [thread1] Received "find" command with arguments { find: "server16691", filter: {} } from <unknown>
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.011-0500 b20011| 2015-11-04T15:17:54.010-0500 I -        [thread4] Received "find" command with arguments { find: "server16691", shardVersion: [ Timestamp 1000|0, ObjectId('563a6779d48492779e07839c') ] } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.011-0500 Read operation 3 of 10 took 1 milliseconds
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.012-0500 b20021| 2015-11-04T15:17:54.012-0500 I -        [thread1] Received "find" command with arguments { find: "server16691", filter: {} } from <unknown>
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.012-0500 b20011| 2015-11-04T15:17:54.012-0500 I -        [thread4] Received "find" command with arguments { find: "server16691", shardVersion: [ Timestamp 1000|0, ObjectId('563a6779d48492779e07839c') ] } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.013-0500 Read operation 4 of 10 took 2 milliseconds
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.013-0500 b20021| 2015-11-04T15:17:54.013-0500 I -        [thread1] Received "find" command with arguments { find: "server16691", filter: {} } from <unknown>
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.014-0500 b20011| 2015-11-04T15:17:54.014-0500 I -        [thread4] Received "find" command with arguments { find: "server16691", shardVersion: [ Timestamp 1000|0, ObjectId('563a6779d48492779e07839c') ] } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.015-0500 Read operation 5 of 10 took 1 milliseconds
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.015-0500 b20021| 2015-11-04T15:17:54.015-0500 I -        [thread1] Received "find" command with arguments { find: "server16691", filter: {} } from <unknown>
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.015-0500 b20011| 2015-11-04T15:17:54.015-0500 I -        [thread4] Received "find" command with arguments { find: "server16691", shardVersion: [ Timestamp 1000|0, ObjectId('563a6779d48492779e07839c') ] } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.016-0500 Read operation 6 of 10 took 2 milliseconds
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.016-0500 b20021| 2015-11-04T15:17:54.016-0500 I -        [thread1] Received "find" command with arguments { find: "server16691", filter: {} } from <unknown>
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.017-0500 b20011| 2015-11-04T15:17:54.016-0500 I -        [thread4] Received "find" command with arguments { find: "server16691", shardVersion: [ Timestamp 1000|0, ObjectId('563a6779d48492779e07839c') ] } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.017-0500 Read operation 7 of 10 took 1 milliseconds
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.018-0500 b20021| 2015-11-04T15:17:54.018-0500 I -        [thread1] Received "find" command with arguments { find: "server16691", filter: {} } from <unknown>
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.018-0500 b20011| 2015-11-04T15:17:54.018-0500 I -        [thread4] Received "find" command with arguments { find: "server16691", shardVersion: [ Timestamp 1000|0, ObjectId('563a6779d48492779e07839c') ] } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.019-0500 Read operation 8 of 10 took 2 milliseconds
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.019-0500 b20021| 2015-11-04T15:17:54.019-0500 I -        [thread1] Received "find" command with arguments { find: "server16691", filter: {} } from <unknown>
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.020-0500 b20011| 2015-11-04T15:17:54.019-0500 I -        [thread4] Received "find" command with arguments { find: "server16691", shardVersion: [ Timestamp 1000|0, ObjectId('563a6779d48492779e07839c') ] } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.020-0500 Read operation 9 of 10 took 1 milliseconds
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.021-0500 b20021| 2015-11-04T15:17:54.021-0500 I -        [thread1] Received "find" command with arguments { find: "server16691", filter: {} } from <unknown>
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.021-0500 b20011| 2015-11-04T15:17:54.021-0500 I -        [thread4] Received "find" command with arguments { find: "server16691", shardVersion: [ Timestamp 1000|0, ObjectId('563a6779d48492779e07839c') ] } from hanamizu:20020
[js_test:blackhole_first_config_server_from_mongos] 2015-11-04T15:17:54.022-0500 Read operation 10 of 10 took 2 milliseconds

Running the test against legacy config servers:

python buildscripts/resmoke.py --executor sharding_legacy blackhole_first_config_server_from_mongos.js

Generated at Thu Feb 08 03:56:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.