Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-21400

Insert takes 90 seconds when first config server (SCCC) is in TCP blackhole from mongos triggering autosplit

    • Type: Icon: Bug Bug
    • Resolution: Won't Fix
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.2.0-rc2
    • Component/s: Sharding
    • None
    • ALL
    • Hide
      python buildscripts/resmoke.py --executor sharding_legacy autosplit_during_config_server_blackhole.js
      
      Show
      python buildscripts/resmoke.py --executor sharding_legacy autosplit_during_config_server_blackhole.js
    • Sharding C (11/20/15), Sharding D (12/11/15)

      Tests the scenario when the first config server discards all messages from the mongos, but not the shards. Note that this doesn't prevent the autosplit from succeeding. When the following sequence of events happen, mongos will try and send 3 messages to the first config server, each of which trigger a socket timeout after 30 seconds:

      1. Initialize the sharded cluster
      2. Enable sharding on the "test" database
      3. Create a sharded collection called "server16690" with shard key {_id: 1}
      4. Configure the mongobridge corresponding to the first config server to discard messages from the mongos
      5. Configure the mongobridges corresponding to the other config servers to delay messages from the mongos
      6. Insert a few documents into the test.server16690 collection to trigger an autosplit
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:22.743-0500 Starting write operations on the sharded collection
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:22.746-0500 Write operation 1 of 10 took 2 milliseconds
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:22.748-0500 Write operation 2 of 10 took 2 milliseconds
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:22.750-0500 Write operation 3 of 10 took 2 milliseconds
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:22.753-0500 Write operation 4 of 10 took 2 milliseconds
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:53.050-0500 Write operation 5 of 10 took 90336 milliseconds
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:53.051-0500 Write operation 6 of 10 took 1 milliseconds
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:53.052-0500 Write operation 7 of 10 took 1 milliseconds
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:53.053-0500 Write operation 8 of 10 took 0 milliseconds
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:53.053-0500 Write operation 9 of 10 took 1 milliseconds
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:53.054-0500 Write operation 10 of 10 took 1 milliseconds
      
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:22.821-0500 b20015| 2015-11-10T14:48:22.701-0500 I BRIDGE   [thread2] Received "discardMessagesFrom" command with arguments { discardMessagesFrom: 1.0, $forBridge: true, host: "hanamizu:20020", loss: 1.0 } from <unknown>
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:22.835-0500 s20020| 2015-11-10T14:48:22.715-0500 D SHARDING [conn2] about to initiate autosplit: ns: test.server16690, shard: shard0000, lastmod: 1|0||56424a065de2814f99135918, min: { _id: MinKey }, max: { _id: MaxKey } dataWritten: 229 splitThreshold: 921
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:22.766-0500 b20011| 2015-11-10T14:48:22.716-0500 I BRIDGE   [thread6] Received "splitChunk" command with arguments { splitChunk: "test.server16690", keyPattern: { _id: 1.0 }, min: { _id: MinKey }, max: { _id: MaxKey }, from: "shard0000", splitKeys: [ { _id: ObjectId('56424a0680361f45a8fe3e74') }, { _id: ObjectId('56424a0680361f45a8fe3e78') } ], configdb: "hanamizu:20015,hanamizu:20017,hanamizu:20019", shardVersion: [ Timestamp 1000|0, ObjectId('56424a065de2814f99135918') ], epoch: ObjectId('56424a065de2814f99135918') } from hanamizu:20020
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:22.836-0500 b20015| 2015-11-10T14:48:22.749-0500 I BRIDGE   [thread5] Received "ns" command with arguments { ns: "test.server16690" } from hanamizu:20020
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:22.836-0500 b20015| 2015-11-10T14:48:22.749-0500 I BRIDGE   [thread5] Discarding "ns" command with arguments { ns: "test.server16690" } from hanamizu:20020
      ...
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:52.749-0500 s20020| 2015-11-10T14:48:52.748-0500 I NETWORK  [conn2] Socket recv() timeout  127.0.1.1:20015
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:52.749-0500 s20020| 2015-11-10T14:48:52.749-0500 I NETWORK  [conn2] SocketException: remote: 127.0.1.1:20015 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.1.1:20015]
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:52.749-0500 s20020| 2015-11-10T14:48:52.749-0500 I NETWORK  [conn2] DBClientCursor::init call() failed
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:52.750-0500 s20020| 2015-11-10T14:48:52.749-0500 I NETWORK  [conn2] query on config.chunks: { query: { ns: "test.server16690" }, orderby: { lastmod: -1 } } failed to: hanamizu:20015 (127.0.1.1) failed no data
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:52.851-0500 s20020| 2015-11-10T14:48:52.851-0500 I NETWORK  [conn2] trying reconnect to hanamizu:20015 (127.0.1.1) failed
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:52.852-0500 s20020| 2015-11-10T14:48:52.851-0500 D NETWORK  [conn2] connected to server hanamizu:20015 (127.0.1.1)
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:52.853-0500 b20015| 2015-11-10T14:48:52.852-0500 I BRIDGE   [thread19] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:48:52.853-0500 b20015| 2015-11-10T14:48:52.852-0500 I BRIDGE   [thread19] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
      ...
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:22.849-0500 s20020| 2015-11-10T14:49:22.848-0500 I NETWORK  [conn2] Socket recv() timeout  127.0.1.1:20015
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:22.849-0500 s20020| 2015-11-10T14:49:22.848-0500 I NETWORK  [conn2] SocketException: remote: 127.0.1.1:20015 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.1.1:20015]
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:22.849-0500 s20020| 2015-11-10T14:49:22.849-0500 D -        [conn2] User Assertion: 6:network error while attempting to run command 'isMaster' on host 'hanamizu:20015'
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:22.850-0500 s20020| 2015-11-10T14:49:22.849-0500 I NETWORK  [conn2] reconnect hanamizu:20015 (127.0.1.1) failed failed
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:22.850-0500 s20020| 2015-11-10T14:49:22.849-0500 I NETWORK  [conn2] query on config.chunks: { query: { ns: "test.server16690", lastmod: { $gte: Timestamp 1000|0 } }, orderby: { lastmod: 1 } } failed to: hanamizu:20015 (127.0.1.1) failed exception: socket exception [CONNECT_ERROR] for network error while attempting to run command 'isMaster' on host 'hanamizu:20015'
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:22.951-0500 s20020| 2015-11-10T14:49:22.950-0500 I SHARDING [conn2] ChunkManager: time to load chunks for test.server16690: 30100ms sequenceNumber: 4 version: 1|3||56424a065de2814f99135918 based on: 1|0||56424a065de2814f99135918
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:22.952-0500 s20020| 2015-11-10T14:49:22.951-0500 I NETWORK  [conn2] trying reconnect to hanamizu:20015 (127.0.1.1) failed
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:22.953-0500 s20020| 2015-11-10T14:49:22.952-0500 D NETWORK  [conn2] connected to server hanamizu:20015 (127.0.1.1)
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:22.953-0500 b20015| 2015-11-10T14:49:22.952-0500 I BRIDGE   [thread21] Received "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:22.954-0500 b20015| 2015-11-10T14:49:22.952-0500 I BRIDGE   [thread21] Discarding "isMaster" command with arguments { isMaster: 1, hostInfo: "hanamizu:20020" } from hanamizu:20020
      ...
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:52.949-0500 s20020| 2015-11-10T14:49:52.948-0500 I NETWORK  [conn2] Socket recv() timeout  127.0.1.1:20015
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:52.949-0500 s20020| 2015-11-10T14:49:52.948-0500 I NETWORK  [conn2] SocketException: remote: 127.0.1.1:20015 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.1.1:20015]
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:52.949-0500 s20020| 2015-11-10T14:49:52.948-0500 D -        [conn2] User Assertion: 6:network error while attempting to run command 'isMaster' on host 'hanamizu:20015'
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:52.949-0500 s20020| 2015-11-10T14:49:52.948-0500 I NETWORK  [conn2] reconnect hanamizu:20015 (127.0.1.1) failed failed
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:52.949-0500 s20020| 2015-11-10T14:49:52.948-0500 I NETWORK  [conn2] query on config.settings: { _id: "balancer" } failed to: hanamizu:20015 (127.0.1.1) failed exception: socket exception [CONNECT_ERROR] for network error while attempting to run command 'isMaster' on host 'hanamizu:20015'
      [js_test:autosplit_during_config_server_blackhole] 2015-11-10T14:49:53.050-0500 s20020| 2015-11-10T14:49:53.049-0500 I SHARDING [conn2] autosplitted test.server16690 shard: ns: test.server16690, shard: shard0000, lastmod: 1|0||56424a065de2814f99135918, min: { _id: MinKey }, max: { _id: MaxKey } into 3 (splitThreshold 921) (migrate suggested, but no migrations allowed)
      

            Assignee:
            Unassigned Unassigned
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: