  1. Core Server
  2. SERVER-18671

SecondaryPreferred can end up using unversioned connections

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Backport Completed:
    • Steps To Reproduce:
      Run test.js after applying repro.diff. You should be able to see the log near the end of the test.
    • Sprint:
      Sharding E (01/08/16), Sharding F (01/29/16), Sharding 11 (03/11/16)

      Description

      When mongos tries to set up the version for the connection to be used for queries, it checks whether the primary is down with this:

      https://github.com/mongodb/mongo/blob/r3.1.5/src/mongo/client/parallel.cpp#L574

      bool connIsDown = rawConn->isFailed();
      

      However, if you look at the implementation of isFailed:

      return !_master || _master->isFailed();
      

      It can return true if _master is not initialized (when the replica set connection has not yet talked to the master), so a connection that has never contacted the primary is treated as down. The reason this was fine in v2.6 is that mongos used to eagerly call setShardVersion on every connection created, so by the time the above code path is reached, _master is guaranteed to be set unless an error occurred. This is no longer true in v3.0, as SERVER-15375 removed the eager initialization.
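The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the actual mongos code: NodeConn, ReplicaSetConn, hasPrimaryConn and wouldBypassVersioning are hypothetical names introduced only for illustration, and only the isFailed expression is taken from the ticket. It shows that a connection which has never contacted the primary is indistinguishable from one whose primary has failed.

```cpp
#include <memory>

// Hypothetical stand-in for a single-node connection (not the real driver class).
struct NodeConn {
    bool failed = false;
    bool isFailed() const { return failed; }
};

// Hypothetical stand-in for the replica set connection discussed in the ticket.
class ReplicaSetConn {
public:
    // Same expression as quoted in the ticket: a null _master counts as failed.
    bool isFailed() const { return !_master || _master->isFailed(); }

    // Illustrative helper (an assumption, not an existing API): lets a caller
    // tell "never contacted the primary" apart from "primary connection failed".
    bool hasPrimaryConn() const { return static_cast<bool>(_master); }

    void connectToPrimary() { _master = std::make_unique<NodeConn>(); }
    void markPrimaryFailed() { if (_master) _master->failed = true; }

private:
    std::unique_ptr<NodeConn> _master;  // null until the primary is first contacted
};

// Models the check from the ticket: when this returns true, the caller would
// skip setShardVersion and use the connection unversioned.
bool wouldBypassVersioning(const ReplicaSetConn& conn) {
    bool connIsDown = conn.isFailed();
    return connIsDown;
}
```

With a freshly constructed ReplicaSetConn, wouldBypassVersioning returns true even though nothing has failed. After connectToPrimary it returns false, and after markPrimaryFailed it returns true again. One possible direction for a fix would be to consult something like hasPrimaryConn before concluding that the primary is down, but that is only an assumption about the shape of a fix, not what the ticket prescribes.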

      Original description from user:

      We are following the procedure for upgrading a MongoDB sharded cluster from http://docs.mongodb.org/manual/release-notes/3.0-upgrade/#upgrade-a-sharded-cluster-to-3-0.

      After upgrading one of our main mongoses from 2.6.9 to 3.0.3, we started seeing many messages like the following:

      2015-05-27T11:27:46.436+0200 W NETWORK  [conn358] Primary for set3/mongo3:27018,mongo8:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.478+0200 W NETWORK  [conn312] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.500+0200 W NETWORK  [conn206] Primary for set2/mongo2:27018,mongo7:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.623+0200 W NETWORK  [conn355] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.688+0200 W NETWORK  [conn98] Primary for set4/mongo4:27018,mongo9:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.738+0200 W NETWORK  [conn469] Primary for set4/mongo4:27018,mongo9:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.816+0200 W NETWORK  [conn180] Primary for set4/mongo4:27018,mongo9:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.846+0200 W NETWORK  [conn288] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.909+0200 W NETWORK  [conn253] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.950+0200 W NETWORK  [conn103] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:47.016+0200 W NETWORK  [conn56] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:47.061+0200 W NETWORK  [conn36] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:47.105+0200 W NETWORK  [conn151] Primary for set3/mongo3:27018,mongo8:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:47.197+0200 W NETWORK  [conn138] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:47.337+0200 W NETWORK  [conn360] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      

      For now we have rolled back to 2.6.9. Should we continue upgrading the whole cluster, and will those messages be gone after that?

        Attachments

        1. mongos_startup_log
          933 kB
        2. repro.diff
          0.8 kB
        3. test.js
          0.5 kB

              People

              • Votes:
                4
              • Watchers:
                15
