Core Server / SERVER-18671

SecondaryPreferred can end up using unversioned connections

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Backport Completed:
    • Steps To Reproduce:

      Run test.js after applying repro.diff. You should be able to see the log message near the end of the test.

    • Sprint:
      Sharding E (01/08/16), Sharding F (01/29/16), Sharding 11 (03/11/16)

      Description

      When mongos tries to set up the version for the connection to be used for queries, it checks whether the primary is down with this:

      https://github.com/mongodb/mongo/blob/r3.1.5/src/mongo/client/parallel.cpp#L574

      bool connIsDown = rawConn->isFailed();
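The decision this check feeds can be modeled roughly as below. This is a minimal sketch, not the actual parallel.cpp code: `Conn` and `setupVersion` are hypothetical names, and the warning text is copied from the log lines the user reported. The point is that whatever `isFailed()` returns decides whether setShardVersion is sent at all.

```cpp
#include <iostream>
#include <string>

// Hypothetical stand-in for the raw shard connection; only isFailed() matters here.
struct Conn {
    bool failed;
    bool isFailed() const { return failed; }
};

// Hypothetical model of the check above: when the connection reports itself
// as failed, setShardVersion is skipped and the connection is used unversioned.
// Returns true if the connection would end up versioned.
bool setupVersion(const Conn& rawConn, const std::string& setName) {
    bool connIsDown = rawConn.isFailed();
    if (connIsDown) {
        // Same wording as the warning in the user's mongos log.
        std::cerr << "Primary for " << setName
                  << " was down before, bypassing setShardVersion. "
                  << "The local replica set view and targeting may be stale.\n";
        return false;  // no setShardVersion: the connection stays unversioned
    }
    // In the real code path, setShardVersion would be sent here.
    return true;
}
```

So any connection that merely reports itself as failed is used for the query without a shard version.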
      

      However, if you look at the implementation of isFailed:

      return !_master || _master->isFailed();
      

      It can return true if _master is not initialized (i.e., when the replica set connection has not yet talked to the master), so a connection whose primary may be healthy is treated as down and setShardVersion is bypassed. The reason this was fine in v2.6 is that mongos used to eagerly call setShardVersion on every connection it created, so by the time the above codepath is reached, _master is guaranteed to be set unless an error occurred. This is no longer true in v3.0, as SERVER-15375 removed the eager initialization.
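To make the null-_master case concrete, here is a minimal model, assuming a hypothetical `ReplicaSetConn` class whose `_master` member is populated lazily. `NodeConn` and `contactPrimary()` are illustrative names, not the real DBClientReplicaSet API; only the body of `isFailed()` is taken from the snippet quoted above.

```cpp
#include <memory>

// Hypothetical stand-in for the connection to a single replica set member.
struct NodeConn {
    bool failed = false;
    bool isFailed() const { return failed; }
};

// Hypothetical, simplified replica set connection. _master stays null until
// the connection has actually talked to the primary.
class ReplicaSetConn {
public:
    // Same expression as the isFailed implementation quoted above:
    // a null _master is reported as "failed".
    bool isFailed() const { return !_master || _master->isFailed(); }

    // Stand-in for the first operation that reaches the primary. In v2.6 the
    // eager setShardVersion call had this effect on every new connection.
    void contactPrimary() {
        if (!_master) _master = std::make_unique<NodeConn>();
    }

private:
    std::unique_ptr<NodeConn> _master;
};
```

A freshly created `ReplicaSetConn` reports `isFailed() == true` even though nothing has failed, which is the state the v3.0 code path can now observe. Calling `contactPrimary()` first, as the eager v2.6 behavior effectively did, makes it report false.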

      Original description from user:

      We are following the procedure for upgrading a sharded MongoDB cluster described at http://docs.mongodb.org/manual/release-notes/3.0-upgrade/#upgrade-a-sharded-cluster-to-3-0.

      After upgrading one of our main mongoses from 2.6.9 to 3.0.3, we started seeing many messages like the following:

      2015-05-27T11:27:46.436+0200 W NETWORK  [conn358] Primary for set3/mongo3:27018,mongo8:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.478+0200 W NETWORK  [conn312] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.500+0200 W NETWORK  [conn206] Primary for set2/mongo2:27018,mongo7:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.623+0200 W NETWORK  [conn355] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.688+0200 W NETWORK  [conn98] Primary for set4/mongo4:27018,mongo9:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.738+0200 W NETWORK  [conn469] Primary for set4/mongo4:27018,mongo9:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.816+0200 W NETWORK  [conn180] Primary for set4/mongo4:27018,mongo9:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.846+0200 W NETWORK  [conn288] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.909+0200 W NETWORK  [conn253] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:46.950+0200 W NETWORK  [conn103] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:47.016+0200 W NETWORK  [conn56] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:47.061+0200 W NETWORK  [conn36] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:47.105+0200 W NETWORK  [conn151] Primary for set3/mongo3:27018,mongo8:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:47.197+0200 W NETWORK  [conn138] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      2015-05-27T11:27:47.337+0200 W NETWORK  [conn360] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
      

      For now we have rolled back to 2.6.9 again. Should we continue upgrading the whole cluster, and will those messages go away after that?

      Attachments:
      1. mongos_startup_log (933 kB, Marcin Lipiec)
      2. repro.diff (0.8 kB, Randolph Tan)
      3. test.js (0.5 kB, Randolph Tan)


          Activity

          xgen-internal-githook Githook User added a comment -

          Author: Randolph Tan (renctan) <randolph@10gen.com>

          Message: SERVER-18671 SecondaryPreferred can end up using unversioned connections
          (cherry picked from commit 1d611a8c7ee346929a4186f524c21007ef7a279d)
          Branch: v3.2
          https://github.com/mongodb/mongo/commit/e9e372e1e2e48c1420c11af63f13b5ec227b4e8c

          pperekalov Pavel Perekalov added a comment -

          Could you please tell us when this fix will be backported to 3.0?

          ramon.fernandez Ramon Fernandez added a comment -

          Pavel Perekalov, we're assessing whether a backport to 3.0 is doable safely. Any updates will be posted on this ticket.

          quentins Quentin Schroeder added a comment -

          We are also running into this issue and would love to see the fix backported to 3.0. I'll keep an eye on this ticket to see what decisions are made.

          xgen-internal-githook Githook User added a comment -

          Author: Randolph Tan (renctan) <randolph@10gen.com>

          Message: SERVER-18671 SecondaryPreferred can end up using unversioned connections

          (cherry picked from commit 1d611a8c7ee346929a4186f524c21007ef7a279d)
          Branch: v3.0
          https://github.com/mongodb/mongo/commit/5c2737de7776e37d2fbf5259f4ccd3c2f5cf24fa


            People

            • Votes: 4
            • Watchers: 15
