Core Server / SERVER-28019

shard version not ok: version epoch mismatch detected

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Duplicate
    • Affects Version/s: 3.2.10, 3.4.2
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL

      Description

      System information

      PHP 7.0.15-1+deb.sury.org~trusty+1 (cli) (built: Jan 20 2017 09:16:11) ( NTS )
      Copyright (c) 1997-2017 The PHP Group
      Zend Engine v3.0.0, Copyright (c) 1998-2017 Zend Technologies
          with Zend OPcache v7.0.15-1+deb.sury.org~trusty+1, Copyright (c) 1999-2017, by Zend Technologies
       
      pecl mongodb   1.2.5
      mongodb 3.4.2
      Ubuntu 14.04.5
      Linux 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
      

      We run a cluster with 27 shards and each application uses a local mongos to connect to it. We have the balancer stopped.

      We often see exceptions for queries against sharded collections. Every time the shard version of a collection changes, our PHP clients report an error on a simple find operation. So far we have never seen such an exception from our non-PHP clients.
      The issue seems to have been introduced with MongoDB 3.2.x, but we can also reproduce it on 3.4.2 in our dev environment.

      To reproduce it in our dev environment I sharded a collection, but I could equally have moved or split a chunk. I chose this collection because queries against it fail in production, which runs version 3.2.10, and because we had not yet sharded it in our dev environment (it is very small there).
      There are no errors in router.log.

      Before we sharded the collection our PHP application showed no errors.
      After sharding our PHP client reports:

      [data.items] shard version not ok: version epoch mismatch detected for data.items, the collection may have been dropped and recreated ( ns : data.items, received : 0|0||000000000000000000000000, wanted : 1|0||58a46d88be4b4925a7638e2f, send ) /path/to/code/vendor/php/mongodb/Find.php (206) MongoDB\Driver\Server::executeQuery
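
      The two version strings in this message appear to follow the major|minor||epoch layout, so they can be compared mechanically. The sketch below is illustrative only and is written in Python rather than PHP; the names ShardVersion and parse_versions are hypothetical helpers, not part of any driver. Reading the all-zero epoch as "the router still treats the collection as unsharded" is an assumption based on the message text, not something the logs confirm.

```python
import re
from typing import NamedTuple


class ShardVersion(NamedTuple):
    major: int
    minor: int
    epoch: str  # 24 hex characters, as printed in the error message


# Matches "received : 0|0||0000..." and "wanted : 1|0||58a4..." in the message.
_VERSION_RE = re.compile(r"(received|wanted) : (\d+)\|(\d+)\|\|([0-9a-f]{24})")


def parse_versions(message: str) -> dict:
    """Return the 'received' and 'wanted' versions found in an error message."""
    return {
        label: ShardVersion(int(major), int(minor), epoch)
        for label, major, minor, epoch in _VERSION_RE.findall(message)
    }


message = (
    "shard version not ok: version epoch mismatch detected for data.items, "
    "the collection may have been dropped and recreated ( ns : data.items, "
    "received : 0|0||000000000000000000000000, "
    "wanted : 1|0||58a46d88be4b4925a7638e2f, send )"
)

versions = parse_versions(message)
received, wanted = versions["received"], versions["wanted"]

# Assumption: version 0|0 with an all-zero epoch means "treated as unsharded".
router_thinks_unsharded = received == ShardVersion(0, 0, "0" * 24)
epoch_mismatch = received.epoch != wanted.epoch
print(received, wanted, router_thinks_unsharded, epoch_mismatch)
```

      Under that reading, the router sent the request with the version it would use for an unsharded collection, while the shard already expected version 1|0 with a new epoch.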
      

      Stacktrace:

      [23]
      File:      /path/to/code/src/model/dfp/connectors/mdb/MdbLineItemConnector.php in line 68
      Method:    rg\ODM\MongoDB\Cursor->toArray
       
      [24]
      File:      /path/to/code/vendor/composer/rg/mongodb-odm-light/lib/rg/ODM/MongoDB/Cursor.php in line 277
      Method:    Doctrine\MongoDB\Cursor->toArray
      Parameter: [true]
       
      [25]
      File:      /path/to/code/vendor/composer/rg/mongodb-odm-light/lib/Doctrine/MongoDB/Cursor.php in line 629
      Method:    Doctrine\MongoDB\Cursor->retry
      Parameter: [Closure, true]
       
      [26]
      File:      /path/to/code/vendor/composer/rg/mongodb-odm-light/lib/Doctrine/MongoDB/Cursor.php in line 666
      Method:    Doctrine\MongoDB\Cursor->Doctrine\MongoDB\{closure}
       
      [27]
      File:      /path/to/code/vendor/composer/rg/mongodb-odm-light/lib/Doctrine/MongoDB/Cursor.php in line 628
      Parameter: [rg\core\base\objectManagers\mongodb\LoggingCursor]
       
      [28]
      Method:    rg\core\base\objectManagers\mongodb\LoggingCursor->rewind
       
      [29]
      File:      /path/to/code/src/core/base/objectManagers/mongodb/LoggingCursor.php in line 253
      Method:    Doctrine\MongoDB\Cursor->rewind
       
      [30]
      File:      /path/to/code/vendor/composer/rg/mongodb-odm-light/lib/Doctrine/MongoDB/Cursor.php in line 490
      Method:    Doctrine\MongoDB\Cursor->retry
      Parameter: [Closure, false]
       
      [31]
      File:      /path/to/code/vendor/composer/rg/mongodb-odm-light/lib/Doctrine/MongoDB/Cursor.php in line 666
      Method:    Doctrine\MongoDB\Cursor->Doctrine\MongoDB\{closure}
       
      [32]
      File:      /path/to/code/vendor/composer/rg/mongodb-odm-light/lib/Doctrine/MongoDB/Cursor.php in line 489
      Method:    Alcaeus\MongoDbAdapter\AbstractCursor->rewind
       
      [33]
      File:      /path/to/code/vendor/composer/alcaeus/mongo-php-adapter/lib/Alcaeus/MongoDbAdapter/AbstractCursor.php in line 190
      Method:    Alcaeus\MongoDbAdapter\AbstractCursor->ensureIterator
       
      [34]
      File:      /path/to/code/vendor/composer/alcaeus/mongo-php-adapter/lib/Alcaeus/MongoDbAdapter/AbstractCursor.php in line 297
      Method:    MongoCursor->ensureCursor
       
      [35]
      File:      /path/to/code/vendor/composer/alcaeus/mongo-php-adapter/lib/Mongo/MongoCursor.php in line 443
      Method:    MongoCursor->doQuery
       
      [36]
      File:      /path/to/code/vendor/composer/alcaeus/mongo-php-adapter/lib/Mongo/MongoCursor.php in line 166
      Method:    MongoDB\Collection->find
      Parameter: [MongoDB\Model\BSONDocument: ['o' => MongoDB\Model\BSONDocument: ['$in' => ['0' => 123456, '1' => 275383959000, '2' => 114204279, '3' => 233433999, '4' => 233014719, '5' => 237809079, '6' => 247189479, '7' => 251772159, '8' => 260437839, '9' => 275383959, '10' => 285140199, [10more ...]]], '$comment' => 'cId f8688c0e3b4b6de7b86d0c38569fb290,vId 165...'], ['batchSize' => 0, 'modifiers' => [], 'projection' => [], 'readPreference' => MongoDB\Driver\ReadPreference, 'readConcern' => MongoDB\Driver\ReadConcern, 'typeMap' => ['array' => 'MongoDB\Model\BSONArray', 'document' => 'MongoDB\Model\BSONDocument', 'root' => 'MongoDB\Model\BSONDocument']]]
       
      [37]
      File:      /path/to/code/vendor/composer/mongodb/mongodb/src/Collection.php in line 525
      Method:    MongoDB\Operation\Find->execute
      Parameter: [MongoDB\Driver\Server: [MongoDB\Driver\Server Object
      (
          [host] => localhost
          [port] => 27017
          [type] => 2
          [is_primary] => 
          [is_secondary] => 
          [is_arbiter] => 
          [is_hidden] => 
          [is_passive] => 
          [last_is_master] => Array
              (
                  [ismaster] => 1
                  [msg] => isdbgrid
                  [maxBsonObjectSize] => 16777216
                  [maxMessageSizeBytes] => 48000000
                  [maxWriteBatchSize] => 1000
                  [localTime] => MongoDB\BSON\UTCDateTime Object
                      (
                          [milliseconds] => 1487171166812
                      )
       
                  [maxWireVersion] => 5
                  [minWireVersion] => 0
                  [ok] => 1
              )
       
          [round_trip_time] => 0
      )
      ]]
       
      [38]
      File:      /path/to/code/vendor/php/mongodb/Find.php in line 206
      Method:    MongoDB\Driver\Server->executeQuery
      Parameter: ['data.items', MongoDB\Driver\Query, MongoDB\Driver\ReadPreference]
      

      We found three ways to stop the router from reporting this exception on a query:

      • restart the router
      • run flushRouterConfig
      • run a find query on this collection via the mongo shell (tested with version 3.4.2 only)
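
      As a stopgap, the second workaround could in principle be automated on the client side: catch the stale-version error, ask the router to refresh, and retry once. The following is a minimal sketch of that idea, written in Python rather than PHP and not tested against a real cluster. The names query_with_router_refresh, run_query and refresh_router are hypothetical; with PyMongo, refresh_router might be something like `lambda: client.admin.command("flushRouterConfig")`, which is an assumption about how one would wire it up.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Substring of the error text quoted in this report.
STALE_MARKER = "version epoch mismatch detected"


def query_with_router_refresh(
    run_query: Callable[[], T],
    refresh_router: Callable[[], None],
    max_retries: int = 1,
) -> T:
    """Run run_query; on a stale shard-version error, refresh the router and retry."""
    attempt = 0
    while True:
        try:
            return run_query()
        except Exception as exc:
            # Only handle the specific error from this report; re-raise anything else.
            if STALE_MARKER not in str(exc) or attempt >= max_retries:
                raise
            refresh_router()
            attempt += 1


# Demo with fakes: the query fails until the router has been refreshed once.
calls = {"query": 0, "refresh": 0}


def fake_query():
    calls["query"] += 1
    if calls["refresh"] == 0:
        raise RuntimeError(
            "shard version not ok: version epoch mismatch detected for data.items"
        )
    return ["doc"]


def fake_refresh():
    calls["refresh"] += 1


result = query_with_router_refresh(fake_query, fake_refresh)
print(result, calls)
```

      This only hides the symptom on the client. It does not explain why the router fails to refresh on its own for PHP clients.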

      Could it be that routers only refresh chunk information when a client performs an operation on a collection, and that this refresh is broken for the PHP driver only?

      Since we can reproduce it in our dev environment with version 3.4.2, we are glad to debug this with your help.

      This is a blocker for us in production: even with the balancer off, autosplits also cause this issue.

        Attachments

        1. app-0_mongodb-router-27017.log
          2.46 MB
        2. app-1_mongodb-router-27017.log
          2.34 MB
        3. mongs-3_mongodb-shard.log
          11.46 MB
        4. mongs-4_mongodb-shard.log
          7.26 MB

              People

              Assignee:
              kelsey.schubert Kelsey T Schubert
              Reporter:
              steffen Steffen
              Votes:
              0
              Watchers:
              14
