- Type: Bug
- Resolution: Works as Designed
- Priority: Major - P3
- None
- Affects Version/s: 3.6.8
- Component/s: Sharding
- Labels: None
- ALL
We have a production sharded cluster that was recently upgraded from 3.4 to 3.6. It initially had 4 shards, each with 1 primary and 1 secondary, served by 5 mongos routers. We then scaled it up by adding 2 more shards, so there are now 6 shards, each with 1 primary and 1 secondary. Rebalancing the collections moved some chunks from the 4 original shards to the 2 newly added shards.
After this rebalancing there were many cases where data was not found when it was read through some of the mongos routers with the secondaryPreferred read preference, even though the data is present when you look directly in the db. After issuing flushRouterConfig from the shell of every mongos, things were back to normal.
But this workaround is not reliable, because balancing will keep taking place as data comes in and is modified. It is critical for all mongos routers to stay up to date with changes in the data distribution.
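For reference, the manual workaround described above (running flushRouterConfig on every mongos) could be scripted roughly as follows. This is only a sketch using pymongo: the hostnames in the usage comment are placeholders, the port 27017 is taken from the connection string used in the experiments, and the `connect` parameter is a hypothetical hook added so the loop can be exercised without a live cluster.

```python
from typing import Callable, Dict, Iterable


def _pymongo_connect(host: str):
    # Imported lazily so this sketch still loads where pymongo is not installed.
    from pymongo import MongoClient
    return MongoClient(host, 27017, serverSelectionTimeoutMS=5000)


def flush_all_routers(hosts: Iterable[str],
                      connect: Callable = _pymongo_connect) -> Dict[str, str]:
    """Run flushRouterConfig on every mongos and report a status per host."""
    results = {}
    for host in hosts:
        try:
            client = connect(host)
            reply = client.admin.command("flushRouterConfig")
            if reply.get("ok") == 1:
                results[host] = "ok"
            else:
                results[host] = "unexpected reply: %r" % (reply,)
        except Exception as exc:
            # Keep going so one unreachable router does not hide the others.
            results[host] = "error: %s" % exc
    return results


# Usage (placeholder hostnames, one entry per mongos, not the Route53 name):
# print(flush_all_routers(["mongos-x.example.com", "mongos-y.example.com"]))
```

Each router has to be addressed by its own hostname here, since a shared DNS name would reach only whichever mongos it happens to resolve to.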
We also use AWS Route53, where the name 'prod-mongos' points to multiple mongos routers. With 'prod-mongos' pointing to 2 mongos routers, x and y, the following experiments were run using PHP.
Step 1 :
With 4 consecutive executions, the result was returned in some cases and not in others.
connection string : $conn = new MongoDB\Client( 'mongodb://prod-mongos:27017', array( "readPreference" => "secondaryPreferred", "socketTimeoutMS" => 5000));
$ php balancer_bug_test.php -i id
$ php balancer_bug_test.php -i id
MongoDB\Model\BSONDocument Object
(
    [storage:ArrayObject:private] => Array
        (
            record details
        )
)
$ php balancer_bug_test.php -i id
$ php balancer_bug_test.php -i id
MongoDB\Model\BSONDocument Object
(
    [storage:ArrayObject:private] => Array
        (
            record details
        )
)
Step 2: 'prod-mongos' pointing only to mongos x
$ php balancer_bug_test.php -i id
$ php balancer_bug_test.php -i id
$ php balancer_bug_test.php -i id
$ php balancer_bug_test.php -i id
No result was returned by any of the 4 executions.
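Because 'prod-mongos' can resolve to more than one router, a check that queries each mongos by its own hostname makes it easier to see which router is returning empty results. The sketch below uses pymongo with the same read preference and socket timeout as the PHP connection string. The hostnames, the database name `mydb`, and the collection name `mycoll` are placeholders, not values from this report, and the `find` parameter is a hypothetical hook for testing without a cluster.

```python
from typing import Callable, Dict, Iterable, List


def _pymongo_find(host: str, doc_id):
    # Lazy import keeps this sketch loadable without pymongo installed.
    from pymongo import MongoClient
    client = MongoClient(
        "mongodb://%s:27017" % host,
        readPreference="secondaryPreferred",
        socketTimeoutMS=5000,
    )
    # "mydb" and "mycoll" are placeholder names, not taken from the report.
    return client["mydb"]["mycoll"].find_one({"_id": doc_id})


def probe_routers(hosts: Iterable[str], doc_id,
                  find: Callable = _pymongo_find) -> Dict[str, bool]:
    """Return, for each mongos, whether it found the document."""
    return {host: find(host, doc_id) is not None for host in hosts}


def stale_routers(found_by_host: Dict[str, bool]) -> List[str]:
    """Routers that missed a document at least one other router returned."""
    if not any(found_by_host.values()):
        return []  # no router found it, so the document may simply not exist
    return sorted(h for h, found in found_by_host.items() if not found)


# Usage (placeholder hostnames and id):
# found = probe_routers(["mongos-x.example.com", "mongos-y.example.com"], "id")
# print(stale_routers(found))
```

A router is only flagged when another router did return the document, so a genuinely missing document is not mistaken for stale routing.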
Step 3: Connected to the mongos x shell and executed
db.adminCommand("flushRouterConfig")
$ php balancer_bug_test.php -i id
MongoDB\Model\BSONDocument Object
(
    [storage:ArrayObject:private] => Array
        (
            record details
        )
)
$ php balancer_bug_test.php -i id
MongoDB\Model\BSONDocument Object
(
    [storage:ArrayObject:private] => Array
        (
            record details
        )
)
$ php balancer_bug_test.php -i id
MongoDB\Model\BSONDocument Object
(
    [storage:ArrayObject:private] => Array
        (
            record details
        )
)
$ php balancer_bug_test.php -i id
MongoDB\Model\BSONDocument Object
(
    [storage:ArrayObject:private] => Array
        (
            record details
        )
)
Result was obtained each time.
Step 4 : To rule out a driver issue, similar experiments were run using Python and pymongo. The observations were exactly the same.
Step 5: Similar commands were tried from the mongos shell using db.getMongo().setReadPref('secondaryPreferred').
The result was similar: data was not fetched correctly unless 'flushRouterConfig' was run first.
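One further check that might help narrow this down is to read the same document through the same mongos with two different read preferences and compare the outcomes. The report above only covers secondaryPreferred reads, so what a primary read would return is not stated here. This is a diagnostic sketch, again with placeholder database and collection names (`mydb`, `mycoll`) and a hypothetical `find` hook.

```python
from typing import Callable


def _pymongo_find_with_pref(host: str, doc_id, read_preference: str):
    # Lazy import keeps this sketch loadable without pymongo installed.
    from pymongo import MongoClient
    client = MongoClient(
        "mongodb://%s:27017" % host,
        readPreference=read_preference,
        socketTimeoutMS=5000,
    )
    # "mydb" and "mycoll" are placeholder names, not taken from the report.
    return client["mydb"]["mycoll"].find_one({"_id": doc_id})


def compare_read_preferences(host: str, doc_id,
                             find: Callable = _pymongo_find_with_pref) -> str:
    """Read one document via one mongos with 'primary' and 'secondaryPreferred'."""
    via_primary = find(host, doc_id, "primary")
    via_secondary = find(host, doc_id, "secondaryPreferred")
    if via_primary is not None and via_secondary is not None:
        return "found on both"
    if via_primary is None and via_secondary is None:
        return "missing on both"
    if via_secondary is None:
        return "missing only on secondaryPreferred"
    return "missing only on primary"


# Usage (placeholder hostname and id):
# print(compare_read_preferences("mongos-x.example.com", "id"))
```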
Version details :
Mongo DB : MongoDB shell version v3.6.8
Mongo Config server OS : CentOS release 6.10 (Final)
Mongos OS : CentOS release 6.9 (Final)
Linux 2.6.32-696.13.2.el6.x86_64
Mongo DB Shards and Replica Set hosts OS : CentOS Linux release 7.5.1804 (Core) Linux 4.18.1-1.el7.elrepo.x86_64
Reference : https://jira.mongodb.org/browse/SERVER-5931
We expected this to be fixed in version 3.6.8.