[SERVER-22905] mongos have different data in sharded cluster Created: 01/Mar/16  Updated: 16/Nov/21  Resolved: 11/Mar/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Rakesh Kumar Assignee: Kelsey Schubert
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-22904 multiple mongos are having different ... Closed
Related
related to SERVER-8059 After movePrimary, db.getCollectionNa... Closed
Operating System: ALL
Steps To Reproduce:

We have a sharded cluster running mongo 3.0.6, with 2 shards, 3 config servers and 3 mongos.
This cluster contains multiple databases, e.g. db1, db2 and db3.

db1 > shard1 primary (not sharded)
db2 > shard2 primary (not sharded)
db3 > shard2 primary (not sharded)
mongos1, mongos2, mongos3 > three mongos
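
The primary-shard assignments above can be confirmed from any mongos with sh.status(); a minimal sketch (output abridged, values illustrative):

mongos> sh.status()
...
  databases:
	{  "_id" : "db1",  "partitioned" : false,  "primary" : "shard1" }
	{  "_id" : "db2",  "partitioned" : false,  "primary" : "shard2" }
	{  "_id" : "db3",  "partitioned" : false,  "primary" : "shard2" }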
 
On 10.14.4.121 (mongos1)
mongos> use admin
mongos> db.runCommand(
{ movePrimary: "db2", to: "shard1" }
)
mongos> db.runCommand({flushRouterConfig: 1})
mongos> use db2
switched to db db2
mongos> db.inventories.count()
140714
mongos> ^C
 
On 10.14.17.245 (mongos2)
Before restart:
mongos> db.runCommand({flushRouterConfig: 1})
mongos> use db2
switched to db db2
mongos> db.inventories.count()
28
After mongos2 Restart:
mongos> use db2
switched to db db2
mongos> db.inventories.count()
140714
 
On 10.14.2.71 (mongos3)
Before Restart:
mongos> db.runCommand({flushRouterConfig: 1})
mongos> use db2
switched to db db2
mongos> db.inventories.count()
25
After Restart:
mongos> use db2
switched to db db2
mongos> db.inventories.count()
140714

Participants:

 Description   

Hi,
We have a sharded cluster of mongo 3.0.6, having 2 shards, 3 config servers and 3 mongos.
In this cluster we are having multiple databases, eg. db1,db2 and db3

db1 > shard1 primary (not sharded)
db2 > shard2 primary (not sharded)
db3 > shard2 primary (not sharded)

After moving the primary of the db2 database from shard2 to shard1, all three mongos showed different document counts: the mongos from which we ran the movePrimary command showed the right data, while the other two mongos showed wrong data.
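
The authoritative assignment recorded on the config servers can be checked through any mongos; a minimal sketch (output illustrative, showing the state after the move):

mongos> db.getSiblingDB("config").databases.find({ _id: "db2" })
{ "_id" : "db2", "partitioned" : false, "primary" : "shard1" }

A router whose cached routing table still points at shard2 will keep sending db2 reads there until it reloads its configuration (e.g. via flushRouterConfig or a restart), which matches the differing counts shown above.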



 Comments   
Comment by Kelsey Schubert [ 11/Mar/16 ]

Hi rakesh.mib.j,

I'd like to summarize what we've seen:

  • On your staging environment, after reassigning the primary shard you had to execute flushRouterConfig on all mongos instances before you could read the correct data. This is expected behavior and is documented here.
  • On your production environment, writes and reads continued while executing movePrimary. In this circumstance, it is expected that some writes could end up on the incorrect shard. This will require manual intervention to correct.
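
For reference, a minimal sketch of flushing the routing table on every router from one shell session (the hostnames below are placeholders, not taken from this ticket):

// hostnames are hypothetical; substitute your actual mongos addresses
["mongos1:27017", "mongos2:27017", "mongos3:27017"].forEach(function (host) {
    var conn = new Mongo(host);  // open a direct connection to each router
    printjson(conn.getDB("admin").runCommand({ flushRouterConfig: 1 }));
});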

Since neither you nor I could reproduce the behavior where incorrect data was still displayed after executing flushRouterConfig, I am going to close this ticket. If you are able to reliably reproduce this issue, please comment and we will be happy to reopen the ticket and take another look.

Thank you,
Thomas

Comment by Rakesh Kumar [ 08/Mar/16 ]

Hi Thomas,

I have tested this multiple times on our staging environment, and every time we get a 0 count result on the other mongos after moving the primary to a different shard.

Below are the test results, which are as expected:

mongos1:

mongos> use admin
switched to db admin
mongos> db.runCommand( { movePrimary: "db1", to: "shard2" } )
{
	"primary " : "shard2:shard2/10.14.1.196:27027,10.14.1.196:27028,10.14.1.196:27029",
	"ok" : 1
}
 
mongos> use db1
mongos> db.inventories.count()
145012

On Mongos2:

mongos> use db1
 
mongos> db.inventories.count()
0
 
mongos> use admin
switched to db admin
mongos> db.runCommand({flushRouterConfig: 1})
{ "flushed" : true, "ok" : 1 }
mongos> use db1
switched to db db1
mongos> db.inventories.count()
145012

On Mongos3:

mongos> use db1
switched to db db1
 
mongos> db.inventories.count()
0
 
mongos> db.runCommand({flushRouterConfig: 1})
{ "flushed" : true, "ok" : 1 }
mongos> use db1
switched to db db1
mongos>  db.inventories.count()
145012

In prod we have a read/insert/update-heavy load, but here we are testing in an idle environment (no concurrent load).

Further, in prod (where we face this issue), we found that db2 exists on both shards.
One shard (the one we moved the db to) has 140714 records and the other (the older shard) has 25 records in the inventories collection.
We are using an older version of mongo (3.0.6) and the latest version is 3.0.9; would it help if we upgraded our cluster to mongo 3.0.9?
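
One way to confirm where the documents actually live is to bypass mongos and count directly against each shard's primary; a sketch (the shard1 address is a placeholder, since only shard2's members appear in this ticket):

// connect directly to each shard's primary, not through mongos
var shardPrimaries = { shard1: "<shard1-primary-host>:27017", shard2: "10.14.1.196:27027" };
for (var name in shardPrimaries) {
    var conn = new Mongo(shardPrimaries[name]);
    print(name + ": " + conn.getDB("db2").inventories.count());
}

Counts taken directly on a shard include any documents stranded there by the movePrimary, which a count through a correctly refreshed mongos would not show.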

Comment by Kelsey Schubert [ 03/Mar/16 ]

Hi rakesh.mib.j,

To continue to investigate this behavior we will need some more information. Are you able to reproduce this issue?

If so, can you please

  1. set the log verbosity to 2 for each mongos.
  2. execute and provide the output of db.inventories.explain().count() before flushing the router config, after flushing the router config, and after restarting the mongos.
  3. run db.inventories.count() multiple times at each stage to ensure this is not a transient condition.
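
A sketch of those diagnostics in the shell, run while connected to each mongos (db.setLogLevel() is available from MongoDB 3.0):

// 1. raise the log verbosity of this mongos to 2
db.setLogLevel(2)
// 2. count with execution details, showing which shard(s) the count targets
db.getSiblingDB("db2").inventories.explain().count()
// 3. repeat the plain count to rule out a transient condition
for (var i = 0; i < 5; i++) { print(db.getSiblingDB("db2").inventories.count()); }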

After reproducing the issue, including the new steps I have listed above, we will want the following:

  1. the logs of each of the mongos
  2. the logs of the primary of each shard

Thank you for your help,
Thomas

Comment by Rakesh Kumar [ 01/Mar/16 ]

FYI: We have a sharded cluster, but we haven't sharded any of the databases.
