[SERVER-8870] mongos unaware of database move after movePrimary Created: 06/Mar/13  Updated: 10/Dec/14  Resolved: 01/Apr/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.2.1, 2.4.0-rc1
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: James Blackburn Assignee: James Wahlin
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File move_primary.js    
Issue Links:
Depends
Duplicate
duplicates SERVER-8059 After movePrimary, db.getCollectionNa... Closed
is duplicated by SERVER-16024 listCollections is missing collection... Closed
Related
is related to SERVER-7394 'movePrimary' should issue a warning ... Closed
is related to SERVER-939 Ability to distribute collections in ... Blocked
Operating System: ALL
Steps To Reproduce:
  1. Start a 3-shard cluster with 2 mongos processes (for my test: single-member replica set shards, 1 config server and 2 mongos processes)
  2. Attach a mongo shell to each mongos process
  3. Insert a small # of records into a new database
  4. Run find() in each shell to display records
  5. Perform a movePrimary command to move the new database to a different shard.
  6. Run find() in each mongo shell -> you will see that only the shell attached to the mongos against which movePrimary was run displays records.

Note that the mongos containing the stale shard location can be refreshed with either a restart or by running the flushRouterConfig command.
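A minimal mongo shell sketch of the repro; the database name "movetest" and target shard "shard0001" are illustrative only:

// In the first mongos shell:
mongos1> use movetest
mongos1> for (var i = 0; i < 100; i++) { db.records.insert({ _id: i }); }
mongos1> db.records.find().count()            // 100 from both mongos shells
mongos1> use admin
mongos1> db.runCommand({ movePrimary: "movetest", to: "shard0001" })
// In the second mongos shell, the database now appears empty:
mongos2> use movetest
mongos2> db.records.find().count()            // 0 until that mongos is restarted
                                              // or flushRouterConfig is run against it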

Participants:

 Description   

We moved an (unsharded) database from one shard to another using the movePrimary command, following the instructions here:

http://docs.mongodb.org/manual/tutorial/remove-shards-from-cluster/#remove-shard-move-unsharded-databases

Having done that, users started complaining of unauthorized access. Sure enough, connecting to their local mongos showed that the database that had been moved, and the system.users collection within it, appeared empty. I.e. the mongos didn't pick up the fact that the database had moved.

This is somewhat worrying, and essentially required us to restart the mongos processes across the cluster. It makes us worry that if a process had auth (to admin, say) it would be writing to the wrong shard for that database and we'd experience data loss. It's also worrying that mongos instances don't appear to pick up changes like this automatically.
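A hedged mongo shell check of the routing metadata (the "cube" database and "rs2" shard names are taken from the comments below):

mongos> use config
mongos> db.databases.find({ _id: "cube" })
// { "_id" : "cube", "partitioned" : false, "primary" : "rs2" }
// The config servers report the new primary, but a stale mongos can keep
// routing reads for the database to the old shard until its cached
// configuration is reloaded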



 Comments   
Comment by Remon van Vliet [ 13/Aug/13 ]

We've seen this issue as well. Can someone explain how this is a duplicate of SERVER-8059? It seems both are symptoms of an as-yet-undefined mongos state propagation issue. @James Blackburn, did you create an issue to fix movePrimary?

Comment by James Wahlin [ 01/Apr/13 ]

Attaching move_primary.js test script for reproduction of this issue. Will attach to SERVER-8059 as well to make sure it is validated as part of a fix.

Comment by Eliot Horowitz (Inactive) [ 01/Apr/13 ]

same cause as SERVER-8059

Comment by James Wahlin [ 11/Mar/13 ]

Hi James,

I have reproduced the stale mongos configuration in both MongoDB 2.2.1 and 2.4.0-rc1. Note that, as an alternative to restarting your mongos instances, you can run the flushRouterConfig command to bring all mongos instances up to date after a move.
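A minimal sketch of that workaround, run from a shell attached to each mongos in turn:

mongos> use admin
mongos> db.runCommand({ flushRouterConfig: 1 })
// repeat against every mongos in the cluster so each router reloads the
// database-to-shard mapping from the config servers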

One word of caution: movePrimary should only be run on a static database. There should be no write traffic to an in-transit database, as the writes may be lost.

I will pass this ticket on to the 10gen development team responsible for movePrimary.

Thanks,
James

Comment by James Wahlin [ 08/Mar/13 ]

Hi James,

On further examination, this operation should work for your use case. Given you are not writing to this collection during the move process, we would expect the move to succeed and the mongos instances to be aware of the change. Note that writes during the process may be lost; SERVER-7394 is meant to add a warning about this.

I am going to work to reproduce this today.

Thanks,
James

Comment by James Wahlin [ 08/Mar/13 ]

I agree, James. I have linked SERVER-7394 to this ticket, which requests that a warning be issued if you try to perform a movePrimary on a non-empty database. That would at least have helped prevent you from carrying out this action.

I would also like to encourage you to enter an additional SERVER ticket as type "New Feature", requesting that movePrimary be modified to allow moving a database from one shard to another (outside of a shard decommission). If you do enter one, please post the ticket # here.

Thanks,
James

Comment by James Blackburn [ 07/Mar/13 ]

Ok, I see, that's a shame.

It would be nice if it were possible to movePrimary on a database after the fact. Otherwise one can't easily rebalance a cluster once the databases have been added.

Comment by James Wahlin [ 07/Mar/13 ]

Hi James,

As per the movePrimary documentation page, there is a note stating "Only use movePrimary when removing a shard from a sharded cluster", and to only use it when "the database does not contain any collections with data". I would avoid using this command, as it does not work for your purpose. To move the collection, your best bet is exporting the collection from MongoDB and then reimporting it. As you would guess, this means downtime for the given collection.
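A hedged sketch of that export/reimport approach using mongodump/mongorestore; the host names and dump path are placeholders, and the database should be quiesced for the duration:

mongodump --host <mongos-host> --db cube --out /tmp/cube-dump
# drop the database through a mongos, then restore through a mongos;
# the primary shard for the recreated database is chosen by the cluster
mongorestore --host <mongos-host> --db cube /tmp/cube-dump/cube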

Thanks,
James

Comment by James Blackburn [ 07/Mar/13 ]

Hi James,

No, we didn't run removeShard, as we didn't want to remove the shard. We followed the instructions at:
http://docs.mongodb.org/manual/tutorial/remove-shards-from-cluster/#move-unsharded-data

We added a new shard, for load balancing, and wanted to move two unsharded databases (as an initial test) to the new shard to spread the load. We didn't want to remove the source shard from the cluster altogether.

Cheers,
James

Comment by James Wahlin [ 07/Mar/13 ]

Hi James,

Can you confirm that you followed all steps in the Remove Shards from an Existing Sharded Cluster instructions? This should include (a shell sketch follows the list):

  1. Confirming the balancer is running
  2. Running removeShard against the current primary shard from a mongos shell
  3. Running movePrimary from a mongos shell
  4. Running removeShard again against the former primary shard from a mongos shell
  5. Stopping the mongod processes for the removed shard.
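Roughly, from a mongos shell (the shard and database names below are placeholders; yours will differ):

mongos> use admin
mongos> sh.getBalancerState()                                  // 1. balancer enabled
mongos> db.runCommand({ removeShard: "<shard-to-remove>" })    // 2. begins draining
mongos> db.runCommand({ movePrimary: "<dbname>", to: "<remaining-shard>" })   // 3.
mongos> db.runCommand({ removeShard: "<shard-to-remove>" })    // 4. rerun until state
                                                               //    is "completed"
// 5. only then shut down the mongod processes for the removed shard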

I want to confirm the above to make sure nothing was missed and that you ran your commands against the mongos. If any steps were missed, it could explain why the local mongos processes did not have the correct configuration.

For the dbstats against rs0, seeing data when connected locally is not unexpected. Data is removed from mongod processes post-migration in a lazy manner. When running a sharded cluster, you should be performing all of your CRUD operations against the mongos, which knows how to route requests properly.
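Illustrative only: the same count through a mongos versus directly against the old shard can disagree until that lazy cleanup happens (the collection name here is a placeholder):

mongos> use cube
mongos> db.mycoll.count()          // routed to the current primary shard
rs0:PRIMARY> use cube
rs0:PRIMARY> db.mycoll.count()     // may still report data that has not yet been removed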

Thanks,
James

Comment by James Blackburn [ 07/Mar/13 ]

This seems like a pretty serious data-loss bug.

It makes us worry that re-balancing while a cluster is in use is also a dangerous thing to do. Is it?

Comment by James Blackburn [ 06/Mar/13 ]

Worse than that, one of the databases we moved (which was unsharded) now has a split personality:

c.config.databases.find_one({'_id':'cube'})
Out[82]: {u'_id': u'cube', u'partitioned': False, u'primary': u'rs2'}

Connected directly to rs0 (where it used to live):

In [9]: c.cube.command('dbstats')
Out[9]: 
{u'avgObjSize': 242.8252673619883,
 u'collections': 14,
 u'dataSize': 20185336,
 u'db': u'cube',
 u'fileSize': 201326592,
 u'indexSize': 5788608,
 u'indexes': 12,
 u'nsSizeMB': 16,
 u'numExtents': 32,
 u'objects': 83127,
 u'ok': 1.0,
 u'storageSize': 32354304}

On rs2, where the data now lives:

In [6]: c.cube.command('dbstats')
Out[6]: 
{u'avgObjSize': 170.42628955376935,
 u'collections': 33,
 u'dataSize': 6930992464.0,
 u'db': u'cube',
 u'fileSize': 17105420288.0,
 u'indexSize': 2594032224.0,
 u'indexes': 75,
 u'nsSizeMB': 16,
 u'numExtents': 163,
 u'objects': 40668564,
 u'ok': 1.0,
 u'storageSize': 9831714816.0}

Comment by James Blackburn [ 06/Mar/13 ]

Looks similar to SERVER-3413, but I've seen this in 2.2.1.

Comment by James Blackburn [ 06/Mar/13 ]

There's nothing interesting in the mongos log:

Wed Mar  6 17:25:04 [conn5267]  authenticate db: tadata_live { authenticate: 1, nonce: "e5caf34d4ddd3810", user: "tadata_rw", key: "b5400abe80c2ea14ec23f010a86fcba7" }
Wed Mar  6 17:25:04 [conn5267] auth: couldn't find user tadata_rw, tadata_live.system.users
Wed Mar  6 17:25:54 [LockPinger] cluster cn53:27117,cn54:27117,cn55:27117 pinged successfully at Wed Mar  6 17:25:54 2013 by distributed lock pinger 'cn53:27117,cn54:27117,cn55:27117/dlonapahls254.maninvestments.com:27119:1358874575:1804289383', sleeping for 30000ms
Wed Mar  6 17:27:08 [Balancer] distributed lock 'balancer/dlonapahls254.maninvestments.com:27119:1358874575:1804289383' acquired, ts : 51377c6bcf6b496213cb5a6a
Wed Mar  6 17:27:08 [Balancer] distributed lock 'balancer/dlonapahls254.maninvestments.com:27119:1358874575:1804289383' unlocked.
Wed Mar  6 17:27:32 [Balancer] distributed lock 'balancer/dlonapahls254.maninvestments.com:27119:1358874575:1804289383' acquired, ts : 51377c84cf6b496213cb5a6b
Wed Mar  6 17:27:32 [Balancer] distributed lock 'balancer/dlonapahls254.maninvestments.com:27119:1358874575:1804289383' unlocked.

Apart from the fact that it couldn't auth the user, as the tadata_live.system.users collection is empty.
