[SERVER-11256] improve handling of empty vs nonexistent CollectionMetadata Created: 17/Oct/13 Updated: 25/Nov/14 Resolved: 06/Jun/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 2.7.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Asya Kamsky | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Operating System: | ALL |
| Steps To Reproduce: | See attached jstest. |
| Participants: |
| Description |
|
When a new collection is added and its data is migrated from the primary shard to another shard, any mongos that had already authenticated a user to that DB before the collection was sharded and the data was moved cannot see this collection's data on the other shard. I think this may be related to mongod setting the shard version for that collection to 0 when the last chunk is migrated off of it, but that's problematic, possibly in a way similar to how we require flushRouterConfig after movePrimary is done for a database. |
| Comments |
| Comment by Githook User [ 06/Jun/14 ] |
|
Author: Randolph Tan (renctan, randolph@10gen.com) Message: Rename ChunkVersion::isEquivalentTo to equals |
| Comment by Githook User [ 06/Jun/14 ] |
|
Author: Randolph Tan (renctan, randolph@10gen.com) Message: Remove long constructor for ChunkVersion |
| Comment by Githook User [ 06/Jun/14 ] |
|
Author: Randolph Tan (renctan, randolph@10gen.com) Message: |
| Comment by Greg Studer [ 03/Apr/14 ] |
|
Yeah, this is a known edge case in our versioning protocol: there's poor distinction between "unsharded" and "nonexistent", since both have the same version (0|0|OID(000)). SERVER-939 is also difficult because of this. EDIT: I think this case is fixable if we explicitly distinguish internally between the absence and presence of CollectionMetadata and require contacting the config server before creating any new collections from mongos clients. I think this also requires a new sentinel chunk version for "doesn't exist" which doesn't match "unsharded"; maybe we can now use ::IGNORED()? |
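The ambiguity described in this comment can be sketched with a toy model. This is not MongoDB code: `ShardState`, `version_only`, `version_or_none`, the `test.*` namespaces, and the 24-character zero string standing in for OID(000) are all hypothetical names chosen for illustration. It only shows why a version-only view cannot separate "unsharded" from "nonexistent", and how tracking the absence of metadata explicitly would.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ZERO_EPOCH = "0" * 24  # assumption: stands in for OID(000)


@dataclass(frozen=True)
class ChunkVersion:
    major: int
    minor: int
    epoch: str


# The shared version from the comment: 0|0|OID(000).
UNSHARDED = ChunkVersion(0, 0, ZERO_EPOCH)


class ShardState:
    """Toy model of one shard's per-collection metadata (hypothetical)."""

    def __init__(self) -> None:
        # A missing key means "no metadata at all" for that namespace.
        self.metadata: Dict[str, ChunkVersion] = {}

    def version_only(self, ns: str) -> ChunkVersion:
        # Version-only view: absence collapses to the unsharded version.
        return self.metadata.get(ns, UNSHARDED)

    def version_or_none(self, ns: str) -> Optional[ChunkVersion]:
        # Explicit view: absence is reported as None, not as a version.
        return self.metadata.get(ns)


shard = ShardState()
shard.metadata["test.unsharded"] = UNSHARDED

# Both namespaces report the same version, so a caller cannot tell them apart.
same = shard.version_only("test.unsharded") == shard.version_only("test.missing")

# With absence modelled explicitly, the two cases differ.
distinct = shard.version_or_none("test.unsharded") != shard.version_or_none("test.missing")
```

Returning `None` here plays the role of the separate "doesn't exist" sentinel suggested above; whether the real fix uses a sentinel version or an explicit absence check is not stated in this ticket.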
| Comment by Spencer Brody (Inactive) [ 02/Apr/14 ] |
|
Repro'd on 2.4 also, verified this is not a regression. |
| Comment by Spencer Brody (Inactive) [ 02/Apr/14 ] |
|
I have reproduced this and verified that the problem is not related to security. I have attached a new repro script that doesn't require authentication to be on to reproduce. The problem seems to be in when mongos refreshes its cache of the config data. If you query the database in question before doing the migration, the mongos refreshes its cache then but doesn't update it after the migration, causing the test to fail. If you don't query the database in question at all until after the migration is successful, then the first query refreshes the cache and sees the up-to-date metadata, making the test pass. I have attached logs from a passing and a failing run of the repro script; the only difference is commenting out the line that queries the collection before the migration. If you look in the logs for "DBConfig unserialize" you will see when the mongos refreshes its cache of the chunk data, and in the failing run you'll see that it never does so after the migration. |
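The passing and failing orderings described above can be sketched as a toy simulation. This is not the attached repro script and not mongos code: `ConfigServer`, `Router`, `route`, `flush`, the `test` database name, and the shard names `shard0`/`shard1` are assumptions made for illustration, and `flush` is only loosely analogous to flushRouterConfig.

```python
from typing import Dict, Optional


class ConfigServer:
    """Toy stand-in for the config metadata: namespace -> owning shard."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}


class Router:
    """Toy stand-in for a mongos that caches routing data (hypothetical)."""

    def __init__(self, config: ConfigServer) -> None:
        self.config = config
        self.cache: Optional[Dict[str, str]] = None  # None: never loaded

    def route(self, ns: str, primary: str) -> str:
        # Load the cache on first use only; it is never reloaded on its own.
        if self.cache is None:
            self.cache = dict(self.config.owner)
        # Namespaces missing from the cache are assumed to be on the primary.
        return self.cache.get(ns, primary)

    def flush(self) -> None:
        # Drop the cache so the next route() call reloads it.
        self.cache = None


config = ConfigServer()
early = Router(config)  # queries the database before the migration
late = Router(config)   # first query happens after the migration

early.route("test.bork", primary="shard0")  # caches an empty routing table

config.owner["test.bork"] = "shard1"        # collection sharded and moved

stale_target = early.route("test.bork", primary="shard0")  # still "shard0"
fresh_target = late.route("test.bork", primary="shard0")   # sees "shard1"

early.flush()
after_flush = early.route("test.bork", primary="shard0")   # now "shard1"
```

In this model the early router keeps sending reads to the primary shard, where the data no longer lives, which matches the failing run; the late router matches the passing run.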
| Comment by Asya Kamsky [ 18/Oct/13 ] |
|
The linked ticket has similar output to this, but in the test case I have two mongos, two shards, and the following basic sequence of events:
The result is that through mongos2 the second collection is not visible. The attached test asserts if either collection's stats show sharded:false from either mongos, and also checks that the count is 10 in each collection through either mongos. The output when bork2 was created before the first login on mongos2 and bork was created after (showing correct and incorrect stats from the two mongos):
The reason for two collections is that the first collection is correctly visible from mongos2, so creating the second collection and moving its data after a user authenticates through mongos2 is the key to the improper visibility of data. |
| Comment by Andy Schwerin [ 18/Oct/13 ] |
|
asya, can you clarify the description, or maybe add the expected and actual output from the failing cases in the attached test file? |