[SERVER-11256] improve handling of empty vs nonexistent CollectionMetadata Created: 17/Oct/13  Updated: 25/Nov/14  Resolved: 06/Jun/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 2.7.2

Type: Bug Priority: Major - P3
Reporter: Asya Kamsky Assignee: Randolph Tan
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: auth_shard_collection.js, noauth_shard_collection.js, testOutputFailingNoauth.txt, testOutputPassingNoauth.txt
Issue Links:
Duplicate
duplicates SERVER-10758 Strict Epoch comparison Closed
is duplicated by SERVER-13531 Collection drop doesn't clean the sha... Closed
Related
related to SERVER-7783 Audit code which finds system.users d... Closed
related to SERVER-16316 Remove unsupported behavior in shard3.js Closed
related to SERVER-939 Ability to distribute collections in ... Blocked
Tested
Operating System: ALL
Steps To Reproduce:

See attached jstest.

 Description   

When a new collection is added and its data is migrated from the primary shard to another shard, any mongos that had already authenticated a user to that database before the collection was sharded and its data moved can no longer see the collection's data on the other shard.

I think this may be related to mongod setting the shard version for that collection to 0 when the last chunk is migrated off it. That may be problematic in a way similar to the case where we require flushRouterConfig after movePrimary is run for a database.
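One way the hypothesis above could produce the symptom can be sketched in Python. The sketch assumes a simplified protocol: the router attaches its cached version to each request, the shard rejects the request if that version differs from its own, and the router reloads its config only after such a rejection. `ChunkVersion` here is a toy major|minor|epoch triple rather than the server's C++ class. `StaleConfig`, `shard_check`, and `router_refreshes` are hypothetical names.

```python
from dataclasses import dataclass

ZERO_EPOCH = "0" * 24  # stands in for OID(000)


@dataclass(frozen=True)
class ChunkVersion:
    """Simplified major|minor|epoch version (not the server's C++ class)."""
    major: int
    minor: int
    epoch: str


ZERO = ChunkVersion(0, 0, ZERO_EPOCH)


class StaleConfig(Exception):
    """Hypothetical stand-in for a shard's 'routing info is stale' error."""


def shard_check(sent: ChunkVersion, current: ChunkVersion) -> None:
    # The shard rejects a request whose attached version differs from its own.
    if sent != current:
        raise StaleConfig(f"sent {sent}, shard has {current}")


def router_refreshes(sent: ChunkVersion, current: ChunkVersion) -> bool:
    # The router only reloads its cached config after a StaleConfig rejection.
    try:
        shard_check(sent, current)
    except StaleConfig:
        return True
    return False


# Stale mongos: still treats the collection as unsharded and sends version 0.
# Primary shard: its last chunk migrated away, so it also reports version 0.
print(router_refreshes(ZERO, ZERO))  # False: versions match, so no refresh

# If the shard still owned a chunk (hypothetical version 1|0|eee...),
# the mismatch would trigger a refresh.
print(router_refreshes(ZERO, ChunkVersion(1, 0, "e" * 24)))  # True
```

Under these assumptions, the stale router and the now-empty shard agree on "version 0", so nothing prompts the router to reload.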



 Comments   
Comment by Githook User [ 06/Jun/14 ]

Author:

Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-11256 improve handling of empty vs nonexistent CollectionMetadata

Rename ChunkVersion::isEquivalentTo to equals
Branch: master
https://github.com/mongodb/mongo/commit/293993b4535d32464a87e15e4abd7ae3a2eee891

Comment by Githook User [ 06/Jun/14 ]

Author:

Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-11256 improve handling of empty vs nonexistent CollectionMetadata

Remove long constructor for ChunkVersion
Branch: master
https://github.com/mongodb/mongo/commit/7915e212dc903f8d65b5c67d3c1bc501e0d3e610

Comment by Githook User [ 06/Jun/14 ]

Author:

Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-11256 improve handling of empty vs nonexistent CollectionMetadata
Branch: master
https://github.com/mongodb/mongo/commit/0d5acb0e3a6b0f1cdf7f252aa9a13afb1e884848

Comment by Greg Studer [ 03/Apr/14 ]

Yeah, this is a known edge case in our versioning protocol: there's poor distinction between "unsharded" and "nonexistent", since both have the same version (0|0|OID(000)). SERVER-939 is also difficult because of this.

EDIT: I think this case is fixable if we explicitly distinguish internally between the absence and presence of CollectionMetadata, and require contacting the config server before creating any new collections from mongos clients. I think this also requires a new sentinel chunk version for "doesn't exist" that doesn't match "unsharded". Maybe we can now use ::IGNORED()?
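The ambiguity and the proposed fix can be sketched in Python. This is a toy model, not the server's C++ code. It assumes the "doesn't exist" state is represented by the absence of a metadata entry, which is then mapped to a distinct sentinel version. The sentinel's encoding (-1|-1) is purely hypothetical. The comment suggests `::IGNORED()` as a candidate, and its real encoding may differ. `ShardMetadataCache`, `install`, and `version_for` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ZERO_EPOCH = "0" * 24  # stands in for OID(000)


@dataclass(frozen=True)
class ChunkVersion:
    """Simplified major|minor|epoch version (not the server's C++ class)."""
    major: int
    minor: int
    epoch: str


# Old behavior: "unsharded" and "nonexistent" share one value, so a version
# comparison alone cannot tell them apart.
UNSHARDED = ChunkVersion(0, 0, ZERO_EPOCH)
OLD_NONEXISTENT = ChunkVersion(0, 0, ZERO_EPOCH)

# Proposed: a distinct "doesn't exist" sentinel (hypothetical encoding).
DOES_NOT_EXIST = ChunkVersion(-1, -1, ZERO_EPOCH)


@dataclass(frozen=True)
class CollectionMetadata:
    # Simplified: only the shard version is modeled.
    shard_version: ChunkVersion


class ShardMetadataCache:
    """Hypothetical per-shard map from namespace to CollectionMetadata."""

    def __init__(self) -> None:
        self._by_ns: Dict[str, CollectionMetadata] = {}

    def install(self, ns: str, metadata: CollectionMetadata) -> None:
        self._by_ns[ns] = metadata

    def get(self, ns: str) -> Optional[CollectionMetadata]:
        # None means "no metadata", kept separate from "unsharded".
        return self._by_ns.get(ns)

    def version_for(self, ns: str) -> ChunkVersion:
        metadata = self.get(ns)
        return DOES_NOT_EXIST if metadata is None else metadata.shard_version


print(UNSHARDED == OLD_NONEXISTENT)  # True: the old ambiguity

cache = ShardMetadataCache()
cache.install("test.plain", CollectionMetadata(UNSHARDED))
print(cache.version_for("test.plain") == UNSHARDED)         # True
print(cache.version_for("test.missing") == DOES_NOT_EXIST)  # True
print(DOES_NOT_EXIST == UNSHARDED)                          # False
```

The design point is that `get` returning `None` carries the "absent" state explicitly. A caller can therefore no longer mistake a missing entry for an unsharded collection.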

Comment by Spencer Brody (Inactive) [ 02/Apr/14 ]

Repro'd on 2.4 also, verified this is not a regression.

Comment by Spencer Brody (Inactive) [ 02/Apr/14 ]

I have reproduced this and verified that the problem is not related to security. I have attached a new repro script that reproduces the problem without authentication enabled. The problem seems to be with when mongos refreshes its cache of the config data. If you query the database in question before doing the migration, mongos refreshes its cache at that point but doesn't update it after the migration, which causes the test to fail. If you don't query the database in question until after the migration has succeeded, the first query refreshes the cache and sees the up-to-date metadata, so the test passes.
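The timing dependence described above can be sketched as a toy router in Python. The model assumes the router snapshots the routing table on first use of the database, never refreshes it on its own, and treats any namespace missing from the snapshot as unsharded on the primary shard. The real mongos caching logic is more involved. `ConfigServer`, `Router`, and `run` are hypothetical. The shard names follow the stats output in this ticket.

```python
from typing import Dict, Optional


class ConfigServer:
    """Authoritative routing table: namespace -> shard owning its only chunk."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}


class Router:
    """Hypothetical mongos-like router that loads the routing table once, lazily."""

    def __init__(self, config: ConfigServer) -> None:
        self._config = config
        self._cache: Optional[Dict[str, str]] = None

    def route(self, ns: str, primary: str) -> str:
        if self._cache is None:
            # First use of the database: snapshot the config data.
            self._cache = dict(self._config.owner)
        # Anything not in the snapshot is treated as unsharded, i.e. on the primary.
        return self._cache.get(ns, primary)


def run(query_before_migration: bool) -> str:
    config = ConfigServer()
    router = Router(config)
    if query_before_migration:
        router.route("test.bork", primary="shard0001")
    # The collection is sharded and its only chunk moved to shard0000
    # (through some other router).
    config.owner["test.bork"] = "shard0000"
    return router.route("test.bork", primary="shard0001")


print(run(query_before_migration=False))  # shard0000: fresh load, correct
print(run(query_before_migration=True))   # shard0001: stale snapshot, wrong shard
```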

I have attached logs from a passing and a failing run of the repro script. The only difference between the two runs is whether the line that queries the collection before the migration is commented out. If you search the logs for "DBConfig unserialize", you will see when mongos refreshes its cache of the chunk data. In the failing run, it never does so after the migration.

Comment by Asya Kamsky [ 18/Oct/13 ]

The linked ticket has similar output to this, but in the test case I have two mongos, two shards and the following basic sequence of events:

  1. through mongos1 create a sharded collection on shard1
  2. through mongos1 moveChunk min-max to shard2
  3. through mongos2 authenticate to the test database
  4. through mongos1 create second sharded collection on shard1 (same db)
  5. through mongos1 moveChunk min-max to shard2
  6. through mongos1 attempt to see count and stats on collections 1 and 2
  7. through mongos2 attempt to see count and stats on collections 1 and 2

The result is that through mongos2 the second collection is not visible.

The attached test asserts that neither collection's stats show sharded: false through either mongos. It also checks that the count is 10 in each collection through either mongos.

Here is the output when bork2 was created before the first login on mongos2 and bork was created after it (the stats for bork are incorrect and the stats for bork2 are correct):

{
	"sharded" : false,
	"primary" : "shard0001",
	"ns" : "test.bork",
	"count" : 0,
	"size" : 0,
	"storageSize" : 4096,
	"numExtents" : 1,
	"nindexes" : 1,
	"lastExtentSize" : 4096,
	"paddingFactor" : 1,
	"systemFlags" : 1,
	"userFlags" : 0,
	"totalIndexSize" : 8176,
	"indexSizes" : {
		"_id_" : 8176
	},
	"ok" : 1
}
{
	"sharded" : true,
	"ns" : "test.bork2",
	"count" : 10,
	"numExtents" : 2,
	"size" : 200,
	"storageSize" : 12288,
	"totalIndexSize" : 16352,
	"indexSizes" : {
		"_id_" : 16352
	},
	"avgObjSize" : 20,
	"nindexes" : 1,
	"nchunks" : 1,
	"shards" : {
		"shard0000" : {
			"ns" : "test.bork2",
			"count" : 10,
			"size" : 200,
			"avgObjSize" : 20,
			"storageSize" : 8192,
			"numExtents" : 1,
			"nindexes" : 1,
			"lastExtentSize" : 8192,
			"paddingFactor" : 1,
			"systemFlags" : 1,
			"userFlags" : 0,
			"totalIndexSize" : 8176,
			"indexSizes" : {
				"_id_" : 8176
			},
			"ok" : 1
		},
		"shard0001" : {
			"ns" : "test.bork2",
			"count" : 0,
			"size" : 0,
			"storageSize" : 4096,
			"numExtents" : 1,
			"nindexes" : 1,
			"lastExtentSize" : 4096,
			"paddingFactor" : 1,
			"systemFlags" : 1,
			"userFlags" : 0,
			"totalIndexSize" : 8176,
			"indexSizes" : {
				"_id_" : 8176
			},
			"ok" : 1
		}
	},
	"ok" : 1
}
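The checks the attached test performs, as described above, can be sketched in Python over plain dicts. This is a hypothetical reimplementation for illustration, not the attached JavaScript. `check_stats` and `EXPECTED_COUNT` are made-up names, and the two dicts are trimmed copies of the results above that keep only the checked fields.

```python
from typing import Any, Dict, List

EXPECTED_COUNT = 10  # the test expects 10 documents in each collection


def check_stats(stats: Dict[str, Any]) -> List[str]:
    """Return the problems found in one collStats-style result ([] means pass)."""
    ns = stats.get("ns", "?")
    problems = []
    if not stats.get("sharded", False):
        problems.append(f"{ns}: sharded: false")
    if stats.get("count") != EXPECTED_COUNT:
        problems.append(f"{ns}: count {stats.get('count')}, expected {EXPECTED_COUNT}")
    return problems


# Trimmed copies of the two results above, keeping only the checked fields.
bork = {"sharded": False, "primary": "shard0001", "ns": "test.bork", "count": 0, "ok": 1}
bork2 = {"sharded": True, "ns": "test.bork2", "count": 10, "ok": 1}

for result in (bork, bork2):
    print(result["ns"], check_stats(result) or "ok")
```

Run against the two results, test.bork fails both checks and test.bork2 passes.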

The reason for using two collections is that the first collection is correctly visible from mongos2. The key to the improper visibility of data is creating the second collection and moving its data after the user has authenticated through mongos2.

Comment by Andy Schwerin [ 18/Oct/13 ]

asya, can you clarify the description, or maybe add the expected and actual output from the failing cases in the attached test file?

Generated at Thu Feb 08 03:25:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.