Core Server / SERVER-8405

sharded count may incorrectly count migrating or orphaned documents (does not filter using chunk manager)

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Performance, Sharding
    • Labels: None
    • ALL

      Run this to trigger repeated migrations:

      var st = new ShardingTest( { shards: 2,
                                   mongos:1,
                                   other : { chunksize : 1 }});
      
      st.stopBalancer();
      
      db = st.getDB( "test" );
      var testcoll = db.foo;
      
      st.adminCommand( { enablesharding : "test" } );
      st.adminCommand( { shardcollection : "test.foo" , key : { _id : 1 } } );
      
      var str = "";
      while ( str.length < 10000 ) {
          str += "asdasdsdasdasdasdas";
      }
      
      var data = 0, num = 0;
      
      // Insert until there is roughly 10MB of data
      while ( data < ( 1024 * 1024 * 10 ) ) {
          testcoll.insert( { _id : num++ , s : str } );
          data += str.length;
      }
      
      // Flush and wait for the writes to be acknowledged
      db.getLastError();
      
      // Find the shard that owns no chunks (the first migration target) and the shard holding the data
      var stats = st.chunkCounts( "foo" );
      var to1 = "";
      var to2 = "";
      for ( var shard in stats ) {
          if ( stats[shard] == 0 ) {
              to1 = shard;
          }
          else {
              to2 = shard;
          }
      }
      
      // Move the chunk containing { _id : 1 } back and forth between the two shards forever
      while ( 1 ) {
      
          var result = st.adminCommand( { movechunk : "test.foo" ,
                                          find : { _id : 1 } ,
                                          to : to1,
                                          _waitForDelete : true} );
      
          assert(result, "movechunk failed: " + tojson( result ) );
      
          sleep( 100 );
      
          var result = st.adminCommand( { movechunk : "test.foo" ,
                                          find : { _id : 1 } ,
                                          to : to2,
                                          _waitForDelete : true} );
      
          assert(result, "movechunk failed: " + tojson( result ) );
      
          sleep( 100 );
      }
      

      Run this in a separate shell attached to the mongos started by the script above:

      while( 1 ) { count = db.foo.count(); print( count );  assert.eq( 1048, count ); }
      

      Note that, as the following shows, queries (unlike count) return correct results even during migrations:

      while( 1 ) { count = db.foo.find().itcount(); print( count );  assert.eq( 1048, count ); }
      

      Generally a shard only has documents belonging to chunks owned by that shard. However, if a migration is in progress the shard may additionally have documents belonging to a chunk that is not owned by the shard but is in the process of being migrated. After an aborted migration, a shard may have "orphaned" documents belonging to a chunk that never completed migration.
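
      To see this directly, a rough check (a sketch against the test above, assuming st.shard0 and st.shard1 are the direct shard connections the ShardingTest helper exposes) is to count on each shard while the migration loop runs; the raw per-shard totals can add up to more than the 1048 documents the assertions above expect:

      // Count test.foo directly on each shard, bypassing mongos and therefore any
      // chunk-ownership filtering (assumes st.shard0 / st.shard1 from ShardingTest above).
      var onShard0 = st.shard0.getDB( "test" ).foo.count();
      var onShard1 = st.shard1.getDB( "test" ).foo.count();
      print( "shard0: " + onShard0 + ", shard1: " + onShard1 +
             ", raw total: " + ( onShard0 + onShard1 ) );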

      When a query (not count) runs on a sharded cluster, each shard checks the documents that match the query to see if they also belong to chunks owned by that shard. If they do not belong to owned chunks, they are not returned.
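
      On recent server versions this per-shard ownership check is visible in the query plan: each shard's winning plan wraps its index scan in a SHARDING_FILTER stage. Stage names and explain formats are version dependent, so the following is only an illustrative way to confirm the filtering:

      // Inspect the plan of a routed query through mongos; look for a SHARDING_FILTER
      // stage in each shard's winning plan (explain output format varies by version).
      printjson( db.foo.find( { _id : { $gte : 0 } } ).explain() );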

      However, when a count runs, the check that matching documents belong to owned chunks is not performed. This implementation has the advantage that sharded counts can use covered indexes. (Checking whether a document belongs to a valid chunk cannot currently be done from an index alone - it requires loading the full document - see SERVER-5022.) However, it means sharded counts can return incorrect results: the counts on the individual shards don't filter out unowned documents that may exist because of in-flight or aborted migrations.
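
      As a concrete illustration of that tradeoff (a sketch against the test above, not a statement about any particular server version): with the shard key index { _id : 1 }, a count whose predicate is on the shard key can be answered from the index alone, which is exactly what a per-document chunk membership check would currently give up:

      // A count over the indexed shard key can currently be satisfied from the
      // { _id : 1 } index without fetching any documents:
      db.foo.count( { _id : { $gte : 0, $lt : 500 } } );
      // The corresponding query projected to the shard key is a covered query; its
      // explain shows an index-only plan (field names differ across server versions):
      db.foo.find( { _id : { $gte : 0, $lt : 500 } }, { _id : 1 } ).explain();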

      Here are some potential ways of handling this, to prompt discussion:

      • Filter count results using a chunk manager, the same as we currently do for queries (see the sketch after this list for what that ownership check amounts to at the metadata level).
      • Give a mongod chunk manager knowledge of whether or not there may be an in-flight migration or orphaned documents, and filter using a chunk manager only when necessary.
      • For indexes that include the shard key, add covered index support for checking chunk membership; otherwise fall back to checking the full document for chunk membership.
      • Track disk locations of documents participating in migrations (including aborted migrations) and exclude these documents from count, query, etc. results. Filtering based on disk location does not require reading the document. If the list of disk locs is kept in memory only, orphans might need to be deleted on startup.
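
      For reference, the following sketch spells out what the first option's ownership check amounts to at the metadata level, written against the test above (numeric shard key { _id : 1 }, standalone shard mongods). It is illustrative only - the server-side chunk manager performs the equivalent check internally - and config metadata formats differ across server versions:

      // For each shard, count the documents that fall inside chunks config.chunks assigns
      // to that shard; the remainder are migrating or orphaned copies.
      var configDB = st.s.getDB( "config" );
      configDB.shards.find().forEach( function( shardDoc ) {
          // Direct connection to the shard (plain mongods in this test, so shardDoc.host
          // is a simple host:port string).
          var direct = new Mongo( shardDoc.host ).getDB( "test" );
          var ownedChunks = configDB.chunks.find( { ns : "test.foo",
                                                    shard : shardDoc._id } ).toArray();
          var raw = 0, owned = 0;
          direct.foo.find( {}, { _id : 1 } ).forEach( function( doc ) {
              raw++;
              // Chunk ranges are half-open: min <= _id < max. The shard key values in this
              // test are numbers; MinKey/MaxKey bounds on the extreme chunks are treated as
              // unbounded.
              var isOwned = ownedChunks.some( function( chunk ) {
                  var aboveMin = typeof chunk.min._id != "number" || chunk.min._id <= doc._id;
                  var belowMax = typeof chunk.max._id != "number" || doc._id < chunk.max._id;
                  return aboveMin && belowMax;
              } );
              if ( isOwned )
                  owned++;
          } );
          print( shardDoc._id + ": " + raw + " documents on shard, " + owned +
                 " in owned chunks, " + ( raw - owned ) + " migrating/orphaned" );
      } );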

            Assignee: Unassigned
            Reporter: Aaron Staple
            Votes: 0
            Watchers: 4
