[SERVER-10013] no rollback of collection version on soft-fail migration commit (findOne::prepare failed) Created: 24/Jun/13  Updated: 11/Jul/16  Resolved: 16/Sep/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.4.4
Fix Version/s: 2.5.3

Type: Bug
Priority: Major - P3
Reporter: Greg Studer
Assignee: Greg Studer
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Force the migration version write to fail in findOne::prepare(), then try migrating again.

 Description   

When writing the version change to the config server in the migration critical section, the "prepare()" sometimes fails before the change is written. Since no data was changed, we can recover locally by re-donating the chunk back to the ShardChunkManager of the from-side shard. The shard version is reset correctly, but the collection version is not. As a result, further chunk diffs fail and the version stays locked to the last-migrated version.

Non-metadata operations are not affected, since the shard version and chunks are correct, but further migrations cannot proceed.
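
The version mismatch described above can be illustrated with a small standalone model. This is only a sketch and not the server code: ToyMetadata and replayFailedMigration() are hypothetical names, chunk bounds are reduced to plain integers, and the rule that the collection version only ever moves forward is an assumption inferred from the observed behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Hypothetical, simplified stand-in for the from-side shard's chunk metadata.
struct ToyMetadata {
    std::map<int, int> chunks;  // chunk min -> chunk max
    int shardVersion = 0;       // version of the chunks owned by this shard
    int collVersion = 0;        // highest version seen for the collection

    // Add a chunk at the given version. Assumption: the collection
    // version is never lowered, it only takes the max.
    ToyMetadata clonePlus(int min, int max, int version) const {
        ToyMetadata next = *this;
        next.chunks[min] = max;
        next.shardVersion = version;
        next.collVersion = std::max(collVersion, version);
        return next;
    }

    // Remove a chunk at the given version, with the same max rule.
    ToyMetadata cloneMinus(int min, int version) const {
        ToyMetadata next = *this;
        next.chunks.erase(min);
        next.shardVersion = version;
        next.collVersion = std::max(collVersion, version);
        return next;
    }
};

// Replays: add two chunks, donate one at version 3, then re-add it at
// version 2 to undo the failed migration.
ToyMetadata replayFailedMigration() {
    ToyMetadata md;
    md = md.clonePlus(10, 20, 1);
    md = md.clonePlus(30, 40, 2);
    md = md.cloneMinus(30, 3);     // migration bumps both versions to 3
    md = md.clonePlus(30, 40, 2);  // undo: shard version returns to 2
    return md;
}
```

In this model the returned state has the shard version back at 2 while the collection version is still 3, which is the stuck state reported here.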

Affects version 2.4, since previously this failure resulted in immediate shutdown. It is not yet clear whether the same problem affects master.



 Comments   
Comment by auto [ 27/Aug/13 ]

Author: Greg Studer (gregstuder) <greg@10gen.com>

Message: SERVER-10013 use old manager for chunk version rollback on migration fail
Branch: master
https://github.com/mongodb/mongo/commit/fab3e0c8588d206ff93bbf083a0e773e9bdeacc4
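
One way to read the commit message is that the rollback reinstalls the metadata that was in place before the migration, rather than re-adding the chunk to the post-migration metadata. The following is a minimal sketch of that idea only; VersionedMetadata, ToyShardState, and failedMigrationThenUndo() are hypothetical names, and the linked commit may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, simplified metadata: only the two versions are tracked.
struct VersionedMetadata {
    int shardVersion;
    int collVersion;
};

using MetadataPtr = std::shared_ptr<const VersionedMetadata>;

// Hypothetical holder for the metadata currently installed on the shard.
class ToyShardState {
public:
    explicit ToyShardState(MetadataPtr initial) : _current(std::move(initial)) {}

    // Install metadata for the bumped version and hand back the snapshot
    // that was in place before the donation.
    MetadataPtr donateChunk(int newVersion) {
        MetadataPtr previous = _current;
        _current = std::make_shared<VersionedMetadata>(VersionedMetadata{
            newVersion, std::max(previous->collVersion, newVersion)});
        return previous;
    }

    // Roll back by reinstalling the old snapshot, so both versions are
    // restored together.
    void undoDonateChunk(MetadataPtr previous) { _current = std::move(previous); }

    const VersionedMetadata& current() const { return *_current; }

private:
    MetadataPtr _current;
};

// Donate at version 3, treat the commit as failed, then undo.
VersionedMetadata failedMigrationThenUndo() {
    ToyShardState state(
        std::make_shared<VersionedMetadata>(VersionedMetadata{2, 2}));
    MetadataPtr before = state.donateChunk(3);
    state.undoDonateChunk(before);
    return state.current();
}
```

Because the old snapshot is restored as a whole, there is no separate step in which the collection version could be left behind.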

Comment by Greg Studer [ 25/Jun/13 ]

Repro:

    TEST_F(NoChunkFixture, ChunkRollbackWorks) {
        ChunkType chunk;
        chunk.setMin( BSON("a" << 10) );
        chunk.setMax( BSON("a" << 20) );
 
        // Initial chunk added
        string errMsg;
        scoped_ptr<CollectionMetadata> cloned( getCollMetadata().clonePlus( chunk,
                                                                            ChunkVersion( 1,
                                                                                          0,
                                                                                          OID() ),
                                                                            &errMsg ) );
        ASSERT( cloned.get() != NULL );
 
        // Chunk added that we will later move off
        chunk.setMin( BSON("a" << 30) );
        chunk.setMax( BSON("a" << 40) );
        cloned.reset( cloned->clonePlus( chunk, ChunkVersion( 2, 0, OID() ), &errMsg ) );
 
        ASSERT( cloned.get() != NULL );
        ASSERT_EQUALS( cloned->getShardVersion().majorVersion(), 2 );
        ASSERT_EQUALS( cloned->getCollVersion().majorVersion(), 2 );
 
        // Move off the chunk
        cloned.reset( cloned->cloneMinus( chunk, ChunkVersion( 3, 0, OID() ), &errMsg ) );
 
        ASSERT( cloned.get() != NULL );
        ASSERT_EQUALS( cloned->getShardVersion().majorVersion(), 3 );
        ASSERT_EQUALS( cloned->getCollVersion().majorVersion(), 3 );
 
        // Undo the move
        cloned.reset( cloned->clonePlus( chunk, ChunkVersion( 2, 0, OID() ), &errMsg ) );
 
        ASSERT( cloned.get() != NULL );
        ASSERT_EQUALS( cloned->getShardVersion().majorVersion(), 2 );
        // Fails, the coll version is still 3
        ASSERT_EQUALS( cloned->getCollVersion().majorVersion(), 2 );
    }
