[SERVER-12025] Chunk move from 2.4 shard to 2.6 shard fails when doc has illegal field name ($ prefix) Created: 10/Dec/13  Updated: 10/Dec/14  Resolved: 31/Jan/14

Status: Closed
Project: Core Server
Component/s: Sharding, Write Ops
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Michael O'Brien Assignee: Scott Hernandez (Inactive)
Resolution: Won't Fix Votes: 0
Labels: 26qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   
  • create a sharded cluster with a 2.4 shard and a 2.6 shard
  • insert some data onto the 2.4 shard with $-prefixed field names (valid for 2.4, but not for 2.5.x/2.6)
  • moveChunk for that chunk from the 2.4 shard to the 2.6 shard fails

code to repro:

//createShardedCollection.js
// Shards coll on _id, pre-splits it into chunks across the given shards;
// insertData/preSplit control whether docs are inserted before or after the splits.
createShardedCollection = function(admin, coll, insertData, preSplit, shards){
    if(insertData && preSplit){
        for(var i=-400;i<400;i++){
            coll.insert({_id:i, x:i});
        }
    }
    print("creating a sharded collection");
    printjson(admin.runCommand({ enableSharding : coll.getDB() + "" }));
    printjson(admin.runCommand({ movePrimary : coll.getDB() + "", to : shards[0]._id }));
    printjson(admin.runCommand({ shardCollection : coll + "", key : { _id : 1 } }));
    printjson(admin.runCommand({ split : coll + "", middle : { _id : 0 } }));    
    printjson(admin.runCommand({ moveChunk : coll + "", find : { _id : 0 }, to : shards[1]._id }));
    printjson(admin.runCommand({ split : coll + "", middle : { _id : -300 } }));
    printjson(admin.runCommand({ split : coll + "", middle : { _id : -200 } }));
    printjson(admin.runCommand({ split : coll + "", middle : { _id : -100 } }));
    printjson(admin.runCommand({ split : coll + "", middle : { _id : 100 } }));
    printjson(admin.runCommand({ split : coll + "", middle : { _id : 200 } }));
    printjson(admin.runCommand({ split : coll + "", middle : { _id : 300 } }));
    if(insertData && !preSplit){
        for(var i=-400;i<400;i++){
            coll.insert({_id:i, x:i});
        }
    }
}
 
//shards_prefix.js
load('createShardedCollection.js')
 
var st = new ShardingTest({shards:2, other:{shardOptions : {binVersion:MongoRunner.versionIterator(["2.4","2.6"])}, rs:false}})
 
 
var shards = st.s0.getDB("config").shards.find().toArray();
createShardedCollection(st.s0.getDB("admin"), st.s0.getDB("test").test, true, true, shards)
 
st.s0.getDB("test").test.find()
 
// create a $-prefixed field via $rename (2.4 permits storing the result)
printjson(st.s0.getDB("test").test.update({_id:1},{$rename:{"x":"$x"}}) )
 
result = st.s0.getDB("admin").runCommand({ moveChunk : "test.test", find : { _id : 1 }, to : shards[0]._id })
printjson(result)
assert(result.ok == 0)

snip of relevant log output:

 m30999| 2013-12-10T10:58:01.950-0500 [conn1] CMD: movechunk: { moveChunk: "test.test", find: { _id: 1.0 }, to: "shard0000" }
 m30999| 2013-12-10T10:58:01.950-0500 [conn1] moving chunk ns: test.test moving ( ns: test.test, shard: shard0001:michaels-mbp.home:30001, lastmod: 2|8||000000000000000000000000, min: { _id: 0.0 }, max: { _id: 100.0 }) shard0001:michaels-mbp.home:30001 -> shard0000:michaels-mbp.home:30000
 m30001| Tue Dec 10 10:58:01.950 [conn4] received moveChunk request: { moveChunk: "test.test", from: "michaels-mbp.home:30001", to: "michaels-mbp.home:30000", fromShard: "shard0001", toShard: "shard0000", min: { _id: 0.0 }, max: { _id: 100.0 }, maxChunkSizeBytes: 52428800, shardId: "test.test-_id_0.0", configdb: "michaels-mbp.home:29000", secondaryThrottle: false, waitForDelete: false, maxTimeMS: 0 }
 m30001| Tue Dec 10 10:58:01.951 [conn4] distributed lock 'test.test/michaels-mbp.home:30001:1386691081:991067721' acquired, ts : 52a73a0952b0f1c3bf686413
 m30001| Tue Dec 10 10:58:01.951 [conn4] about to log metadata event: { _id: "michaels-mbp.home-2013-12-10T15:58:01-52a73a0952b0f1c3bf686414", server: "michaels-mbp.home", clientAddr: "192.168.1.181:59552", time: new Date(1386691081951), what: "moveChunk.start", ns: "test.test", details: { min: { _id: 0.0 }, max: { _id: 100.0 }, from: "shard0001", to: "shard0000" } }
 m30001| Tue Dec 10 10:58:01.952 [conn4] moveChunk request accepted at version 2|13||52a73a096e564fa0e886c427
 m30001| Tue Dec 10 10:58:01.952 [conn4] moveChunk number of documents: 100
 m30000| 2013-12-10T10:58:01.952-0500 [conn6] remotely refreshing metadata for test.test based on current shard version 2|7||52a73a096e564fa0e886c427, current metadata version is 2|7||52a73a096e564fa0e886c427
 m30000| 2013-12-10T10:58:01.953-0500 [conn6] loaded newer metadata for test.test (loaded metadata version : 2|13||52a73a096e564fa0e886c427, stored version : 2|7||52a73a096e564fa0e886c427, took 0ms)
 m30000| 2013-12-10T10:58:01.953-0500 [migrateThread] starting receiving-end of migration of chunk { _id: 0.0 } -> { _id: 100.0 } for collection test.test from michaels-mbp.home:30001 at epoch 52a73a096e564fa0e886c427
 m30001| Tue Dec 10 10:58:01.954 [conn4] moveChunk data transfer progress: { active: true, ns: "test.test", from: "michaels-mbp.home:30001", min: { _id: 0.0 }, max: { _id: 100.0 }, shardKeyPattern: { _id: 1.0 }, state: "clone", counts: { cloned: 1, clonedBytes: 29, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
 m30001| Tue Dec 10 10:58:01.957 [conn4] moveChunk data transfer progress: { active: true, ns: "test.test", from: "michaels-mbp.home:30001", min: { _id: 0.0 }, max: { _id: 100.0 }, shardKeyPattern: { _id: 1.0 }, state: "clone", counts: { cloned: 1, clonedBytes: 29, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
 m30000| 2013-12-10T10:58:01.957-0500 [migrateThread] scoped connection to michaels-mbp.home:30001 not being returned to the pool
 m30000| 2013-12-10T10:58:01.957-0500 [migrateThread] about to log metadata event: { _id: "michaels-mbp.home-2013-12-10T15:58:01-52a73a09bad5abe65af1670f", server: "michaels-mbp.home", clientAddr: ":27017", time: new Date(1386691081957), what: "moveChunk.to", ns: "test.test", details: { min: { _id: 0.0 }, max: { _id: 100.0 }, step 1 of 5: 0, step 2 of 5: 0, note: "aborted" } }
 m30000| 2013-12-10T10:58:01.957-0500 [migrateThread] ERROR: migrate failed: Unknown modifier: $x
 m30001| Tue Dec 10 10:58:01.959 [conn5] end connection 192.168.1.181:59556 (6 connections now open)
 m30001| Tue Dec 10 10:58:01.961 [conn4] moveChunk data transfer progress: { active: false, ns: "test.test", from: "michaels-mbp.home:30001", min: { _id: 0.0 }, max: { _id: 100.0 }, shardKeyPattern: { _id: 1.0 }, state: "fail", errmsg: "Unknown modifier: $x", counts: { cloned: 1, clonedBytes: 29, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
 m30001| Tue Dec 10 10:58:01.961 [conn4] warning: moveChunk error transferring data caused migration abort: { active: false, ns: "test.test", from: "michaels-mbp.home:30001", min: { _id: 0.0 }, max: { _id: 100.0 }, shardKeyPattern: { _id: 1.0 }, state: "fail", errmsg: "Unknown modifier: $x", counts: { cloned: 1, clonedBytes: 29, catchup: 0, steady: 0 }, ok: 1.0 }
 m30001| Tue Dec 10 10:58:01.961 [conn4] MigrateFromStatus::done About to acquire global write lock to exit critical section
 m30001| Tue Dec 10 10:58:01.961 [conn4] MigrateFromStatus::done Global lock acquired
 m30001| Tue Dec 10 10:58:01.962 [conn4] distributed lock 'test.test/michaels-mbp.home:30001:1386691081:991067721' unlocked.
 m30001| Tue Dec 10 10:58:01.962 [conn4] about to log metadata event: { _id: "michaels-mbp.home-2013-12-10T15:58:01-52a73a0952b0f1c3bf686415", server: "michaels-mbp.home", clientAddr: "192.168.1.181:59552", time: new Date(1386691081962), what: "moveChunk.from", ns: "test.test", details: { min: { _id: 0.0 }, max: { _id: 100.0 }, step1 of 6: 0, step2 of 6: 1, step3 of 6: 1, note: "aborted" } }
 m30999| 2013-12-10T10:58:01.962-0500 [conn1] moveChunk result: { cause: { active: false, ns: "test.test", from: "michaels-mbp.home:30001", min: { _id: 0.0 }, max: { _id: 100.0 }, shardKeyPattern: { _id: 1.0 }, state: "fail", errmsg: "Unknown modifier: $x", counts: { cloned: 1, clonedBytes: 29, catchup: 0, steady: 0 }, ok: 1.0 }, ok: 0.0, errmsg: "data transfer error" }
{
	"cause" : {
		"cause" : {
			"active" : false,
			"ns" : "test.test",
			"from" : "michaels-mbp.home:30001",
			"min" : {
				"_id" : 0
			},
			"max" : {
				"_id" : 100
			},
			"shardKeyPattern" : {
				"_id" : 1
			},
			"state" : "fail",
			"errmsg" : "Unknown modifier: $x",
			"counts" : {
				"cloned" : NumberLong(1),
				"clonedBytes" : NumberLong(29),
				"catchup" : NumberLong(0),
				"steady" : NumberLong(0)
			},
			"ok" : 1
		},
		"ok" : 0,
		"errmsg" : "data transfer error"
	},
	"ok" : 0,
	"errmsg" : "move failed"
}
 m29000| 2013-12-10T10:58:01.963-0500 [signalProcessingThread] got signal 15 (Terminated: 15), will terminate after current cmd ends
 m29000| 2013-12-10T10:58:01.963-0500 [signalProcessingThread] now exiting



 Comments   
Comment by Daniel Pasette (Inactive) [ 31/Jan/14 ]

This is not a regression in 2.6. moveChunk would exhibit the same behavior in a pure 2.4 cluster.

The thought behind this ticket is good (that existing data needs to be able to migrate in the face of storage validation requirements), but the fix is too invasive to merit introducing this behavior now.
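As a hedged illustration of the storage validation requirement in question (the helper name below is hypothetical; the real check lives in the server's C++ document validation, not in JS), 2.6 rejects top-level field names beginning with `$` at write time, while 2.4 allowed such documents to be created via $rename:

```javascript
// Sketch of the 2.6-era storage-validation rule (hypothetical helper):
// a document is rejected if any top-level field name starts with '$'.
function isValidForStorage(doc) {
    return Object.keys(doc).every(function (k) {
        return k.charAt(0) !== "$";
    });
}

console.log(isValidForStorage({ _id: 1, x: 1 }));  // true
console.log(isValidForStorage({ _id: 1, $x: 1 })); // false: rejected on 2.6
```

The migration fails because documents that 2.4 already accepted no longer pass this check on the receiving 2.6 shard.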

Comment by Scott Hernandez (Inactive) [ 04/Jan/14 ]

Can you change the test to have a non-$-prefixed field first? The error suggests the update is being interpreted not as an insert/replacement but as a $-modifier update. In that case this will never work without serious changes to the infrastructure.

Here is an example to show what is happening, and the two different cases:

> db.a.update({_id:1}, {$x:1}) // bad and won't work
...			"errmsg" : "Unknown modifier: $x"
> db.a.update({_id:1}, {a:1, $x:1}) // should work during replication/migration
...			"errmsg" : "$x is not valid for storage."
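To make the two cases concrete, here is a plain-JavaScript sketch of the dispatch rule described above (function name hypothetical; the real logic is in the server's update path): an update document whose first field starts with `$` is parsed as a $-modifier update, so a migrated document like { $x: 1 } is misread as "apply the unknown modifier $x" rather than "store this document".

```javascript
// Sketch of how an update document is classified (hypothetical function):
// if the first field name starts with '$', the whole document is treated as
// a set of $-modifiers; otherwise it is a full replacement document.
function classifyUpdate(updateDoc) {
    var keys = Object.keys(updateDoc);
    if (keys.length > 0 && keys[0].charAt(0) === "$") {
        return "modifier";    // e.g. { $set: { a: 1 } } -- or, here, { $x: 1 }
    }
    return "replacement";     // e.g. { a: 1, $x: 1 }
}

console.log(classifyUpdate({ $x: 1 }));       // "modifier" -> "Unknown modifier: $x"
console.log(classifyUpdate({ a: 1, $x: 1 })); // "replacement" -> storage validation error
```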

Comment by Michael O'Brien [ 02/Jan/14 ]

This is still happening for me in the latest nightly (2014-01-02, git version: 419be507ff3ccafd1a5dcda9a40144ed6fb4901e)
Except the error message now looks a little different:

 m30999| 2014-01-02T13:57:40.907-0500 [conn1] moveChunk result: { cause: { active: false, ns: "test.prefix", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: 100.0 }, shardKeyPattern: { _id: 1.0 }, state: "fail", errmsg: "Invalid modifier specified: $x", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 }, ok: 0.0, errmsg: "data transfer error" }
assert: [0] != [1] are not equal : undefined
Error: Printing Stack Trace
    at printStackTrace (src/mongo/shell/utils.js:38:15)
    at doassert (src/mongo/shell/assert.js:6:5)
    at Function.assert.eq (src/mongo/shell/assert.js:32:5)
    at test_movechunk.js:63:8

Comment by Scott Hernandez (Inactive) [ 12/Dec/13 ]

This was fixed in the refactoring in this commit: https://github.com/mongodb/mongo/commit/b98712c551e8ab27c33e1a5e7c694fa36c3334ce

We still need to add regression tests in QA for it though.

Comment by Michael O'Brien [ 12/Dec/13 ]

Based on a conversation with scotthernandez, we had concluded that any data already in the system must be capable of being moved around through migrations or replication; otherwise it's a bug.

Comment by Daniel Pasette (Inactive) [ 12/Dec/13 ]

This appears to work as designed, and the error is handled appropriately. This error would occur between two 2.6 shards as well, no?
