[SERVER-4185] Assertion 13388: shard version not ok in Client::Context: client in sharded mode, but doesn't have version set for this collection Created: 01/Nov/11  Updated: 16/Nov/21  Resolved: 29/Nov/11

Status: Closed
Project: Core Server
Component/s: MapReduce, Sharding
Affects Version/s: None
Fix Version/s: 2.1.0

Type: Bug Priority: Major - P3
Reporter: Zeph Wang Assignee: Greg Studer
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 10.04
MongoDB 2.0.1


Attachments: JPEG File db_version.jpg     Zip Archive mongos_mxenzai.zip     Zip Archive mongos_mxenzai_flush.zip     PNG File screen-capture.png     File shard_1_repl_1_.log    
Issue Links:
Depends
depends on SERVER-4158 version error in mapreduce aborts ope... Closed
Related
related to SERVER-8872 error 13388 shard version not ok in C... Closed
Operating System: Linux
Participants:

 Description   

I'm seeing a lot of the following errors on all my primary shards. This makes the log file grow like crazy.

"shard version not ok in Client::Context: client in sharded mode, but doesn't have version set for this collection"



 Comments   
Comment by Kay Agahd [ 06/Mar/13 ]

Done Ian, please see:
https://jira.mongodb.org/browse/SERVER-8872

Comment by Ian Whalen (Inactive) [ 06/Mar/13 ]

kay.agahd@idealo.de please open a new issue with your repro steps + any relevant info.

Comment by Kay Agahd [ 06/Mar/13 ]

Please reopen this issue. We are running v2.2.3 (Linux64) and still encounter the bug:

mongos> db.adminCommand( "flushRouterConfig" )
{ "flushed" : true, "ok" : 1 }
mongos> db.offer.count({"smallPicture.0":{$exists:true}})
Wed Mar  6 18:03:31 uncaught exception: count failed: {
	"shards" : {
		
	},
	"cause" : {
		"errmsg" : "13388 [offerStore.offer] shard version not ok in Client::Context: version mismatch detected for offerStore.offer, stored major version 5788 does not match received 5787 ( ns : offerStore.offer, received : 5787|0||000000000000000000000000, wanted : 5788|0||000000000000000000000000, send )",
		"ok" : 0
	},
	"ok" : 0,
	"errmsg" : "failed on : offerStoreDE2"
}

Comment by Van Nguyen [ 28/Aug/12 ]

This was my map function:

function() { emit({ d: new NumberInt(this.d), v: this.bi.toString() }, this.bi); };

Some of the documents have d = null, so the map failed and I got the same problem.

Fixed by:

function() { if (this.d != null) emit({ d: new NumberInt(this.d), v: this.bi.toString() }, this.bi); };

Hope this helps someone.
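
An alternative (not from the original comment) is to filter out documents with a null d before the map phase using the mapReduce query option. A minimal sketch, assuming a hypothetical collection name, output collection, and a placeholder reduce function:

db.events.mapReduce(
    function() { emit({ d: new NumberInt(this.d), v: this.bi.toString() }, this.bi); },
    function(key, values) { return values[0]; },          // placeholder reducer, illustration only
    {
        query: { d: { $ne: null } },                      // skip documents where d is null or missing
        out: { merge: "events_mr_out" }                   // hypothetical output collection
    }
);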

Comment by Brian Johnson [ 13/Jun/12 ]

What impact does flushRouterConfig have on performance? We have resorted to running it in a cron job every 15 minutes, but we sometimes still see the problem, so we are considering running it more often.

Comment by Y. Wayne Huang [ 13/Apr/12 ]

We are seeing sporadic MR job failures with this assert in 2.0.4. It would be great to consider a backport to 2.0. It's not entirely clear how to work around the issue: sometimes flushRouterConfig works, sometimes not. Any advice if you do not plan to backport this fix?

Comment by Greg Studer [ 29/Nov/11 ]

@Zac - you can track SERVER-4387 for the fix to the retry logic of commands. Closing, as the original issue in this ticket has been identified and the other issues are becoming difficult to track. Feel free to open another SERVER or SUPPORT ticket for non-mapreduce related issues you're having.

Comment by Greg Studer [ 22/Nov/11 ]

Thanks for the info, a few things here:

> Tue Nov 22 21:51:56 [conn56] warning: bad serverID set in setShardVersion and none in info: EOO
This is the issue in SERVER-4020, fixed in 2.0.2.

> Tue Nov 22 21:51:48 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Caused by the retry logic for commands (recently fixed; am backporting). "foo" is from our code and somehow slipped into the message there (fixed for some time in master), but will backport that too.

> Tue Nov 22 21:51:48 [conn5] Assertion: 13388:[foobar.hourly_stats] shard version not ok in Client::Context: collection was dropped or this shard no longer valid version: 0|0 clientVersion: 21|151
The known issue is SERVER-4262; one workaround is to append a unique value to your hourly collection name.
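
For illustration, a minimal sketch of that workaround from the mongo shell, using the hourly collection from the log below; the naming scheme and document shape are assumptions, not from the original thread:

// Rather than dropping and re-creating foobar.hourly_stats each hour,
// write each hour's data into a collection whose name carries the hour:
var d = new Date();
var suffix = d.getFullYear() * 1000000 + (d.getMonth() + 1) * 10000 + d.getDate() * 100 + d.getHours();
var hourly = db.getSiblingDB("foobar").getCollection("hourly_stats_" + suffix);
hourly.insert({ customer: "example", pageviews: 1 });   // document shape is hypothetical
// stale hourly collections can be dropped later; a dropped namespace is never
// re-created, so the router never sees it come back as an unsharded collection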

Comment by Zac Witte [ 22/Nov/11 ]

I'm getting the same exception. I had a sharded collection that I dropped, and then it was implicitly created again, unsharded, on the primary node by an upsert. So far this is the first time I've encountered this error. The drop and the upsert happened through the same mongos without any restarts. Running 3 shards, all on 2.0.1, no replicas.
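
For clarity, a condensed sketch of that sequence from the mongo shell (collection names taken from the log below; the shard key and the upsert document are assumptions):

var fdb = db.getSiblingDB("foobar");
// the collection was originally sharded, e.g. via:
// db.adminCommand({ shardCollection: "foobar.hourly_stats", key: { _id: 1 } })
fdb.hourly_stats.drop();                            // drop the sharded collection through mongos
fdb.hourly_stats.update({ _id: "2011-11-22T21" },   // the first upsert implicitly re-creates it,
                        { $set: { count: 1 } },     // unsharded, on the primary shard
                        true);                      // upsert = true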

This is what I saw in my mongod log
Tue Nov 22 21:34:46 [LockPinger] cluster mongo2.foobar.com:27019 pinged successfully at Tue Nov 22 21:34:46 2011 by distributed lock pinger 'mongo2.foobar.com:27019/mongo1:27018:1321918808:410267965', sleeping for 30000ms
Tue Nov 22 21:36:11 [conn4] command admin.$cmd command: { writebacklisten: ObjectId('4ecadf2cbc48751bda77e696') } ntoreturn:1 reslen:44 300000ms
Tue Nov 22 21:36:56 [initandlisten] connection accepted from 184.72.48.1:49693 #249
Tue Nov 22 21:36:56 [conn249] CMD: drop foobar.hourly_stats
Tue Nov 22 21:36:56 [conn249] wiping data for: foobar.hourly_stats
Tue Nov 22 21:37:03 [conn249] CMD: drop foobar.hourly_customer_stats
Tue Nov 22 21:37:03 [conn249] wiping data for: foobar.hourly_customer_stats
Tue Nov 22 21:37:30 [conn27] command admin.$cmd command: { writebacklisten: ObjectId('4ecae08b28dcca171fa71c57') } ntoreturn:1 reslen:44 300000ms

(about 10 minutes go by, very little activity)

Tue Nov 22 21:49:47 [LockPinger] cluster mongo2.foobar.com:27019 pinged successfully at Tue Nov 22 21:49:47 2011 by distributed lock pinger 'mongo2.foobar.com:27019/mongo1:27018:1321918808:410267965', sleeping for 30000ms
Tue Nov 22 21:50:41 [initandlisten] connection accepted from 184.72.48.1:38083 #253
Tue Nov 22 21:50:41 [conn253] warning: bad serverID set in setShardVersion and none in info: EOO
Tue Nov 22 21:50:42 [conn253] build index foobar.hourly_stats { _id: 1 }
Tue Nov 22 21:50:42 [conn253] build index done 0 records 0.038 secs
Tue Nov 22 21:50:43 [conn253] build index foobar.hourly_customer_stats { _id: 1 }
Tue Nov 22 21:50:43 [conn253] build index done 0 records 0.001 secs
Tue Nov 22 21:51:11 [conn4] command admin.$cmd command: { writebacklisten: ObjectId('4ecadf2cbc48751bda77e696') } ntoreturn:1 reslen:44 300000ms
Tue Nov 22 21:51:48 [conn5] Assertion: 13388:[foobar.hourly_stats] shard version not ok in Client::Context: collection was dropped or this shard no longer valid version: 0|0 clientVersion: 21|151
Tue Nov 22 21:51:48 [conn5] Assertion: 13388:[foobar.hourly_stats] shard version not ok in Client::Context: collection was dropped or this shard no longer valid version: 0|0 clientVersion: 21|151

And this is what I saw in my mongos log (same machine). Where it says "put [foo] on: ..." I have no idea where it's getting "foo" from; that's not what the collection or database is called.

Tue Nov 22 21:51:48 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:48 [conn1] couldn't find database [foo] in config db
Tue Nov 22 21:51:48 [conn1] put [foo] on: shard0000:mongo1.foobar.com:27018
Tue Nov 22 21:51:48 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:48 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:48 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:48 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:48 [conn1] ns: foo command failed because of stale config attempt: 0
Tue Nov 22 21:51:48 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:48 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:48 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:48 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:48 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:48 [conn1] ns: foo command failed because of stale config attempt: 1
Tue Nov 22 21:51:49 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:49 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:49 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:49 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:49 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:49 [conn1] ns: foo command failed because of stale config attempt: 2
Tue Nov 22 21:51:51 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:51 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:51 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:51 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:51 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:51 [conn1] ns: foo command failed because of stale config attempt: 3
Tue Nov 22 21:51:54 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:54 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:54 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:54 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:54 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:54 [conn1] ns: foo command failed because of stale config attempt: 4
Tue Nov 22 21:51:58 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:58 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:58 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:58 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:58 [conn1] retrying command: { collstats: "hourly_stats", scale: undefined }
Tue Nov 22 21:51:58 [conn1] ns: foo command failed because of stale config attempt: 5
^CTue Nov 22 21:52:17 [mongosMain] dbexit: received signal 2 rc:0 received signal 2

After restarting all servers and removing the collection again, it was re-created and the upserts completed as expected.

I'm also seeing a lot of messages like the following in the logs of all my mongod servers, including the config server. I read in the Google group not to worry about it, but it still feels very disconcerting. Not sure if it's related or not.
Tue Nov 22 21:51:56 [conn56] warning: bad serverID set in setShardVersion and none in info: EOO

Comment by Greg Studer [ 10/Nov/11 ]

Correction - are you *still* seeing this repeatedly?

Comment by Greg Studer [ 10/Nov/11 ]

Are you seeing this repeatedly? It's possible in normal operation for the config data to become stale, but this should be rare and quickly corrected by the mongos. Do you have the mongos logs for the same period?

Comment by Greg Studer [ 10/Nov/11 ]

Thanks for the info, looking now.

Comment by Zeph Wang [ 10/Nov/11 ]

log file attached!

Comment by Zeph Wang [ 10/Nov/11 ]

Stress test environment:

  • 4 physical servers, each with 48 GB of memory.
  • MongoDB version 2.0.1, sharded 4 ways, each shard with 3 replicas.
  • Each server runs one mongod and an application server that connects to the local mongod.
  • My stress test app reads from the application server with 1500 parallel threads.
  • During the test, and probably before the error occurred, I manually ran "db.stepDown()" a couple of times to switch primaries.

Comment by Zeph Wang [ 10/Nov/11 ]

Greg,
I can reproduce this error by running a read-only stress test, so mapreduce is not the only way to trigger it.
I'll attach my mongod log file to this bug. The log file is pretty big since I used '-vv'. In the log file you can find where the very first error message of this type occurs. I hope this helps you debug the issue.

Comment by Greg Studer [ 08/Nov/11 ]

Ok, since flushRouterConfig is working, the issue is one being fixed right now; you can watch SERVER-2396.

Comment by Yukihiko Sawanobori [ 08/Nov/11 ]

I have tried flushRouterConfig before running m/r, and it works so far.
Thank you for your advice.

I attached the files again. Please refer to them.

"mongos_mxenzai_flush.zip"

  • mongo_router.log (flushRouterConfig was performed at 29 and 44 minutes past each hour.)
  • config.changelog

Comment by Yukihiko Sawanobori [ 08/Nov/11 ]

I'm trying now. I'll report later.

Comment by Greg Studer [ 08/Nov/11 ]

I assume the workaround didn't fix the failing m/r?

Comment by Yukihiko Sawanobori [ 08/Nov/11 ]

"mongos_mxenzai.zip"

Include:
mongo_router.log (over 200MB),
config.changelog

I upgraded all mongod instances from 1.8.3 to 2.0.1 on Nov 02 2011, around 2:00 (+9:00).

Comment by Yukihiko Sawanobori [ 08/Nov/11 ]

Thanks for your reply.

The workaround you've proposed sounds good so far.
It's much better than my current workaround of restarting mongos before every m/r.

I will try running flushRouterConfig before every m/r and attach the log and config later.

Comment by Greg Studer [ 07/Nov/11 ]

Is there a query associated with your m/r job, or are you m/r'ing over the entire collection?

A workaround, if the versioning is the problem, is to run flushRouterConfig : 1 before running mapReduce - the impact should be pretty small. If this fixes the problem, the issue is that stale chunk versions are not retried for m/r, which is fixed in 2.1 (and may be backported). If it doesn't fix the failing m/rs, could you post the full mongos log (or more of it), as well as the config.changelog collection (via mongos: use config; db.changelog.find().toArray()).
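
For concreteness, a sketch of that workaround from the mongos shell; the database, collection, and output names are borrowed from the logs later in this thread, and the map/reduce pair is purely illustrative:

db.adminCommand({ flushRouterConfig: 1 });   // refresh this mongos' cached chunk metadata
var res = db.getSiblingDB("mxenzai_production").trend_averages.mapReduce(
    function() { emit(this.user_id, 1); },                  // map function is an assumption
    function(key, values) { return Array.sum(values); },    // reduce function is an assumption
    { out: { merge: "trend_average_tmp" } }                 // output collection seen in the attached log
);
printjson(res);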

Comment by Yukihiko Sawanobori [ 07/Nov/11 ]

Thanks for your reply.

The sharding status is shown below.

mongos> printShardingStatus()
--- Sharding Status --- 
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
	{  "_id" : "g01",  "host" : "g01/10.10.2.107:27017,10.10.2.108:27017" }
	{  "_id" : "g02",  "host" : "g02/10.10.2.108:37017,10.10.2.107:37017" }
	{  "_id" : "g03",  "host" : "g03/10.10.2.109:27017,10.10.2.110:27017" }
	{  "_id" : "g04",  "host" : "g04/10.10.2.110:37017,10.10.2.109:37017" }
	{  "_id" : "g05",  "host" : "g05/10.10.2.111:27017,10.10.2.112:27017" }
	{  "_id" : "g06",  "host" : "g06/10.10.2.111:37017,10.10.2.112:37017" }
  databases:
	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
	{  "_id" : "test",  "partitioned" : false,  "primary" : "g01" }
	{  "_id" : "mxenzai_production",  "partitioned" : true,  "primary" : "g01" }
		mxenzai_production.axion_logs chunks:
				g01	1
			{ "user_id" : { $minKey : 1 }, "checked_at" : { $minKey : 1 } } -->> { "user_id" : { $maxKey : 1 }, "checked_at" : { $maxKey : 1 } } on : g01 { "t" : 1000, "i" : 0 }
		mxenzai_production.external_data chunks:
				g01	19
				g06	19
				g05	20
				g04	20
				g03	20
				g02	24
			too many chunks to print, use verbose if you want to force print
		mxenzai_production.internal_data chunks:
				g02	5
				g04	6
				g03	5
				g05	4
				g06	4
				g01	6
			too many chunks to print, use verbose if you want to force print
		mxenzai_production.trend_averages chunks:
				g06	4
				g03	5
				g01	5
				g05	5
				g04	5
				g02	5
			too many chunks to print, use verbose if you want to force print
		mxenzai_production.trend_failures chunks:
				g04	1
				g03	2
				g05	1
				g06	2
				g01	1
				g02	2
			{ "user_id" : { $minKey : 1 }, "failed_start_at" : { $minKey : 1 } } -->> { "user_id" : 2, "failed_start_at" : 1311921951 } on : g04 { "t" : 5000, "i" : 0 }
			{ "user_id" : 2, "failed_start_at" : 1311921951 } -->> { "user_id" : 9, "failed_start_at" : 1310735983 } on : g03 { "t" : 8000, "i" : 4 }
			{ "user_id" : 9, "failed_start_at" : 1310735983 } -->> { "user_id" : 12, "failed_start_at" : 1314561600 } on : g03 { "t" : 8000, "i" : 5 }
			{ "user_id" : 12, "failed_start_at" : 1314561600 } -->> { "user_id" : 31, "failed_start_at" : 1312181010 } on : g05 { "t" : 6000, "i" : 0 }
			{ "user_id" : 31, "failed_start_at" : 1312181010 } -->> { "user_id" : 38, "failed_start_at" : 1313334000 } on : g06 { "t" : 7000, "i" : 0 }
			{ "user_id" : 38, "failed_start_at" : 1313334000 } -->> { "user_id" : 147, "failed_start_at" : 1314961383 } on : g01 { "t" : 7000, "i" : 1 }
			{ "user_id" : 147, "failed_start_at" : 1314961383 } -->> { "user_id" : 388, "failed_start_at" : 1317254400 } on : g02 { "t" : 8000, "i" : 2 }
			{ "user_id" : 388, "failed_start_at" : 1317254400 } -->> { "user_id" : 759, "failed_start_at" : 1319896800 } on : g02 { "t" : 8000, "i" : 3 }
			{ "user_id" : 759, "failed_start_at" : 1319896800 } -->> { "user_id" : { $maxKey : 1 }, "failed_start_at" : { $maxKey : 1 } } on : g06 { "t" : 8000, "i" : 0 }
	{  "_id" : "mamelog",  "partitioned" : false,  "primary" : "g05" }
	{  "_id" : "mamelog_user_1",  "partitioned" : false,  "primary" : "g06" }
	{  "_id" : "mamelog_user_12",  "partitioned" : false,  "primary" : "g05" }
	{  "_id" : "mamelog_user_7",  "partitioned" : false,  "primary" : "g06" }
	{  "_id" : "mamelog_user_2",  "partitioned" : false,  "primary" : "g05" }
	{  "_id" : "mamelog_user_3",  "partitioned" : false,  "primary" : "g06" }
	{  "_id" : "mamelog_user_4",  "partitioned" : false,  "primary" : "g05" }
	{  "_id" : "mamelog_user_5",  "partitioned" : false,  "primary" : "g06" }
	{  "_id" : "mamelog_user_6",  "partitioned" : false,  "primary" : "g05" }
	{  "_id" : "mamelog_user_8",  "partitioned" : false,  "primary" : "g06" }
	{  "_id" : "mamelog_user_9",  "partitioned" : false,  "primary" : "g05" }
	{  "_id" : "mamelog_user_10",  "partitioned" : false,  "primary" : "g06" }
	{  "_id" : "mamelog_user_11",  "partitioned" : false,  "primary" : "g05" }
 

Please refer to the attached file "db_version.jpg". All mongod versions are 2.0.1.

Comment by Greg Studer [ 06/Nov/11 ]

Hmm... is there anything different about g05 related to the collection mxenzai_production.trend_failures? The version isn't reset in the failed case. There are known issues with migration interfering with mapReduce (which we're working to fix now); the version numbers of the failed run increasing so quickly after the normal run make me think there's a lot of data movement there.

Comment by Yukihiko Sawanobori [ 05/Nov/11 ]

Hi.

I've encountered almost the same thing when using mapreduce.
I was able to get logs with logLevel 2, shown below.

Case: mapreduce success.

Sat Nov  5 21:45:02 [mongosMain] connection accepted from 127.0.0.1:47217 #21
Sat Nov  5 21:45:02 [mongosMain] connection accepted from 127.0.0.1:47218 #22
Sat Nov  5 21:45:02 [conn21] SocketException: remote: 127.0.0.1:47217 error: 9001 socket exception [0] server [127.0.0.1:47217] 
Sat Nov  5 21:45:02 [conn21] end connection 127.0.0.1:47217
Sat Nov  5 21:45:02 [conn22] DBConfig unserialize: mxenzai_production { _id: "mxenzai_production", partitioned: true, primary: "g01" }
Sat Nov  5 21:45:02 [conn22] created new distributed lock for mxenzai_production.axion_logs on 10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019 ( loc
k timeout : 900000, ping interval : 30000, process : 0 )
Sat Nov  5 21:45:02 [conn22] ChunkManager: time to load chunks for mxenzai_production.axion_logs: 1ms sequenceNumber: 2 version: 1|0
Sat Nov  5 21:45:02 [conn22] created new distributed lock for mxenzai_production.external_data on 10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019 ( 
lock timeout : 900000, ping interval : 30000, process : 0 )
Sat Nov  5 21:45:02 [conn22] ChunkManager: time to load chunks for mxenzai_production.external_data: 4ms sequenceNumber: 3 version: 49|19
Sat Nov  5 21:45:02 [conn22] created new distributed lock for mxenzai_production.internal_data on 10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019 ( 
lock timeout : 900000, ping interval : 30000, process : 0 )
Sat Nov  5 21:45:02 [conn22] ChunkManager: time to load chunks for mxenzai_production.internal_data: 1ms sequenceNumber: 4 version: 20|5
Sat Nov  5 21:45:02 [conn22] created new distributed lock for mxenzai_production.trend_averages on 10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019 (
 lock timeout : 900000, ping interval : 30000, process : 0 )
Sat Nov  5 21:45:02 [conn22] ChunkManager: time to load chunks for mxenzai_production.trend_averages: 1ms sequenceNumber: 5 version: 19|1
Sat Nov  5 21:45:02 [conn22] created new distributed lock for mxenzai_production.trend_failures on 10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019 (
 lock timeout : 900000, ping interval : 30000, process : 0 )
Sat Nov  5 21:45:02 [conn22] ChunkManager: time to load chunks for mxenzai_production.trend_failures: 1ms sequenceNumber: 6 version: 8|5
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.107:27017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.108:27017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 BackgroundJob starting: WriteBackListener-10.10.2.107:27017
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.107:27017] creating new connection to:10.10.2.107:27017
Sat Nov  5 21:45:02 BackgroundJob starting: WriteBackListener-10.10.2.108:27017
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.108:27017] creating new connection to:10.10.2.108:27017
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.108:27017] connected connection!
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.107:27017] connected connection!
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.108:37017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.107:37017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 BackgroundJob starting: WriteBackListener-10.10.2.108:37017
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.108:37017] creating new connection to:10.10.2.108:37017
Sat Nov  5 21:45:02 BackgroundJob starting: WriteBackListener-10.10.2.107:37017
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.107:37017] creating new connection to:10.10.2.107:37017
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.108:37017] connected connection!
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.107:37017] connected connection!
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.109:27017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 BackgroundJob starting: WriteBackListener-10.10.2.109:27017
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.110:27017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.109:27017] creating new connection to:10.10.2.109:27017
Sat Nov  5 21:45:02 BackgroundJob starting: WriteBackListener-10.10.2.110:27017
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.110:27017] creating new connection to:10.10.2.110:27017
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.109:27017] connected connection!
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.110:27017] connected connection!
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.110:37017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.109:37017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 BackgroundJob starting: WriteBackListener-10.10.2.110:37017
Sat Nov  5 21:45:02 BackgroundJob starting: WriteBackListener-10.10.2.109:37017
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.110:37017] creating new connection to:10.10.2.110:37017
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.109:37017] creating new connection to:10.10.2.109:37017
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.110:37017] connected connection!
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.109:37017] connected connection!
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.111:27017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.112:27017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 BackgroundJob starting: WriteBackListener-10.10.2.111:27017
Sat Nov  5 21:45:02 BackgroundJob starting: WriteBackListener-10.10.2.112:27017
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.111:27017] creating new connection to:10.10.2.111:27017
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.112:27017] creating new connection to:10.10.2.112:27017
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.111:27017] connected connection!
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.112:27017] connected connection!
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.111:37017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 [conn22] creating WriteBackListener for: 10.10.2.112:37017 serverID: 4eb52eb9cbad6ddd72b40d97
Sat Nov  5 21:45:02 BackgroundJob starting: WriteBackListener-10.10.2.112:37017
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.112:37017] creating new connection to:10.10.2.112:37017
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.112:37017] connected connection!
Sat Nov  5 21:45:02 [WriteBackListener-10.10.2.111:37017] connected connection!
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f875801d520 ns:mxenzai_production.trend_averages my last seq: 0  current: 5 version: 
17|0 manager: 0x7f8758023b20
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g01 10.10.2.108:27017  mxenzai_production.trend_averages  { setShardVersion: "mxenzai_production.trend_a
verages", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 17000|0, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), s
hard: "g01", shardHost: "g01/10.10.2.107:27017,10.10.2.108:27017" } 0x7f875801d520
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f875803c600 ns:mxenzai_production.trend_averages my last seq: 0  current: 5 version: 
19|1 manager: 0x7f8758023b20
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g02 10.10.2.108:37017  mxenzai_production.trend_averages  { setShardVersion: "mxenzai_production.trend_a
verages", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 19000|1, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), s
hard: "g02", shardHost: "g02/10.10.2.108:37017,10.10.2.107:37017" } 0x7f875803c600
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f875803e2b0 ns:mxenzai_production.trend_averages my last seq: 0  current: 5 version: 
18|0 manager: 0x7f8758023b20
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g03 10.10.2.109:27017  mxenzai_production.trend_averages  { setShardVersion: "mxenzai_production.trend_a
verages", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 18000|0, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), s
hard: "g03", shardHost: "g03/10.10.2.109:27017,10.10.2.110:27017" } 0x7f875803e2b0
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f87580405e0 ns:mxenzai_production.trend_averages my last seq: 0  current: 5 version: 
17|3 manager: 0x7f8758023b20
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g04 10.10.2.110:37017  mxenzai_production.trend_averages  { setShardVersion: "mxenzai_production.trend_a
verages", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 17000|3, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), s
hard: "g04", shardHost: "g04/10.10.2.110:37017,10.10.2.109:37017" } 0x7f87580405e0
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f8758042560 ns:mxenzai_production.trend_averages my last seq: 0  current: 5 version: 
19|0 manager: 0x7f8758023b20
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g05 10.10.2.111:27017  mxenzai_production.trend_averages  { setShardVersion: "mxenzai_production.trend_a
verages", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 19000|0, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), s
hard: "g05", shardHost: "g05/10.10.2.111:27017,10.10.2.112:27017" } 0x7f8758042560
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f874c001680 ns:mxenzai_production.trend_averages my last seq: 0  current: 5 version: 
16|0 manager: 0x7f8758023b20
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g06 10.10.2.112:37017  mxenzai_production.trend_averages  { setShardVersion: "mxenzai_production.trend_a
verages", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 16000|0, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), s
hard: "g06", shardHost: "g06/10.10.2.111:37017,10.10.2.112:37017" } 0x7f874c001680
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [conn22] delete : { checked_at: { $gte: 1320487200, $lte: 1320490799 }, interval: 600 }      6 justOne: 0
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f875801d520 ns:mxenzai_production.external_data my last seq: 0  current: 3 version: 4
7|0 manager: 0x7f875801e520
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g01 10.10.2.108:27017  mxenzai_production.external_data  { setShardVersion: "mxenzai_production.external
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 47000|0, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sha
rd: "g01", shardHost: "g01/10.10.2.107:27017,10.10.2.108:27017" } 0x7f875801d520
Sat Nov  5 21:45:02 [Balancer] about to acquire distributed lock 'balancer/app02:27017:1320496826:1804289383:
  "when" : { "$date" : "Sat Nov  5 21:45:02 2011" },
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f875803c600 ns:mxenzai_production.external_data my last seq: 0  current: 3 version: 4
9|17 manager: 0x7f875801e520
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g02 10.10.2.108:37017  mxenzai_production.external_data  { setShardVersion: "mxenzai_production.external
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 49000|17, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sh
ard: "g02", shardHost: "g02/10.10.2.108:37017,10.10.2.107:37017" } 0x7f875803c600
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f875803e2b0 ns:mxenzai_production.external_data my last seq: 0  current: 3 version: 4
9|19 manager: 0x7f875801e520
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g03 10.10.2.109:27017  mxenzai_production.external_data  { setShardVersion: "mxenzai_production.external
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 49000|19, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sh
ard: "g03", shardHost: "g03/10.10.2.109:27017,10.10.2.110:27017" } 0x7f875803e2b0
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f87580405e0 ns:mxenzai_production.external_data my last seq: 0  current: 3 version: 4
9|5 manager: 0x7f875801e520
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g04 10.10.2.110:37017  mxenzai_production.external_data  { setShardVersion: "mxenzai_production.external
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 49000|5, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sha
rd: "g04", shardHost: "g04/10.10.2.110:37017,10.10.2.109:37017" } 0x7f87580405e0
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f8758042560 ns:mxenzai_production.external_data my last seq: 0  current: 3 version: 4
9|11 manager: 0x7f875801e520
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g05 10.10.2.111:27017  mxenzai_production.external_data  { setShardVersion: "mxenzai_production.external
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 49000|11, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sh
ard: "g05", shardHost: "g05/10.10.2.111:27017,10.10.2.112:27017" } 0x7f8758042560
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [conn22]  have to set shard version for conn: 0x7f874c001680 ns:mxenzai_production.external_data my last seq: 0  current: 3 version: 4
9|9 manager: 0x7f875801e520
Sat Nov  5 21:45:02 [conn22]     setShardVersion  g06 10.10.2.112:37017  mxenzai_production.external_data  { setShardVersion: "mxenzai_production.external
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 49000|9, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sha
rd: "g06", shardHost: "g06/10.10.2.111:37017,10.10.2.112:37017" } 0x7f874c001680
Sat Nov  5 21:45:02 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:02 [Balancer] distributed lock 'balancer/app02:27017:1320496826:1804289383' acquired, ts : 4eb52fcecbad6ddd72b40db3
Sat Nov  5 21:45:02 [Balancer] *** start balancing round
Sat Nov  5 21:45:02 [Balancer] collection : mxenzai_production.external_data
Sat Nov  5 21:45:02 [Balancer] donor      : 23 chunks on g02
Sat Nov  5 21:45:02 [Balancer] receiver   : 19 chunks on g01
Sat Nov  5 21:45:02 [Balancer] collection : mxenzai_production.internal_data
Sat Nov  5 21:45:02 [Balancer] donor      : 6 chunks on g01
Sat Nov  5 21:45:02 [Balancer] receiver   : 4 chunks on g05
Sat Nov  5 21:45:02 [Balancer] collection : mxenzai_production.trend_averages
Sat Nov  5 21:45:02 [Balancer] donor      : 5 chunks on g01
Sat Nov  5 21:45:02 [Balancer] receiver   : 4 chunks on g06
Sat Nov  5 21:45:02 [Balancer] collection : mxenzai_production.trend_failures
Sat Nov  5 21:45:02 [Balancer] donor      : 2 chunks on g02
Sat Nov  5 21:45:02 [Balancer] receiver   : 1 chunks on g01
Sat Nov  5 21:45:02 [Balancer] collection : mxenzai_production.axion_logs
Sat Nov  5 21:45:02 [Balancer] donor      : 1 chunks on g01
Sat Nov  5 21:45:02 [Balancer] receiver   : 0 chunks on g02
Sat Nov  5 21:45:02 [Balancer] no need to move any chunk
Sat Nov  5 21:45:02 [Balancer] *** end of balancing round
Sat Nov  5 21:45:03 [Balancer] distributed lock 'balancer/app02:27017:1320496826:1804289383' unlocked. 
Sat Nov  5 21:45:03 [conn22] customOut: { merge: "trend_average_tmp" } outServer: g01:g01/10.10.2.107:27017,10.10.2.108:27017
Sat Nov  5 21:45:03 [conn22]  have to set shard version for conn: 0x7f875801d520 ns:mxenzai_production.internal_data my last seq: 0  current: 4 version: 2
0|5 manager: 0x7f8758021610
Sat Nov  5 21:45:03 [conn22]     setShardVersion  g01 10.10.2.108:27017  mxenzai_production.internal_data  { setShardVersion: "mxenzai_production.internal
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 20000|5, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sha
rd: "g01", shardHost: "g01/10.10.2.107:27017,10.10.2.108:27017" } 0x7f875801d520
Sat Nov  5 21:45:03 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:03 [conn22]  have to set shard version for conn: 0x7f875803c600 ns:mxenzai_production.internal_data my last seq: 0  current: 4 version: 1
8|5 manager: 0x7f8758021610
Sat Nov  5 21:45:03 [conn22]     setShardVersion  g02 10.10.2.108:37017  mxenzai_production.internal_data  { setShardVersion: "mxenzai_production.internal
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 18000|5, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sha
rd: "g02", shardHost: "g02/10.10.2.108:37017,10.10.2.107:37017" } 0x7f875803c600
Sat Nov  5 21:45:03 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:03 [conn22]  have to set shard version for conn: 0x7f875803e2b0 ns:mxenzai_production.internal_data my last seq: 0  current: 4 version: 2
0|1 manager: 0x7f8758021610
Sat Nov  5 21:45:03 [conn22]     setShardVersion  g03 10.10.2.109:27017  mxenzai_production.internal_data  { setShardVersion: "mxenzai_production.internal
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 20000|1, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sha
rd: "g03", shardHost: "g03/10.10.2.109:27017,10.10.2.110:27017" } 0x7f875803e2b0
Sat Nov  5 21:45:03 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:03 [conn22]  have to set shard version for conn: 0x7f87580405e0 ns:mxenzai_production.internal_data my last seq: 0  current: 4 version: 2
0|3 manager: 0x7f8758021610
Sat Nov  5 21:45:03 [conn22]     setShardVersion  g04 10.10.2.110:37017  mxenzai_production.internal_data  { setShardVersion: "mxenzai_production.internal
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 20000|3, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sha
rd: "g04", shardHost: "g04/10.10.2.110:37017,10.10.2.109:37017" } 0x7f87580405e0
Sat Nov  5 21:45:03 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:03 [conn22]  have to set shard version for conn: 0x7f8758042560 ns:mxenzai_production.internal_data my last seq: 0  current: 4 version: 1
7|0 manager: 0x7f8758021610
Sat Nov  5 21:45:03 [conn22]     setShardVersion  g05 10.10.2.111:27017  mxenzai_production.internal_data  { setShardVersion: "mxenzai_production.internal
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 17000|0, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sha
rd: "g05", shardHost: "g05/10.10.2.111:27017,10.10.2.112:27017" } 0x7f8758042560
Sat Nov  5 21:45:03 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:03 [conn22]  have to set shard version for conn: 0x7f874c001680 ns:mxenzai_production.internal_data my last seq: 0  current: 4 version: 1
8|0 manager: 0x7f8758021610
Sat Nov  5 21:45:03 [conn22]     setShardVersion  g06 10.10.2.112:37017  mxenzai_production.internal_data  { setShardVersion: "mxenzai_production.internal
_data", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 18000|0, serverID: ObjectId('4eb52eb9cbad6ddd72b40d97'), sha
rd: "g06", shardHost: "g06/10.10.2.111:37017,10.10.2.112:37017" } 0x7f874c001680
Sat Nov  5 21:45:03 [conn22]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 21:45:03 [conn22] customOut: { merge: "trend_average_tmp" } outServer: g01:g01/10.10.2.107:27017,10.10.2.108:27017
Sat Nov  5 21:45:04 [conn22] creating new connection to:10.10.2.108:27017
Sat Nov  5 21:45:04 BackgroundJob starting: ConnectBG
Sat Nov  5 21:45:04 [conn22] connected connection!
Sat Nov  5 21:45:04 [conn22] about to initiate autosplit: ns:mxenzai_production.trend_averages at: g02:g02/10.10.2.108:37017,10.10.2.107:37017 lastmod: 15
|3 min: { user_id: 597.0, checked_at: 1318362600.0 } max: { user_id: MaxKey, checked_at: MaxKey } dataWritten: 13408389 splitThreshold: 60397977
Sat Nov  5 21:45:04 [conn22] chunk not full enough to trigger auto-split
Sat Nov  5 21:45:05 [conn22] delete : { checked_at: { $gte: 1320487200, $lte: 1320490799 }, interval: 1800 }     6 justOne: 0
Sat Nov  5 21:45:05 [conn22] customOut: { merge: "trend_average_tmp" } outServer: g01:g01/10.10.2.107:27017,10.10.2.108:27017
Sat Nov  5 21:45:05 [conn22] SocketException: remote: 127.0.0.1:47218 error: 9001 socket exception [0] server [127.0.0.1:47218] 
Sat Nov  5 21:45:05 [conn22] end connection 127.0.0.1:47218

Case: mapreduce fail.

Sat Nov  5 20:30:02 [mongosMain] connection accepted from 127.0.0.1:55504 #125
Sat Nov  5 20:30:02 [mongosMain] connection accepted from 127.0.0.1:55505 #126
Sat Nov  5 20:30:02 [conn125] SocketException: remote: 127.0.0.1:55504 error: 9001 socket exception [0] server [127.0.0.1:55504] 
Sat Nov  5 20:30:02 [conn125] end connection 127.0.0.1:55504
Sat Nov  5 20:30:02 BackgroundJob starting: ConnectBG
Sat Nov  5 20:30:02 BackgroundJob starting: ConnectBG
Sat Nov  5 20:30:02 BackgroundJob starting: ConnectBG
Sat Nov  5 20:30:02 BackgroundJob starting: ConnectBG
Sat Nov  5 20:30:02 BackgroundJob starting: ConnectBG
Sat Nov  5 20:30:02 BackgroundJob starting: ConnectBG
Sat Nov  5 20:30:03 [conn126]  have to set shard version for conn: 0x7f821801c310 ns:mxenzai_production.trend_failures my last seq: 0  current: 6 version:
 7|1 manager: 0x29736c0
Sat Nov  5 20:30:03 [conn126]     setShardVersion  g01 10.10.2.108:27017  mxenzai_production.trend_failures  { setShardVersion: "mxenzai_production.trend_
failures", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 7000|1, serverID: ObjectId('4eb504273c1e926abdce03d6'), s
hard: "g01", shardHost: "g01/10.10.2.107:27017,10.10.2.108:27017" } 0x7f821801c310
Sat Nov  5 20:30:03 [conn126]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 20:30:03 [conn126]  have to set shard version for conn: 0x7f821801bad0 ns:mxenzai_production.trend_failures my last seq: 0  current: 6 version:
 8|3 manager: 0x29736c0
Sat Nov  5 20:30:03 [conn126]     setShardVersion  g02 10.10.2.108:37017  mxenzai_production.trend_failures  { setShardVersion: "mxenzai_production.trend_
failures", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 8000|3, serverID: ObjectId('4eb504273c1e926abdce03d6'), s
hard: "g02", shardHost: "g02/10.10.2.108:37017,10.10.2.107:37017" } 0x7f821801bad0
Sat Nov  5 20:30:03 [conn126]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 20:30:03 [conn126]  have to set shard version for conn: 0x7f8218023cd0 ns:mxenzai_production.trend_failures my last seq: 0  current: 6 version:
 8|5 manager: 0x29736c0
Sat Nov  5 20:30:03 [conn126]     setShardVersion  g03 10.10.2.109:27017  mxenzai_production.trend_failures  { setShardVersion: "mxenzai_production.trend_
failures", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 8000|5, serverID: ObjectId('4eb504273c1e926abdce03d6'), s
hard: "g03", shardHost: "g03/10.10.2.109:27017,10.10.2.110:27017" } 0x7f8218023cd0
Sat Nov  5 20:30:03 [conn126]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 20:30:03 [conn126]  have to set shard version for conn: 0x7f821801b290 ns:mxenzai_production.trend_failures my last seq: 0  current: 6 version:
 5|0 manager: 0x29736c0
Sat Nov  5 20:30:03 [conn126]     setShardVersion  g04 10.10.2.110:37017  mxenzai_production.trend_failures  { setShardVersion: "mxenzai_production.trend_failures", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 5000|0, serverID: ObjectId('4eb504273c1e926abdce03d6'), shard: "g04", shardHost: "g04/10.10.2.110:37017,10.10.2.109:37017" } 0x7f821801b290
Sat Nov  5 20:30:03 [conn126]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
Sat Nov  5 20:30:03 [conn126]  have to set shard version for conn: 0x7f82180273e0 ns:mxenzai_production.trend_failures my last seq: 0  current: 6 version: 8|0 manager: 0x29736c0
Sat Nov  5 20:30:03 [conn126]     setShardVersion  g06 10.10.2.112:37017  mxenzai_production.trend_failures  { setShardVersion: "mxenzai_production.trend_failures", configdb: "10.10.2.107:27019,10.10.2.109:27019,10.10.2.111:27019", version: Timestamp 8000|0, serverID: ObjectId('4eb504273c1e926abdce03d6'), shard: "g06", shardHost: "g06/10.10.2.111:37017,10.10.2.112:37017" } 0x7f82180273e0
Sat Nov  5 20:30:03 [conn126]       setShardVersion success: { oldVersion: Timestamp 0|0, ok: 1.0 }
 
-->> No more logs about "conn126"

Comment by Greg Studer [ 04/Nov/11 ]

Also, do you have more log information you're able to post from the start of this issue? If you're able to change your logLevel to 2 on a mongos and reproduce (you can do this at runtime via setParameter), that would also be very useful.
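
For reference, a sketch of raising the verbosity at runtime on the mongos in question (and restoring it afterwards):

db.adminCommand({ setParameter: 1, logLevel: 2 });   // verbose logging while reproducing
// ... reproduce the failing mapReduce ...
db.adminCommand({ setParameter: 1, logLevel: 0 });   // restore the default afterwards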

Comment by Greg Studer [ 04/Nov/11 ]

Hmm... is the mapreduce output collection the collection having the issues here?

Comment by Zeph Wang [ 02/Nov/11 ]

FYI: I believe this is caused by our mapreduce job (I'll add more information about the driver we used).
A way to stop mongod from generating this type of warning is to run db.adminCommand("flushRouterConfig") on all mongos instances.

In a Google Groups thread I mentioned that the 'command' column value in mongostat is huge; that value actually indicates a huge number of these log entries being generated. The above command also makes the 'command' value drop significantly
(from tens of thousands to dozens).
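
A sketch of flushing every mongos from a single shell session; the host list is a placeholder for your actual mongos addresses:

["mongos1.example.com:27017", "mongos2.example.com:27017"].forEach(function(host) {
    var conn = new Mongo(host);                                            // connect to each mongos
    printjson(conn.getDB("admin").runCommand({ flushRouterConfig: 1 }));   // refresh its routing table
});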

Comment by Zeph Wang [ 01/Nov/11 ]

It turned out my coworker also ran a mapreduce job that wrote its output to a collection in our MongoDB cluster.
Maybe that's the root cause.

Comment by Zeph Wang [ 01/Nov/11 ]

Eliot, a little update here:
I manually use "db.stepDown()" to change one shard's primary, after running this command
a couple of times for the same shard, I started to seeing this type of error message AGAIN.

I can assure you all binaries are 2.0.1.

Comment by Zeph Wang [ 01/Nov/11 ]

Eliot, I'm not 100% sure because I manually upgraded 'each' shard and its replicas to 2.0.1. There
is a possibility that I missed a replica. As a matter of fact, I then did a batch upgrade of the entire
cluster to 2.0.1 (which makes sure all shards are using 2.0.1), and this error is somehow gone now.

Are you implying that mixing different mongod/mongos versions is likely to cause this type of error?
Thank you.

Comment by Eliot Horowitz (Inactive) [ 01/Nov/11 ]

Are you sure all mongos are 2.0.1?

Generated at Thu Feb 08 03:05:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.