|
The first occurrence came just after the "Too many open files" errors. The logs from before the file descriptors ran out look safe: the only occurrence of "Name or service not known" was a day earlier, and there is nothing suspicious before that, just stale config, split chunks and connections starting and ending:
Mon Oct 22 10:47:44 [conn89] warning: splitChunk failed - cmd: { splitChunk: "movie.entity_attr", keyPattern:
{ _id: 1.0 }
, min:
{ _id: "http://vidstream.ru/movie/524d5335928992.html&domain-id=286172" }
, max:
{ _id: "http://vidvidoo.com/movie/1433822-the-apparition-2012-online-streaming-in-hd-aug-24-2012/&domain-id=295708" }
, from: "mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017", splitKeys: [
{ _id: "http://vidstream.ru/movie/a0ce51289072c.html&domain-id=286172" }
], shardId: "movie.entity_attr-id"http://vidstream.ru/movie/524d5335928992.html&domain-id=286172"", configdb: "conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018" } result: { who: { _id: "movie.entity_attr", process: "host01d:27017:1350602021:1390778741", state: 2, ts: ObjectId('5084ec0ce23b10d5d497b8a4'), when: new Date(1350888460538), who: "host01d:27017:1350602021:1390778741:conn21057:1425296225", why: "split-
{ _id: "http://vidstream.ru/movie/524d5335928992.html&domain-id=286172" }
" }, errmsg: "the collection's metadata lock is taken", ok: 0.0 }
Mon Oct 22 10:47:44 [conn89] ChunkManager: time to load chunks for movie.entity_attr: 2ms sequenceNumber: 85 version: 464|7
Mon Oct 22 10:47:44 [conn412] autosplitted movie.entity_attr shard: ns:movie.entity_attr at: mongodb-sh1:mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017 lastmod: 395|66 min:
{ _id: "http://vidstream.ru/movie/524d5335928992.html&domain-id=286172" }
max:
{ _id: "http://vidvidoo.com/movie/1433822-the-apparition-2012-online-streaming-in-hd-aug-24-2012/&domain-id=295708" }
on:
{ _id: "http://vidstream.ru/movie/a0cf23433ae62.html&domain-id=286172" }
(splitThreshold 67108864)
Mon Oct 22 10:47:49 [conn412] SyncClusterConnection connecting to [conf01g.load.net:27018]
Mon Oct 22 10:47:49 [conn412] SyncClusterConnection connecting to [conf01d.load.net:27018]
Mon Oct 22 10:47:49 [conn412] SyncClusterConnection connecting to [conf01f.load.net:27018]
Mon Oct 22 10:47:49 [conn412] getaddrinfo("conf01f.load.net") failed: Name or service not known
Mon Oct 22 10:47:49 [conn412] SyncClusterConnection connect fail to: conf01f.load.net:27018 errmsg:
Mon Oct 22 10:47:49 [conn412] trying reconnect to conf01f.load.net:27018
>>>Mon Oct 22 10:47:49 [conn412] getaddrinfo("conf01f.load.net") failed: Name or service not known
Mon Oct 22 10:47:49 [conn412] reconnect conf01f.load.net:27018 failed
Mon Oct 22 10:47:49 [conn412] DBException in process: socket exception
Mon Oct 22 10:47:49 [conn412] end connection 111.222.111.140:41915
Mon Oct 22 10:47:49 [conn115] end connection 111.222.111.140:40533
Mon Oct 22 10:47:49 [conn66] end connection 111.222.111.140:54677
Mon Oct 22 10:47:49 [conn112] end connection 111.222.111.140:58591
Mon Oct 22 10:47:49 [conn144] end connection 111.222.111.140:54549
Mon Oct 22 10:47:49 [conn130] end connection 111.222.111.140:40788
Mon Oct 22 10:47:49 [conn425] end connection 111.222.111.140:43930
Mon Oct 22 10:47:49 [conn405] end connection 111.222.111.140:40721
Mon Oct 22 10:47:49 [conn72] end connection 111.222.111.140:55364
.
. nothing suspicious
.
Tue Oct 23 03:11:39 [mongosMain] ERROR: Out of file descriptors. Waiting one second before trying to accept more connections.
.
. Out of file descriptors
.
Tue Oct 23 03:14:43 [mongosMain] ERROR: Out of file descriptors. Waiting one second before trying to accept more connections.
Tue Oct 23 03:14:43 [conn1385] end connection 111.222.111.191:59069
Tue Oct 23 03:14:44 [mongosMain] connection accepted from 111.222.111.191:59070 #1386
Tue Oct 23 03:14:44 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Tue Oct 23 03:14:44 [mongosMain] ERROR: Out of file descriptors. Waiting one second before trying to accept more connections.
Tue Oct 23 03:14:44 [conn1386] end connection 111.222.111.191:59070
Tue Oct 23 03:14:45 [mongosMain] connection accepted from 111.222.111.65:53230 #1387
Tue Oct 23 03:16:25 [LockPinger] cluster conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 pinged successfully at Tue Oct 23 03:16:25 2012 by distributed lock pinger 'conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018/host01g:27000:1350671783:1804289383', sleeping for 30000ms
>>>Tue Oct 23 03:18:20 [Balancer] getaddrinfo("host01d.load.net") failed: Name or service not known
Tue Oct 23 03:18:20 [Balancer] scoped connection to conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 not being returned to the pool
Tue Oct 23 03:18:20 [Balancer] caught exception while doing balance: socket exception
Tue Oct 23 03:21:28 [conn1182] ns: movie.entity_attr could not initialize cursor across all shards because : stale config detected for ns: movie.entity_attr ParallelCursor::_init @ mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017 attempt: 0
Tue Oct 23 03:21:34 [LockPinger] cluster conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 pinged successfully at Tue Oct 23 03:21:33 2012 by distributed lock pinger 'conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018/host01g:27000:1350671783:1804289383', sleeping for 30000ms
Tue Oct 23 03:26:38 [LockPinger] cluster conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 pinged successfully at Tue Oct 23 03:26:38 2012 by distributed lock pinger 'conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018/host01g:27000:1350671783:1804289383', sleeping for 30000ms
Tue Oct 23 03:29:02 [conn153] ChunkManager: time to load chunks for movie.entity_attr: 3ms sequenceNumber: 101 version: 464|19
Tue Oct 23 03:29:02 [conn153] autosplitted movie.entity_attr shard: ns:movie.entity_attr at: mongodb-sh1:mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017 lastmod: 395|28 min:
{ _id: "http://tvfru.ru/load/fantasticheskie_filmi_smotret_online_skachat_torrent/smertelnaja_bitva_nasledie_smotret_online_v_kachestve_mortal_kombat_legacy/6..." }
max:
{ _id: "http://twomovies.name/watch_movie/Four_Sheets_to_the_Wind&domain-id=1973122" }
on:
{ _id: "http://tvlistings.zap2it.com/tv/me-quiero-casar/MV001245240000&domain-id=3945654" }
(splitThreshold 67108864)
.
. stale config
.
Tue Oct 23 03:31:07 [conn406] ns: movie.entity_attr could not initialize cursor across all shards because : stale config detected for ns: movie.entity_attr ParallelCursor::_init @ mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017 attempt: 0
Tue Oct 23 03:31:36 [conn124] warning: splitChunk failed - cmd: { splitChunk: "rating.entity_attr", keyPattern:
{ _id: 1.0 }
, min:
{ _id: "http://www.goodreads.ru/books/2414174/default.aspx?partner=qibet&domain-id=1969694" }
, max:
{ _id: "http://www.grando.hu/erositos-ontapados-belso-antenna-szelvedore-ksa-02-1524125370&domain-id=1973550" }
, from: "mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017", splitKeys: [
{ _id: "http://www.goodreads.ru/podarki/items/1626309/default.aspx&domain-id=1969694" }
], shardId: "rating.entity_attr-id"http://www.goodreads.ru/books/2414174/default.aspx?partner=qibet&domain-id=1969694"", configdb: "conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018" } result: { who: { _id: "rating.entity_attr", process: "host01d:27017:1350602021:1390778741", state: 2, ts: ObjectId('508546a0e23b10d5d497b8b2'), when: new Date(1350911648534), who: "host01d:27017:1350602021:1390778741:conn8736:375682389", why: "migrate-
{ _id: "http://catalogfirm.com.ua/1437/ZAPORIZKIY-PROFETSIYNIY-LITSEY-MODI-I-STILYU.html&domain-id=2005685" }
" }, errmsg: "the collection's metadata lock is taken", ok: 0.0 }
Tue Oct 23 03:31:46 [LockPinger] cluster conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 pinged successfully at Tue Oct 23 03:31:46 2012 by distributed lock pinger 'conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018/host01g:27000:1350671783:1804289383', sleeping for 30000ms
Tue Oct 23 03:32:21 [conn434] ns: movie.entity_attr could not initialize cursor across all shards because : stale config detected for ns: movie.entity_attr ParallelCursor::_init @ mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017 attempt: 0
.
. stale config, nothing about getaddrinfo
.
Tue Oct 23 03:40:01 [conn875] SyncClusterConnection connecting to [conf01f.load.net:27018]
.
. stale config and successful ping, nothing about getaddrinfo
.
Tue Oct 23 03:48:04 [conn423] ns: rating.entity_attr could not initialize cursor across all shards because : stale config detected for ns: rating.entity_attr ParallelCursor::_init @ mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017 attempt: 0
Tue Oct 23 03:48:25 [conn1126] ns: movie.entity_attr could not initialize cursor across all shards because : stale config detected for ns: movie.entity_attr ParallelCursor::_init @ mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017 attempt: 0
Tue Oct 23 03:52:11 [LockPinger] cluster conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 pinged successfully at Tue Oct 23 03:52:11 2012 by distributed lock pinger 'conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018/host01g:27000:1350671783:1804289383', sleeping for 30000ms
Tue Oct 23 03:54:13 [mongosMain] connection accepted from 111.222.111.48:42417 #1388
Tue Oct 23 03:54:13 [conn1388] authenticate:
{ authenticate: 1, user: "mobile-app", nonce: "bf2f4b223b0f8aa1", key: "b2b50a4bf78c6a12d8a888dbb5b14981" }
>>>Tue Oct 23 03:54:13 [conn1388] getaddrinfo("host01d.load.net") failed: Name or service not known
Tue Oct 23 03:54:13 [conn1388] DBException in process: socket exception
Tue Oct 23 03:54:13 [conn1388] end connection 111.222.111.48:42417
Tue Oct 23 03:54:13 [conn1237] end connection 111.222.111.48:50804
Tue Oct 23 03:54:13 [conn1238] end connection 111.222.111.48:50631
Tue Oct 23 03:54:13 [conn1304] end connection 111.222.111.48:52611
Tue Oct 23 03:54:13 [conn1252] end connection 111.222.111.48:51285
Tue Oct 23 03:54:13 [conn1235] end connection 111.222.111.48:45518
.
. stale config and successful ping, not a single word about getaddrinfo. And then it just started
.
Tue Oct 23 07:32:06 [LockPinger] cluster conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 pinged successfully at Tue Oct 23 07:32:06 2012 by distributed lock pinger 'conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018/host01g:27000:1350671783:1804289383', sleeping for 30000ms
Tue Oct 23 07:33:04 [conn828] ns: musicgroup.entity_attr could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity_attr ParallelCursor::_init @ mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017 attempt: 0
Tue Oct 23 07:33:10 [conn127] ns: musicgroup.entity_attr could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity_attr ParallelCursor::_init @ mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017 attempt: 0
Tue Oct 23 07:33:36 [Balancer] getaddrinfo("host01d.load.net") failed: Name or service not known
Tue Oct 23 07:33:36 [Balancer] scoped connection to conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 not being returned to the pool
Tue Oct 23 07:33:36 [Balancer] caught exception while doing balance: socket exception
Tue Oct 23 07:34:06 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:34:06 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:34:06 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:34:06 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:34:06 [Balancer] SyncClusterConnection connect fail to: conf01f.load.net:27018 errmsg:
Tue Oct 23 07:34:06 [Balancer] trying reconnect to conf01f.load.net:27018
Tue Oct 23 07:34:06 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:34:06 [Balancer] reconnect conf01f.load.net:27018 failed
Tue Oct 23 07:34:06 [Balancer] scoped connection to conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 not being returned to the pool
Tue Oct 23 07:34:06 [Balancer] caught exception while doing balance: socket exception
Tue Oct 23 07:34:36 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:34:36 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:34:36 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:34:36 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:34:36 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:34:36 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:34:36 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:34:36 [Balancer] SyncClusterConnection connect fail to: conf01f.load.net:27018 errmsg:
Tue Oct 23 07:34:36 [Balancer] trying reconnect to conf01f.load.net:27018
Tue Oct 23 07:34:36 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:34:36 [Balancer] reconnect conf01f.load.net:27018 failed
Tue Oct 23 07:34:36 [Balancer] scoped connection to conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 not being returned to the pool
Tue Oct 23 07:34:36 [Balancer] caught exception while doing balance: socket exception
Tue Oct 23 07:35:06 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:35:06 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:35:06 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:35:06 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:35:06 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:35:06 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:35:06 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:35:06 [Balancer] SyncClusterConnection connect fail to: conf01f.load.net:27018 errmsg:
Tue Oct 23 07:35:06 [Balancer] trying reconnect to conf01f.load.net:27018
Tue Oct 23 07:35:06 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:35:06 [Balancer] reconnect conf01f.load.net:27018 failed
Tue Oct 23 07:35:06 [Balancer] scoped connection to conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 not being returned to the pool
Tue Oct 23 07:35:06 [Balancer] caught exception while doing balance: socket exception
From this point on, the getaddrinfo failures repeated every 30 seconds until mongos crashed.
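To double-check a timeline like the one above without reading the whole log by eye, a small script can report when each kind of error first appears and how far apart the Balancer's getaddrinfo failures are. This is only a sketch: the marker names, the function names and the default year of 2012 (taken from the LockPinger lines) are my own assumptions, and it only understands the timestamp layout seen in these excerpts.

```python
import re
from datetime import datetime

# Matches e.g. 'Tue Oct 23 03:14:44 [mongosMain] message'. The leading '>>>'
# highlight and the weekday are both optional.
LINE_RE = re.compile(
    r"^(?:>>>)?(?:[A-Z][a-z]{2} )?"
    r"([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$"
)

# Hypothetical labels for the messages discussed in this comment.
MARKERS = {
    "emfile": "Too many open files",
    "dns": "Name or service not known",
    "stale": "stale config detected",
}


def parse_line(line, year=2012):
    """Return (timestamp, thread, message), or None for continuation lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, thread, message = match.groups()
    # The log lines carry no year, so it is supplied by the caller.
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, thread, message


def first_seen(lines, year=2012):
    """Map each marker label to the timestamp of its first occurrence."""
    seen = {}
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        when, _thread, message = parsed
        for label, text in MARKERS.items():
            if text in message and label not in seen:
                seen[label] = when
    return seen


def failure_gaps(lines, thread="Balancer", year=2012):
    """Seconds between distinct getaddrinfo failure times on one thread."""
    times = set()
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        when, name, message = parsed
        if name == thread and "getaddrinfo" in message and "failed" in message:
            times.add(when)
    ordered = sorted(times)
    return [int((b - a).total_seconds()) for a, b in zip(ordered, ordered[1:])]
```

Fed the full log file line by line, `first_seen` gives the order in which the error types appeared, and a run of 30-second values from `failure_gaps` would correspond to the period where the Balancer fails on every round.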
|
|
It seems I've found the problem.
The exceptions started with "Too many open files". That was long before the moment mongos crashed, but could it still be the cause?
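One way to test this guess on a running mongos is to compare the number of descriptors the process holds with its "Max open files" limit. The connection to the getaddrinfo errors is an assumption on my side, not something the logs prove: the resolver has to open files and sockets of its own, so a process sitting at its descriptor limit may see lookups fail. The sketch below is Linux-only, reads /proc, and all function names and the 90% margin are made up for illustration.

```python
import os
import re


def parse_max_open_files(limits_text):
    """Return (soft, hard) from /proc/<pid>/limits text; None means unlimited."""
    match = re.search(
        r"^Max open files\s+(\S+)\s+(\S+)", limits_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Max open files' row found")
    return tuple(None if v == "unlimited" else int(v) for v in match.groups())


def fd_usage(pid="self"):
    """Return (open_count, soft_limit) for a process.

    Needs Linux and permission to read /proc/<pid>/fd and /proc/<pid>/limits.
    """
    open_count = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        soft, _hard = parse_max_open_files(fh.read())
    return open_count, soft


def near_limit(open_count, soft_limit, margin=0.9):
    """True when usage is at or above `margin` of the soft limit."""
    return soft_limit is not None and open_count >= margin * soft_limit
```

Calling `near_limit(*fd_usage(pid))` with the pid of the mongos process, repeated every few minutes, would show whether descriptor usage climbs toward the limit before the name resolution errors begin.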
The skipped chunks of the log look like "
Tue Oct 23 *::* [Balancer] trying reconnect to conf01*:27018
Tue Oct 23 *::* [Balancer] getaddrinfo("conf01*") failed: Name or service not known
"
Tue Oct 23 03:14:42 [conn1384] end connection 111.222.111.191:59065
Tue Oct 23 03:14:43 [mongosMain] connection accepted from 178.154.238.191:59069 #1385
Tue Oct 23 03:14:43 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Tue Oct 23 03:14:43 [mongosMain] ERROR: Out of file descriptors. Waiting one second before trying to accept more connections.
Tue Oct 23 03:14:43 [conn1385] end connection 111.222.111.191:59069
Tue Oct 23 03:14:44 [mongosMain] connection accepted from 178.154.238.191:59070 #1386
Tue Oct 23 03:14:44 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Tue Oct 23 03:14:44 [mongosMain] ERROR: Out of file descriptors. Waiting one second before trying to accept more connections.
Tue Oct 23 03:14:44 [conn1386] end connection 111.222.111.191:59070
Tue Oct 23 03:14:45 [mongosMain] connection accepted from 111.222.111.65:53230 #1387
Tue Oct 23 03:16:25 [LockPinger] cluster conf01g:27018,conf01d:27018,conf01f:27018 pinged successfully at Tue Oct 23 03:16:25 2012 by distributed lock pinger 'conf01g:27018,conf01d:27018,conf01f:27018/host01g:27000:1350671783:1804289383', sleeping for 30000ms
Tue Oct 23 03:18:20 [Balancer] getaddrinfo("host01d") failed: Name or service not known
Tue Oct 23 03:18:20 [Balancer] scoped connection to conf01g:27018,conf01d:27018,conf01f:27018 not being returned to the pool
Tue Oct 23 03:18:20 [Balancer] caught exception while doing balance: socket exception
.
.
.
Tue Oct 23 07:47:37 [Balancer] SyncClusterConnection connecting to [conf01f:27018]
Tue Oct 23 07:47:37 [Balancer] getaddrinfo("conf01f") failed: Name or service not known
Tue Oct 23 07:47:37 [Balancer] SyncClusterConnection connect fail to: conf01f:27018 errmsg:
Tue Oct 23 07:47:37 [Balancer] trying reconnect to conf01f:27018
Tue Oct 23 07:47:37 [Balancer] getaddrinfo("conf01f") failed: Name or service not known
Tue Oct 23 07:47:37 [Balancer] reconnect conf01f:27018 failed
Tue Oct 23 07:47:37 [Balancer] scoped connection to conf01g:27018,conf01d:27018,conf01f:27018 not being returned to the pool
Tue Oct 23 07:47:37 [Balancer] caught exception while doing balance: socket exception
Tue Oct 23 07:47:37 [LockPinger] cluster conf01g:27018,conf01d:27018,conf01f:27018 pinged successfully at Tue Oct 23 07:47:36 2012 by distributed lock pinger 'conf01g:27018,conf01d:27018,conf01f:27018/host01g:27000:1350671783:1804289383', sleeping for 30000ms
Tue Oct 23 07:48:07 [Balancer] SyncClusterConnection connecting to [conf01g:27018]
Tue Oct 23 07:48:07 [Balancer] SyncClusterConnection connecting to [conf01d:27018]
Tue Oct 23 07:48:07 [Balancer] SyncClusterConnection connecting to [conf01f:27018]
Tue Oct 23 07:48:07 [Balancer] SyncClusterConnection connecting to [conf01g:27018]
Tue Oct 23 07:48:07 [Balancer] SyncClusterConnection connecting to [conf01d:27018]
Tue Oct 23 07:48:07 [Balancer] SyncClusterConnection connecting to [conf01f:27018]
Tue Oct 23 07:48:07 [Balancer] getaddrinfo("conf01f") failed: Name or service not known
Tue Oct 23 07:48:07 [Balancer] SyncClusterConnection connect fail to: conf01f:27018 errmsg:
Tue Oct 23 07:48:07 [Balancer] trying reconnect to conf01f:27018
Tue Oct 23 07:48:07 [Balancer] getaddrinfo("conf01f") failed: Name or service not known
Tue Oct 23 07:48:07 [Balancer] reconnect conf01f:27018 failed
Tue Oct 23 07:48:07 [Balancer] scoped connection to conf01g:27018,conf01d:27018,conf01f:27018 not being returned to the pool
Tue Oct 23 07:48:07 [Balancer] caught exception while doing balance: socket exception
Tue Oct 23 07:48:37 [Balancer] SyncClusterConnection connecting to [conf01g:27018]
.
.
.
Tue Oct 23 07:49:07 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:49:07 [Balancer] SyncClusterConnection connect fail to: conf01f.load.net:27018 errmsg:
Tue Oct 23 07:49:07 [Balancer] trying reconnect to conf01f.load.net:27018
Tue Oct 23 07:49:07 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:49:07 [Balancer] reconnect conf01f.load.net:27018 failed
Tue Oct 23 07:49:07 [Balancer] scoped connection to conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 not being returned to the pool
Tue Oct 23 07:49:07 [Balancer] caught exception while doing balance: socket exception
Tue Oct 23 07:49:37 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:49:37 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:49:37 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:49:37 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:49:37 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:49:37 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:49:37 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:49:37 [Balancer] SyncClusterConnection connect fail to: conf01f.load.net:27018 errmsg:
Tue Oct 23 07:49:37 [Balancer] trying reconnect to conf01f.load.net:27018
Tue Oct 23 07:49:37 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:49:37 [Balancer] reconnect conf01f.load.net:27018 failed
Tue Oct 23 07:49:37 [Balancer] scoped connection to conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 not being returned to the pool
Tue Oct 23 07:49:37 [Balancer] caught exception while doing balance: socket exception
Tue Oct 23 07:50:07 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:50:07 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:50:07 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:50:07 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:50:07 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:50:07 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:50:07 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:50:07 [Balancer] SyncClusterConnection connect fail to: conf01f.load.net:27018 errmsg:
Tue Oct 23 07:50:07 [Balancer] trying reconnect to conf01f.load.net:27018
Tue Oct 23 07:50:07 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:50:07 [Balancer] reconnect conf01f.load.net:27018 failed
Tue Oct 23 07:50:07 [Balancer] scoped connection to conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 not being returned to the pool
Tue Oct 23 07:50:07 [Balancer] caught exception while doing balance: socket exception
Tue Oct 23 07:50:37 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:50:37 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:50:37 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:50:37 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:50:37 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:50:37 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:50:37 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:50:37 [Balancer] SyncClusterConnection connect fail to: conf01f.load.net:27018 errmsg:
Tue Oct 23 07:50:37 [Balancer] trying reconnect to conf01f.load.net:27018
Tue Oct 23 07:50:37 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:50:37 [Balancer] reconnect conf01f.load.net:27018 failed
Tue Oct 23 07:50:37 [Balancer] scoped connection to conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 not being returned to the pool
Tue Oct 23 07:50:37 [Balancer] caught exception while doing balance: socket exception
Tue Oct 23 07:51:07 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:51:07 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:51:07 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:51:07 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:51:07 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:51:07 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:51:07 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:51:07 [Balancer] SyncClusterConnection connect fail to: conf01f.load.net:27018 errmsg:
Tue Oct 23 07:51:07 [Balancer] trying reconnect to conf01f.load.net:27018
Tue Oct 23 07:51:07 [Balancer] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:51:07 [Balancer] reconnect conf01f.load.net:27018 failed
Tue Oct 23 07:51:07 [Balancer] scoped connection to conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 not being returned to the pool
Tue Oct 23 07:51:07 [Balancer] caught exception while doing balance: socket exception
Tue Oct 23 07:51:37 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:51:37 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:51:37 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:51:37 [Balancer] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:51:37 [Balancer] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:51:37 [Balancer] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:52:30 [conn427] warning: splitChunk failed - cmd: { splitChunk: "musicgroup.entity", keyPattern:
{ _id: 1.0 }
, min:
{ _id: "http://www.myspace.com/rotten_mind?_escaped_fragment_=&domain-id=286078" }
, max:
{ _id: "http://www.myspace.com/santeriaandthepornhorns?_escaped_fragment_=&domain-id=286078" }
, from: "mongodb-sh1/host01d.load.net:27017,host01f.load.net:27017,host01g.load.net:27017", splitKeys: [
{ _id: "http://www.myspace.com/s_e_b_y&domain-id=286078" }
], shardId: "vertis-musicgroup.entity_attr-id"http://www.myspace.com/rotten_mind?escaped_fragment=&domain-id=286078"", configdb: "conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018" } result: { who: { _id: "vertis-musicgroup.entity_attr", process: "host01d:27017:1350602021:1390778741", state: 2, ts: ObjectId('50860ba9e23b10d5d497b96d'), when: new Date(1350962089258), who: "host01d:27017:1350602021:1390778741:conn8736:375682389", why: "migrate-
{ _id: "http://cf.myspace.com/ollerton&domain-id=286649" }
" }, errmsg: "the collection's metadata lock is taken", ok: 0.0 }
Tue Oct 23 07:52:30 [conn427] SyncClusterConnection connecting to [conf01g.load.net:27018]
Tue Oct 23 07:52:30 [conn427] SyncClusterConnection connecting to [conf01d.load.net:27018]
Tue Oct 23 07:52:30 [conn427] getaddrinfo("conf01d.load.net") failed: Name or service not known
Tue Oct 23 07:52:30 [conn427] SyncClusterConnection connect fail to: conf01d.load.net:27018 errmsg:
Tue Oct 23 07:52:30 [conn427] SyncClusterConnection connecting to [conf01f.load.net:27018]
Tue Oct 23 07:52:30 [conn427] getaddrinfo("conf01f.load.net") failed: Name or service not known
Tue Oct 23 07:52:30 [conn427] SyncClusterConnection connect fail to: conf01f.load.net:27018 errmsg:
Tue Oct 23 07:52:30 [conn427] trying reconnect to conf01d.load.net:27018
Tue Oct 23 07:52:30 [conn427] getaddrinfo("conf01d.load.net") failed: Name or service not known
Tue Oct 23 07:52:30 [conn427] reconnect conf01d.load.net:27018 failed
Tue Oct 23 07:52:30 [conn427] warning: could not get last error from a shard conf01g.load.net:27018,conf01d.load.net:27018,conf01f.load.net:27018 :: caused by :: socket exception
Received signal 6
Backtrace: 0x528c54 0x7fe56caa0af0 0x7fe56caa0a75 0x7fe56caa45c0 0x7fe56ca99941 0x730908 0x712fa5 0x700860 0x72ad97 0x73c8d1 0x5aad20 0x7fe56de219ca 0x7fe56cb5370d
mongos(_ZN5mongo17printStackAndExitEi+0x64)[0x528c54]
/lib/libc.so.6(+0x33af0)[0x7fe56caa0af0]
/lib/libc.so.6(gsignal+0x35)[0x7fe56caa0a75]
/lib/libc.so.6(abort+0x180)[0x7fe56caa45c0]
/lib/libc.so.6(__assert_fail+0xf1)[0x7fe56ca99941]
mongos(_ZN5mongo10ClientInfo12getLastErrorERKNS_7BSONObjERNS_14BSONObjBuilderEb+0x3898)[0x730908]
mongos(_ZN5mongo7Command20runAgainstRegisteredEPKcRNS_7BSONObjERNS_14BSONObjBuilderEi+0x805)[0x712fa5]
mongos(_ZN5mongo14SingleStrategy7queryOpERNS_7RequestE+0x4d0)[0x700860]
mongos(_ZN5mongo7Request7processEi+0x157)[0x72ad97]
mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x71)[0x73c8d1]
mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x260)[0x5aad20]
/lib/libpthread.so.0(+0x69ca)[0x7fe56de219ca]
/lib/libc.so.6(clone+0x6d)[0x7fe56cb5370d]
===
CursorCache at shutdown - sharded: 1 passthrough: 2936
Received signal 11
Backtrace: 0x528c54 0x7fe56caa0af0 0x2784c40
mongos(_ZN5mongo17printStackAndExitEi+0x64)[0x528c54]
/lib/libc.so.6(+0x33af0)[0x7fe56caa0af0]
[0x2784c40]
===
|