When I run one of my MR jobs, the second run fails with this error:
{"errmsg":"exception: getMore: cursor didn't exist on server, possible restart or timeout?","code":13127,"ok":0}
From the logs, I see that in the final stage (when it iterates over "CollectionByWhatMapReduceRun_IntNumber"), it does not handle connections:
// conn178752 - MR job connection
...
Mon May 14 17:48:35 [conn178752] getmore Database.CollectionByWhatMapReduceRun_IntNumber cursorid:4899920544814319828 ntoreturn:0 keyUpdates:0 nreturned:4316 reslen:5621110 142ms
...
Mon May 14 17:48:51 [initandlisten] connection accepted from 127.0.0.1:33015 #559035 (446 connections now open)
Mon May 14 17:48:51 [initandlisten] connection accepted from 127.0.0.1:33016 #559036 (447 connections now open)
...
Mon May 14 18:23:14 [conn178752] 52200/108636 48%
...
Mon May 14 18:23:14 [conn180633] SocketException handling request, closing client connection: 9001 socket exception [2] server [127.0.0.1:60746]
...
Mon May 14 18:24:11 [initandlisten] connection refused because too many open connections: 819
Mon May 14 18:24:11 [initandlisten] connection accepted from 127.0.0.1:59086 #754273 (820 connections now open)
...
Mon May 14 18:24:16 [conn178752] getMore: cursorid not found Database.CollectionByWhatMapReduceRun_IntNumber 4899920544814319828
After that, the number of connections decreases.
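To make the connection growth easier to follow, the counts can be pulled out of the mongod log. This is only a minimal sketch, assuming the log line formats shown in the excerpt above; the function name `connection_counts` and both regexes are illustrative and not part of MongoDB.

```python
import re

# Matches the "(N connections now open)" suffix of "connection accepted" lines.
OPEN_RE = re.compile(r"\((\d+) connections? now open\)")
# Matches "connection refused because too many open connections: N".
REFUSED_RE = re.compile(r"too many open connections: (\d+)")


def connection_counts(lines):
    """Return (timestamp, kind, count) tuples for connection-related log lines."""
    events = []
    for line in lines:
        # Assumes the timestamp is the first four fields, e.g. "Mon May 14 17:48:51".
        stamp = " ".join(line.split()[:4])
        match = OPEN_RE.search(line)
        if match:
            events.append((stamp, "accepted", int(match.group(1))))
            continue
        match = REFUSED_RE.search(line)
        if match:
            events.append((stamp, "refused", int(match.group(1))))
    return events


# Sample lines copied from the log excerpt above.
sample = [
    "Mon May 14 17:48:51 [initandlisten] connection accepted from 127.0.0.1:33015 #559035 (446 connections now open)",
    "Mon May 14 18:24:11 [initandlisten] connection refused because too many open connections: 819",
    "Mon May 14 18:24:11 [initandlisten] connection accepted from 127.0.0.1:59086 #754273 (820 connections now open)",
]
events = connection_counts(sample)
for event in events:
    print(event)
```

Run against the full log, this would show how the open-connection count changes between the first getmore at 17:48:35 and the "cursorid not found" message at 18:24:16.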
Another type of MR job runs fine, though.
Any ideas?
Related to: SERVER-6906 Potential cursor timeout at reduce stage of map reduce (Closed)