  Core Server / SERVER-5845

Cursor can get deleted because of timeout at the finish stage of a sharded map reduce

    • Type: Improvement
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 2.3.0
    • Affects Version/s: 2.1.1
    • Component/s: MapReduce
    • Labels:
      None

      When I run one of my MR jobs, the second run fails with this error:

      {"errmsg":"exception: getMore: cursor didn't exist on server, possible restart or timeout?","code":13127,"ok":0}
      

      From the logs, I see that in the final stage (when it iterates over "CollectionByWhatMapReduceRun_IntNumber") it does not handle connections properly:

      // conn178752 - MR job connection
      ...
      Mon May 14 17:48:35 [conn178752] getmore Database.CollectionByWhatMapReduceRun_IntNumber cursorid:4899920544814319828 ntoreturn:0 keyUpdates:0 nreturned:4316 reslen:5621110 142ms
      ...
      Mon May 14 17:48:51 [initandlisten] connection accepted from 127.0.0.1:33015 #559035 (446 connections now open)
      Mon May 14 17:48:51 [initandlisten] connection accepted from 127.0.0.1:33016 #559036 (447 connections now open)
      ...
      Mon May 14 18:23:14 [conn178752]                52200/108636    48%
      ...
      Mon May 14 18:23:14 [conn180633] SocketException handling request, closing client connection: 9001 socket exception [2] server [127.0.0.1:60746]
      ...
      Mon May 14 18:24:11 [initandlisten] connection refused because too many open connections: 819
      Mon May 14 18:24:11 [initandlisten] connection accepted from 127.0.0.1:59086 #754273 (820 connections now open)
      ...
      Mon May 14 18:24:16 [conn178752] getMore: cursorid not found Database.CollectionByWhatMapReduceRun_IntNumber 4899920544814319828
      

      After that, the number of connections decreases.

      Another type of MR job runs fine, though.

      Any ideas?
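
      The timing in the log excerpt can be checked with a short calculation. This is a minimal sketch, not part of the original report: it takes the two quoted lines for cursor 4899920544814319828 and measures the time between them. It assumes that no other getmore for this cursor occurred in the elided ("...") parts of the log, and that the server uses a 10-minute idle cursor timeout; the constant `ASSUMED_IDLE_TIMEOUT_S` is an illustrative name for that assumption.

```python
from datetime import datetime

# Two lines copied verbatim from the log excerpt above (same cursor id).
LAST_GETMORE = (
    "Mon May 14 17:48:35 [conn178752] getmore "
    "Database.CollectionByWhatMapReduceRun_IntNumber "
    "cursorid:4899920544814319828 ntoreturn:0 keyUpdates:0 "
    "nreturned:4316 reslen:5621110 142ms"
)
NOT_FOUND = (
    "Mon May 14 18:24:16 [conn178752] getMore: cursorid not found "
    "Database.CollectionByWhatMapReduceRun_IntNumber 4899920544814319828"
)

# Assumption: idle cursors are reaped after 10 minutes.
ASSUMED_IDLE_TIMEOUT_S = 600


def time_of_day(line):
    """Parse the HH:MM:SS part of a log line such as 'Mon May 14 17:48:35 ...'."""
    # Characters 11..18 hold the time; both lines are from the same day,
    # so comparing times of day is enough here.
    return datetime.strptime(line[11:19], "%H:%M:%S")


gap_seconds = int((time_of_day(NOT_FOUND) - time_of_day(LAST_GETMORE)).total_seconds())

print("seconds between the two log lines:", gap_seconds)
print("exceeds assumed idle timeout:", gap_seconds > ASSUMED_IDLE_TIMEOUT_S)
```

      If the assumptions hold, the cursor sat idle for well over the timeout between the two quoted lines, which would be consistent with the "cursorid not found" message.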

            Assignee:
            randolph@mongodb.com Randolph Tan
            Reporter:
            azat Azat Khuzhin