Details

    • Type: Improvement
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Backlog
    • Component/s: Sharding
    • Labels: None

      Description

      A fuller solution to SERVER-4800 requires some way of informing mongod that the cursor is still active. It might also be helpful as an app-level check of whether a host is down.

        Issue Links

          Activity

          robinyarbrough@carfax.com Robin Yarbrough added a comment -

          We are also experiencing this issue when running a long-running aggregation query. These particular aggregation queries finished successfully until we upgraded from 2.6.7 to 3.0.4. Now we keep getting the following error after the query has been running for several days: "exception: getMore: cursor didn't exist on server, possible restart or timeout?" Are there any updates on this issue?

          ramon.fernandez Ramon Fernandez added a comment - - edited

          Robin Yarbrough, unfortunately there are no updates on this issue. One can use the noTimeout option, or as Dan mentioned above, a workaround may be possible with SERVER-8188. We'll update this ticket when it gets considered for planning.

          EDIT
          I'm correcting my previous message because one can't use noTimeout for aggregation operations – apologies for the confusion.

          royrez@microsoft.com Roy Reznik added a comment -

          The noTimeout option is not applicable here, since he presented an aggregation, and that is an operation that does not support the noTimeout option.
          So currently the only workaround is to set the timeout for the entire mongod? Is that a joke or are you serious?

          joerg.rech Jörg Rech added a comment -

          Aggregation of medium-sized datasets (40 million documents) in clusters is still a problem with 3.4.2. When aggregating, we get the following error after ~2 hours:

          An exception occurred while aggregating data com.mongodb.CommandFailureException:

          { "serverUsed" : "xxx.xxx.xxx.xxx:27017" , "ok" : 0.0 , "errmsg" : "cursor id 51287731144 didn't exist on server." , "code" : 13127 , "codeName" : "Location13127"}

          The same aggregation works for datasets with 20 million documents and takes ~60 minutes.

          Is there any solution or workaround that could help? Is it just aggregation or could a map/reduce help? Is a solution planned for an upcoming release?

          By the way, we also have datasets with 400 million and 3 billion documents that should be processed by this aggregation (even if it would take days or weeks), so if someone is working on a solution, please keep users like us in mind.

          oleg@evergage.com Oleg Rekutin added a comment -

          Jörg, the only practical solution seems to be to increase the cursorTimeoutMillis parameter (see SERVER-8188 for more info). The default value is 10 minutes. With values of 60 or 80 minutes, you might find that all your aggregations complete.
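
          As a rough illustration of that workaround, the sketch below builds a setParameter command document for cursorTimeoutMillis and sends it through a client object. The helper names are hypothetical, and `client` is assumed to be a pymongo.MongoClient connected to the server whose timeout should change. Whether the value has to be raised on every shard member is an assumption to verify against your own deployment.

```python
def build_cursor_timeout_command(minutes):
    """Build a setParameter command document for cursorTimeoutMillis."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # The command name must be the first key; Python 3.7+ dicts keep
    # insertion order, so this layout is preserved when the dict is sent.
    return {"setParameter": 1, "cursorTimeoutMillis": int(minutes * 60 * 1000)}


def apply_cursor_timeout(client, minutes):
    """Send the command to the admin database of the connected server.

    `client` is assumed to be a pymongo.MongoClient (hypothetical usage);
    anything exposing `client.admin.command(doc)` works for this sketch.
    """
    return client.admin.command(build_cursor_timeout_command(minutes))
```

          For example, `build_cursor_timeout_command(60)` yields a document asking for a 3,600,000 ms timeout.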

          The problem is that the cursor on each server times out independently: if no documents are fetched from a given shard for 10 minutes during the aggregation, that shard's cursor expires. This kills the entire fetch of the query or aggregation results.
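
          The per-shard idle timeout can be pictured with a toy model. This is not server code: the function name is hypothetical, time is counted in whole minutes since the cursor was opened, and the real server reaps idle cursors periodically rather than at the exact deadline.

```python
DEFAULT_CURSOR_TIMEOUT_MINUTES = 10  # default value quoted in this thread


def first_idle_timeout(fetch_minutes, timeout_minutes=DEFAULT_CURSOR_TIMEOUT_MINUTES):
    """Return the minute at which one shard's cursor would expire, or None.

    fetch_minutes: sorted minutes (since the cursor opened at minute 0) at
    which documents were fetched from that shard. Only gaps between fetches
    are checked; idle time after the last fetch is ignored in this model.
    """
    last_fetch = 0
    for minute in fetch_minutes:
        if minute - last_fetch > timeout_minutes:
            # The cursor sat idle longer than the timeout before this fetch.
            return last_fetch + timeout_minutes
        last_fetch = minute
    return None
```

          With fetches at minutes 5, 12 and 30, the 18-minute gap exceeds the 10-minute default, so the model reports an expiry at minute 22; with a 60-minute timeout the same schedule survives.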


            People

            • Votes: 29
            • Watchers: 52

              Dates

              • Created:
                Updated: