Core Server / SERVER-44867

Aggregate on Secondary fails with CursorKilled


Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major - P3
    • None
    • Affects Version: 4.0.2
    • Component: Aggregation Framework
    • None
    • ALL
    • Steps to Reproduce:
      • Create a replica set with 1 Primary, 1 Secondary, and 1 Arbiter.
      • Create a collection with >= 52 million documents.
      • Reboot the Secondary to clear its memory caches.
      • Connect to the Secondary and run an expensive aggregation on the collection.
      • The aggregation may fail with a CursorKilled error ("operation was interrupted").
      • On re-run, the aggregation has a good chance of completing correctly, because the memory cache is by then partly filled.

       


    Description

      I searched existing issues and the release notes but could not find anything that seems to address our issue. Apologies if this is a duplicate.

       

      MongoDB version: 4.0.2

      Setup: Replica Set: 1 Primary, 1 Secondary, 1 Arbiter; no Sharding

      OS: CentOS 7.7.1908

      CPUs: 6 (Primary & Secondary)

      RAM: 31GB (Primary & Secondary)

      Data size: ~100GB

       

      Aggregations on a collection with ~52 million documents sometimes fail with errors like this (DB, collection names and pipeline redacted):

      2019-11-27T11:35:22.354+0000 I COMMAND [conn25] command db.collection appName: "MongoDB Shell" command: aggregate { aggregate: "collection", pipeline: <PIPELINE>, cursor: {}, lsid: { id: UUID("18d0572d-63e3-46f5-b73e-0b2d21943d6e") }, 
      $clusterTime: { clusterTime: Timestamp(1574854281, 3), 
      signature: { hash: BinData(0, 055F8108EC0EBAF2F63E875B114500BABFDC7FBE), keyId: 6708703671850369025 }}, 
      $readPreference: { mode: "secondaryPreferred" }, $db: "db" } 
      planSummary: COLLSCAN numYields:71159 ok:0 errMsg:"operation was interrupted" errName:CursorKilled errCode:237 reslen:238 locks:{ Global: { acquireCount: { r: 71941 }}, 
      Database: { acquireCount: { r: 71940 }}, 
      Collection: { acquireCount:  { r: 71940 }}} protocol:op_msg 230092ms}}
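      When comparing several failing runs, it can help to pull the same few fields out of each log line. The following is a minimal sketch in plain JavaScript, assuming the 4.0-style text log format shown above; the function name summarizeLogLine and the returned property names are illustrative and not part of MongoDB.

```javascript
// Sketch: extract key fields from one mongod slow-operation log line
// (4.0-style text format). Names here are illustrative, not a MongoDB API.
function summarizeLogLine(line) {
  // Return the first capture group of `re`, or null when it does not match.
  const grab = (re) => {
    const m = line.match(re);
    return m ? m[1] : null;
  };
  const toInt = (s) => (s === null ? null : parseInt(s, 10));
  return {
    planSummary: grab(/planSummary: (\w+)/),
    numYields: toInt(grab(/numYields:(\d+)/)),
    errName: grab(/errName:(\w+)/),
    errCode: toInt(grab(/errCode:(\d+)/)),
    // Assumption: the first "<digits>ms" token in the line is the duration.
    durationMs: toInt(grab(/(\d+)ms\b/)),
  };
}

// Abbreviated copy of the log line from this report.
const sample =
  'planSummary: COLLSCAN numYields:71159 ok:0 ' +
  'errMsg:"operation was interrupted" errName:CursorKilled errCode:237 ' +
  'reslen:238 protocol:op_msg 230092ms';
console.log(summarizeLogLine(sample));
```

      Fields that are absent from a line come back as null, so the same function can be run over lines from successful operations as well.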
      

       

      In this example log it happened on a COLLSCAN, but it also happens with IXSCAN just the same. It seems to happen much more often when the affected data is not well cached in memory, such as after a server reboot. We tried to reproduce this on the Primary, but so far we can only reproduce it on the Secondary.

       

      One generic aggregation that reproduces the error for us is the following (all field names are redacted):

      db.collection.aggregate([
          { "$match" : {
              "field0" : "value0",
              "field1" : {
                  "$gte" : ISODate("2018-01-01T00:00:00.000Z"),
                  "$lt" : ISODate("2019-11-26T00:00:00.000Z")
              }
          }},
          { "$project" : {
              "_id" : 0,
              "field1" : 1,
              "field2.field3" : 1
          }},
          { "$group" : {
              "_id" : {
                  "key" : {
                      "field1name" : "$field1"
                  }
              },
              "field2name" : {
                  "$sum" : "$field2.field3"
              },
              "count" : {
                  "$sum" : 1
              }
          }}
      ])
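      Because the failure is intermittent, a small harness that repeats the operation and counts CursorKilled outcomes may make it easier to quantify how often it occurs. This is only a sketch: tallyCursorKilled and the run callback are hypothetical names, and it assumes the error thrown by the shell or driver exposes the server error code as a numeric code property. The stand-in fakeRun replaces the real aggregation so the example runs without a server.

```javascript
// Sketch: run an operation repeatedly and tally CursorKilled failures.
// `run` is a hypothetical zero-argument callback, for example
//   () => db.collection.aggregate(pipeline).toArray()
const CURSOR_KILLED = 237; // errCode shown in the log line of this report

function tallyCursorKilled(run, attempts) {
  const tally = { succeeded: 0, cursorKilled: 0 };
  for (let i = 0; i < attempts; i++) {
    try {
      run();
      tally.succeeded++;
    } catch (e) {
      // Assumption: the thrown error carries the server code in `e.code`.
      if (e && e.code === CURSOR_KILLED) {
        tally.cursorKilled++;
      } else {
        throw e; // unrelated failure: surface it unchanged
      }
    }
  }
  return tally;
}

// Stand-in for the real aggregation: fails on the first call only,
// loosely mimicking a cold cache followed by warmer re-runs.
let calls = 0;
function fakeRun() {
  calls++;
  if (calls === 1) {
    const err = new Error("operation was interrupted");
    err.code = CURSOR_KILLED;
    throw err;
  }
  return [];
}

console.log(tallyCursorKilled(fakeRun, 3));
```

      Errors with any other code are rethrown rather than counted, so an unrelated problem is not mistaken for the issue described here.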
      

       

      The only other reference to this that we could find is this post:

      https://dba.stackexchange.com/questions/238391/mongodb-return-error-cursor-stage-caused-by-operation-was-interrupted?rq=1

       

      Please let me know if I can provide any other info.

          People

            carl.champain@mongodb.com Carl Champain (Inactive)
            frank.shimizu@est.fujitsu.com Frank Shimizu
            Votes: 0
            Watchers: 5
