Type: Bug
Resolution: Incomplete
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.0.2
Component/s: Aggregation Framework
Labels:
None

Operating System:
ALL
Steps To Reproduce:
Create Replica Set with 1 Primary, 1 Secondary and 1 Arbiter

Create collection with >= 52 million documents

Reboot Secondary to clear memory caches

Connect to Secondary and run expensive aggregation on the collection

The aggregation may fail with an "operation was interrupted" error (CursorKilled, code 237)

On re-run the aggregation has a good chance of completing correctly, because the memory cache is then partly filled
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

I searched existing issues and the release notes but couldn't find anything that seems to address our issue. Apologies if this is a duplicate.

MongoDB version: 4.0.2

Setup: Replica Set: 1 Primary, 1 Secondary, 1 Arbiter; no Sharding

OS: CentOS 7.7.1908

CPUs: 6 (Primary & Secondary)

RAM: 31GB (Primary & Secondary)

Data size: ~100GB

Aggregations on a collection with ~52 million documents sometimes fail with errors like this (DB, collection names and pipeline redacted):

2019-11-27T11:35:22.354+0000 I COMMAND [conn25] command db.collection appName: "MongoDB Shell" command: aggregate { aggregate: "collection", pipeline: <PIPELINE>, cursor: {}, lsid: { id: UUID("18d0572d-63e3-46f5-b73e-0b2d21943d6e") }, 
$clusterTime: { clusterTime: Timestamp(1574854281, 3), 
signature: { hash: BinData(0, 055F8108EC0EBAF2F63E875B114500BABFDC7FBE), keyId: 6708703671850369025 }}, 
$readPreference: { mode: "secondaryPreferred" }, $db: "db" } 
planSummary: COLLSCAN numYields:71159 ok:0 errMsg:"operation was interrupted" errName:CursorKilled errCode:237 reslen:238 locks:{ Global: { acquireCount: { r: 71941 }}, 
Database: { acquireCount: { r: 71940 }}, 
Collection: { acquireCount:  { r: 71940 }}} protocol:op_msg 230092ms}}
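
To compare several of these failures side by side, it can help to pull the key fields out of each log line. The following is only a minimal sketch in plain JavaScript: `parseCommandLogLine` is a hypothetical helper, not part of MongoDB or its shell, and the regular expressions assume the `key:value` layout seen in the log line above.

```javascript
// Hypothetical helper: extracts a few fields from a mongod COMMAND log line.
// Assumes the "key:value" layout shown in the log excerpt above.
function parseCommandLogLine(line) {
  // Return the first capture group of the regex, or null if it does not match.
  const pick = (re) => {
    const m = line.match(re);
    return m ? m[1] : null;
  };
  const toInt = (s) => (s === null ? null : parseInt(s, 10));
  return {
    planSummary: pick(/planSummary: (\w+)/),
    numYields: toInt(pick(/numYields:(\d+)/)),
    errMsg: pick(/errMsg:"([^"]*)"/),
    errName: pick(/errName:(\w+)/),
    errCode: toInt(pick(/errCode:(\d+)/)),
    durationMs: toInt(pick(/\s(\d+)ms/)),
  };
}

// Excerpt of the log line above, shortened for readability.
const sampleLine =
  'planSummary: COLLSCAN numYields:71159 ok:0 ' +
  'errMsg:"operation was interrupted" errName:CursorKilled errCode:237 ' +
  'reslen:238 protocol:op_msg 230092ms';

console.log(parseCommandLogLine(sampleLine));
```

Fields that are missing from a line come back as `null`, so the same helper can be run over log lines from successful runs as well.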

In this example log it happened on a COLLSCAN, but it also happens with IXSCAN just the same. It seems to happen much more often when the affected data is not well cached in memory, such as after a server reboot. We tried to reproduce this on the Primary, but so far we can only reproduce it on the Secondary.

One generic aggregation that reproduces this for us is the following (all field names had to be redacted):

db.collection.aggregate([
  {
    "$match" : {
      "field0" : "value0",
      "field1" : {
        "$gte" : ISODate("2018-01-01T00:00:00.000Z"),
        "$lt" : ISODate("2019-11-26T00:00:00.000Z")
      }
    }
  },
  {
    "$project" : {
      "_id" : 0,
      "field1" : 1,
      "field2.field3" : 1
    }
  },
  {
    "$group" : {
      "_id" : {
        "key" : {
          "field1name" : "$field1"
        }
      },
      "field2name" : {
        "$sum" : "$field2.field3"
      },
      "count" : {
        "$sum" : 1
      }
    }
  }
])
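
Because a re-run often succeeds once the cache is warmer, a client-side retry can serve as a stopgap while the cause is investigated. The sketch below is an assumption-laden illustration, not a fix: `retryOnCursorKilled` is a hypothetical helper, it assumes the thrown error exposes a numeric `code` property, and the aggregation is replaced by a stub so the snippet runs without a server.

```javascript
// errCode reported for CursorKilled in the log line above.
const CURSOR_KILLED = 237;

// Hypothetical helper, not part of any MongoDB shell or driver API.
// Calls runAggregation (a synchronous function) up to maxAttempts times,
// retrying only when the error code is CursorKilled.
function retryOnCursorKilled(runAggregation, maxAttempts) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { result: runAggregation(), attempts: attempt };
    } catch (err) {
      // Assumption: the error carries a numeric `code` field.
      if (err.code !== CURSOR_KILLED) throw err; // do not mask other failures
      lastError = err;
    }
  }
  throw lastError; // every attempt was interrupted
}

// Stub standing in for the real aggregation: fails twice, then succeeds.
let stubCalls = 0;
function flakyAggregation() {
  stubCalls += 1;
  if (stubCalls < 3) {
    const err = new Error("operation was interrupted");
    err.code = CURSOR_KILLED;
    throw err;
  }
  return [{ count: 1 }];
}

const outcome = retryOnCursorKilled(flakyAggregation, 5);
console.log(outcome.attempts, outcome.result);
```

In a real session the stub would be replaced by a function that runs the pipeline above and exhausts the cursor, for example with `.toArray()`. Each failed attempt can still take minutes, as the 230092ms in the log shows, so a low `maxAttempts` is advisable.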

The only other reference to this that we could find is this post:

https://dba.stackexchange.com/questions/238391/mongodb-return-error-cursor-stage-caused-by-operation-was-interrupted?rq=1

Please let me know if I can provide any other info.

Reaper.png
203 kB
Jan 13 2020 08:41:22 AM UTC

Assignee: Carl Champain (Inactive)
Reporter: Frank Shimizu
Participants: Carl Champain, Frank Shimizu
Votes: 0
Watchers: 5

Created: Nov 27 2019 12:57:06 PM UTC
Updated: Feb 19 2020 02:53:37 PM UTC
Resolved: Jan 14 2020 03:31:23 PM UTC
