Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.2.11, 3.2.18
Component/s: Querying
Labels:
- query-44-grooming

Assigned Teams:

Query
Operating System:
ALL
Steps To Reproduce:
Hide

I use MongoDB v3.2.11/3.2.18 on a VM with 1 GB of RAM with no swap configured. The goal is to punch the server hard enough for performance to degrade significantly but before it OOMs or gets crippled hard enough so sockets start to time out. This is how I do it:

I start my Java code (I thought it would be easier to reproduce with a larger number of threads but that didn't appear to be the case)

I do a mongorestore of a reasonable small collection (the dump size is ~50 MB). I let the first execution to complete, then run it again interrupting half-way

Once the server is in the required state the exception can be encountered easily.
Show
I use MongoDB v3.2.11/3.2.18 on a VM with 1 GB of RAM with no swap configured. The goal is to punch the server hard enough for performance to degrade significantly but before it OOMs or gets crippled hard enough so sockets start to time out. This is how I do it: I start my Java code (I thought it would be easier to reproduce with a larger number of threads but that didn't appear to be the case) I do a mongorestore of a reasonable small collection (the dump size is ~50 MB). I let the first execution to complete, then run it again interrupting half-way Once the server is in the required state the exception can be encountered easily.
Case:
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

In my repro this happens when the server is struggling. The query looks like:

2017-12-15T17:38:37.527+1100 D COMMAND  [conn8516] run command local.$cmd { find: "oplog.rs", filter: { op: { $in: [ "i", "u" ] }, ts: { $gte: Timestamp 1491415276000|0 } }, sort: { $natural: 1 }, batchSize: 4000, tailable: true, awaitData: true, oplogReplay: true, noCursorTimeout: true }

Sometimes (unfortunately my repro isn't consistent and the outcome is non-deterministic) a getMore fails like:

2017-12-19T16:55:39.400+1100 I COMMAND  [conn103] command local.oplog.rs command: getMore { getMore: 21203342251, collection: "oplog.rs", batchSize: 4000 } cursorid:21203342251 keyUpdates:0 writeConflicts:0 exception: operation exceeded time limit code:50 numYields:0 reslen:89 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } protocol:op_query 22401ms

The chance of getting an exception doesn't seem to be depending on the duration of the getMore - sometimes I see them taking much longer, but no exception is thrown:

2017-12-14T16:11:04.323+1100 I COMMAND  [conn29] command local.oplog.rs command: getMore { getMore: 18711090531, collection: "oplog.rs", batchSize: 4000 } planSummary: COLLSCAN cursorid:18711090531 keysExamined:0 docsExamined:213 keyUpdates:0 writeConflicts:0 numYields:2 nreturned:213 reslen:28533 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 3 } } } protocol:op_query 109834ms

duplicates

SERVER-33942 getMore with maxTimeMS returns "operation exceeded time limit" if concurrent blocking operation is running

Closed

Assignee:: Backlog - Query Team (Inactive)
Reporter:: Dmitry Ryabtsev
Participants:: Backlog - Query Team, David Storch, Dmitry Ryabtsev
Votes:: 0 Vote for this issue
Watchers:: 10 Start watching this issue

Created:: Dec 19 2017 06:19:53 AM UTC
Updated:: Dec 06 2022 03:43:27 AM UTC
Resolved:: Jun 28 2019 07:32:13 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates