[SERVER-44110] Change stream aggregation fails with "operation exceeded time limit" if maxTimeMS is specified Created: 18/Oct/19  Updated: 27/Oct/23  Resolved: 07/Dec/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kevin Rosendahl Assignee: Bernard Gorman
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Run the aggregate command with a $changeStream pipeline, specifying a maxTimeMS shorter than the amount of time it will take to scan the oplog.

I reproduced this by opening 300 concurrent change streams, each with startAfter pointing to an entry earlier in the oplog and a maxTimeMS of 1000, on a 4-core machine.
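A minimal sketch of the failing command shape, written as a pymongo-style command document. The collection name matches the log line below; the resume token is a placeholder, and against a live replica set the document would be sent with db.command(...):

```python
# Sketch of the aggregate command that triggers MaxTimeMSExpired.
# The resume token is a placeholder, not a real token.
repro_cmd = {
    "aggregate": "fts",
    "cursor": {},       # default batch size: server must build a full first batch
    "maxTimeMS": 1000,  # shorter than the time needed to scan the oplog
    "pipeline": [
        {"$changeStream": {
            "startAfter": {"_data": "<resume token from earlier in the oplog>"},
            "fullDocument": "updateLookup",
        }},
    ],
}

# If building the first batch requires scanning more oplog than fits in
# maxTimeMS, the command fails instead of returning a cursor.
```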

Sprint: Query 2019-11-18, Query 2019-12-02, Query 2019-12-16
Participants:

 Description   

For $changeStream, if the initial aggregate command specifies an explicit maxTimeMS and constructing the first batch takes longer than that limit, the command fails with ErrorCodes::MaxTimeMSExpired, even though this is a tailable, awaitData cursor.

Full log line of one such command failing:

2019-10-18T05:29:42.763+0000 I  COMMAND  [conn1316] command local.oplog.rs command: aggregate { aggregate: "fts", cursor: {}, maxTimeMS: 1000, pipeline: [ { $changeStream: { startAfter: { _data: "825DA9470E000000012B0229296E04" }, fullDocument: "updateLookup" } } ], $db: "test", $clusterTime: { clusterTime: Timestamp(1571376579, 1), signature: { hash: BinData(0, 00F7C4A3426B593FA553F668DD7DBA4E1C0B6F33), keyId: 6745581691277934593 } }, lsid: { id: UUID("3655f6dd-aabb-49e7-b582-b1459d4c879b") }, $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN numYields:58 ok:0 errMsg:"Error in $cursor stage :: caused by :: operation exceeded time limit" errName:MaxTimeMSExpired errCode:50 reslen:285 locks:{ ReplicationStateTransition: { acquireCount: { w: 61 } }, Global: { acquireCount: { r: 61 } }, Database: { acquireCount: { r: 60 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 59 } } } protocol:op_msg 1103ms



 Comments   
Comment by Bernard Gorman [ 07/Dec/19 ]

Spoke with charlie.swanson about this today. We agreed that this is intended behaviour: the initial aggregate is doing the work necessary to establish the tailable awaitData cursor. If the user wishes to ensure a fast return, they should explicitly specify batchSize:0 when running the command, then move through the oplog incrementally using getMore with maxTimeMS.
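The suggested workaround can be sketched as follows. These are illustrative pymongo-style command documents (the collection name and batch size are assumptions); the cursor id for getMore would come from the server's reply to the initial aggregate:

```python
# Workaround sketch: request an empty first batch so the initial aggregate
# returns immediately, then advance through the oplog incrementally via
# getMore with maxTimeMS. Command shapes only; a real client would send
# these with db.command(...).
aggregate_cmd = {
    "aggregate": "fts",
    "cursor": {"batchSize": 0},  # return at once with an empty first batch
    "pipeline": [
        {"$changeStream": {"fullDocument": "updateLookup"}},
    ],
}

def make_getmore(cursor_id, collection="fts", max_time_ms=1000):
    # Each getMore does a bounded slice of the oplog scan; maxTimeMS limits
    # how long a single getMore may run without killing the tailable cursor.
    return {
        "getMore": cursor_id,
        "collection": collection,
        "maxTimeMS": max_time_ms,
    }
```

With batchSize:0 the server only establishes the cursor, so the aggregate itself cannot blow the time limit; the expensive scan is spread across successive getMore calls, each individually bounded by maxTimeMS.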

Generated at Thu Feb 08 05:05:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.