[SERVER-26608] $out can attempt to upgrade locks if an error is encountered during a getMore Created: 12/Oct/16  Updated: 06/Dec/22  Resolved: 25/Aug/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Charlie Swanson Assignee: Backlog - Query Team (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-22541 Aggregation plan executors should be ... Closed
Assigned Teams:
Query
Operating System: ALL
Participants:

 Description   

For starters, this can only happen if your batchSize is 0, since $out returns no results and the error needs to take place in a getMore. The following series of events can occur, eventually triggering a dassert, or potentially worse (I don't fully understand the implications of having the flush lock locked in the wrong mode).

  1. An aggregate starts running inside a getMore. The cursor associated with the aggregation is pinned.
  2. An error occurs, throwing a UserException (in my instance an inserted document failed document validation).
  3. This ScopeGuard is destructed, triggering it to clean up the cursor.
  4. GetMoreCmd::cleanupCursor() is called, taking the global, db, and collection lock in IS mode. I believe this is required to access the CursorManager.
  5. ClientCursorPin::deleteUnderlying() is called, triggering kills and destructions eventually leading to the $out stage's destruction.
  6. The $out stage's destructor is triggered, and attempts to clean up the temporary collection it created and partially filled.
  7. The drop command attempts to take the DB lock in X mode, which is an upgrade from the held IS lock, which eventually triggers this dassert().

I think this can also happen on the 3.2 branch and earlier, but I haven't verified.



 Comments   
Comment by David Storch [ 25/Aug/17 ]

I attempted to reproduce this problem both by applying the patch from Charlie's comment above, and by running the following against a debug build:

> db.createCollection("c", {validator: {illegalField: {$exists: false}}})
{ "ok" : 1 }
> db.d.insert({illegalField: true})
WriteResult({ "nInserted" : 1 })
> db.d.aggregate([{$out: "c"}], {cursor: {batchSize: 0}})
Error: getMore command failed: {
	"ok" : 0,
	"errmsg" : "insert for $out failed: { connectionId: 1, err: \"Document failed validation\", code: 121, codeName: \"DocumentValidationFailure\", n: 0, ok: 1.0 }",
	"code" : 16996,
	"codeName" : "Location16996"
}

I could not reproduce by either method. Therefore, I believe this problem has been fixed, almost certainly as a consequence of the changes for SERVER-22541. Closing as Gone Away.

Comment by Charlie Swanson [ 14/Oct/16 ]

This was discovered by this test case that was later removed in jstests/views/views_aggregation.js:

diff --git a/jstests/views/views_aggregation.js b/jstests/views/views_aggregation.js
index 2544f2a..5a9c010 100644
--- a/jstests/views/views_aggregation.js
+++ b/jstests/views/views_aggregation.js
@@ -172,16 +172,6 @@
     assert.writeOK(viewsDB.invalidDocs.insert({illegalField: "present"}));
     assert.commandWorked(viewsDB.createView("invalidDocsView", "invalidDocs", []));
 
+    assertErrorCode(
+        viewsDB.invalidDocsView,
+        [{$out: validatedCollName}],
+        16996,
+        "Expected $out insertions to fail validation because 'bypassDocumentValidation' was not" +
+        " specified");
+
     assert.commandWorked(
         viewsDB.runCommand({
             aggregate: "invalidDocsView",

When this issue is fixed, we should consider re-adding this test case.

Comment by Charlie Swanson [ 13/Oct/16 ]

After some discussion, we've decided that this bug is very difficult to hit, likely not high impact, and will be substantially easier to fix after resolving SERVER-22541.

I'm marking this as 3.5 Required, and as depending on SERVER-22541.

Generated at Thu Feb 08 04:12:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.