[SERVER-36966] Sharded map reduce may fail to clean up temporary output collection
Created: 31/Aug/18  Updated: 29/Oct/23  Resolved: 05/Nov/18

Status: Closed
Project: Core Server
Component/s: MapReduce, Sharding
Affects Version/s: None
Fix Version/s: 4.0.5, 4.1.5

Type: Bug
Priority: Major - P3
Reporter: Charlie Swanson
Assignee: Charlie Swanson
Resolution: Fixed
Votes: 0
Labels: todo_in_code
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-43477 Complete TODO listed in SERVER-36966 Closed
related to SERVER-44211 Complete TODO listed in SERVER-36966 Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.0, v3.6
Participants:
Linked BF Score: 14

 Description   

This call is supposed to drop the temporary output collections if anything goes wrong in the command, but in some cases we never reach it. In particular, I've reproduced a failure where this line throws an exception due to a stale config. When that happens, the cleanup code doesn't run and the temporary collection is left behind.

To fix this, we should use a ScopeGuard to make sure we clean up the temporary collections if we exit the function due to an exception.
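
The idea can be illustrated with a minimal, self-contained sketch. It makes several assumptions: the `ScopeGuard` class below is a hypothetical stand-in written for this example, not the server's own utility, and `tempCollections` and `runMapReduce` are illustrative names that only model the temporary collections and the command body.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a scope guard: runs the stored callback when the
// guard is destroyed, unless dismiss() was called first.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> onExit) : _onExit(std::move(onExit)) {
        assert(_onExit);  // an empty callback would make the guard useless
    }
    ~ScopeGuard() {
        if (_active) {
            _onExit();  // the callback must not throw
        }
    }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

    void dismiss() {
        _active = false;
    }

private:
    std::function<void()> _onExit;
    bool _active = true;
};

// Hypothetical model of the temporary collections that currently exist.
std::set<std::string> tempCollections;

// Models the command body: creates a temp collection, then may throw because
// of a stale config. The guard drops the temp collection on every exit path,
// including stack unwinding caused by the exception.
void runMapReduce(const std::string& tempNs, bool staleConfig) {
    tempCollections.insert(tempNs);
    ScopeGuard cleanup([&] { tempCollections.erase(tempNs); });

    if (staleConfig) {
        throw std::runtime_error("stale config");
    }
    // ... the rest of the command would run here ...
}
```

In this sketch the guard fires on both the normal and the exceptional path. A variant that should clean up only on failure could call `cleanup.dismiss()` just before returning successfully.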



 Comments   
Comment by Githook User [ 06/Nov/19 ]

Author: Charlie Swanson (cswanson310) <charlie.swanson@mongodb.com>

Message: SERVER-44211 Unblacklist tests now that SERVER-36966 is done
Branch: master
https://github.com/mongodb/mongo/commit/5cb4b0a1cc7a60d6cea01e551aa0f5e5f6e823b4

Comment by Githook User [ 19/Nov/18 ]

Author: Charlie Swanson (cswanson310) <charlie.swanson@mongodb.com>

Message: SERVER-36966 Validate ns in cluster mapReduce

Also ensures that the temp collections are always up to date, and
extends the passthrough coverage of the mapReduce command outputting to
a sharded collection.

(cherry picked from commit 7dbcd710077bc4141e71730be9e12558880375e6)
Branch: v4.0
https://github.com/mongodb/mongo/commit/794e75a24d975f80317cf92c004cf5cb9ea5b03d

Comment by Githook User [ 05/Nov/18 ]

Author: Charlie Swanson (cswanson310) <charlie.swanson@mongodb.com>

Message: SERVER-36966 Validate ns in cluster mapReduce

Also ensures that the temp collections are always up to date, and
extends the passthrough coverage of the mapReduce command outputting to
a sharded collection.
Branch: master
https://github.com/mongodb/mongo/commit/7dbcd710077bc4141e71730be9e12558880375e6

Comment by Charlie Swanson [ 05/Oct/18 ]

blake.oler sure, I'd be happy to!

greg.mckeon can you clarify why this was bumped into a sprint? I was treating this as a BF friday ticket since it is blocking a BF. The workflow for those tickets is no longer to drag them into the sprint. 

 

Comment by Blake Oler [ 02/Oct/18 ]

I will be adding temporary blacklists for this test as part of SERVER-37330 (running into this failure frequently). As part of the fix for this ticket, could you remove those blacklists? charlie.swanson

Comment by Charlie Swanson [ 24/Sep/18 ]

david.storch yes, but I need to put up a further patch with the steps outlined above.

Comment by David Storch [ 24/Sep/18 ]

charlie.swanson, is the code review for this ticket still active?

Comment by Charlie Swanson [ 21/Sep/18 ]

Investigated more and talked to Esha - we think there are two bugs in map reduce:

  1. Should it encounter a stale shard version, it leaves around a temp collection - this is what we thought the original bug was.
  2. It doesn't correctly enforce that the collection being output to doesn't already exist. We think that if someone issues a map reduce with the syntax {out: "collection"}, that map reduce should fail if "collection" already exists and is sharded. Instead, the user should specify {out: {replace: "collection", sharded: true}}.

We think we know how to fix this, but I didn't have enough time to implement the fix. We should:
1) enforce the collection doesn't already exist in this scenario
2) add an ON_BLOCK_EXIT to clean up the temp collections should anything throw an exception
3) update tests in jstests/core to account for this - some will start failing due to #1. One idea which we want to explore is changing the shell's mapReduce helper to automatically adjust the syntax used in the "implicitly sharded collections" passthrough.
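
Step 1 above can be sketched as a small validation function. This is only an illustration under stated assumptions: `OutputSpec`, `validateOutput`, and the `existsAndIsSharded` flag are hypothetical names invented for this example, not the server's actual types or API, and the lookup that would determine whether the target is sharded is not modeled.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified view of the mapReduce "out" option.
struct OutputSpec {
    std::string collection;         // target collection name
    bool shardedRequested = false;  // true for {out: {replace: ..., sharded: true}}
};

// Rejects the short {out: "collection"} form when the target already exists
// and is sharded. The caller is assumed to have determined existsAndIsSharded
// from routing metadata beforehand.
void validateOutput(const OutputSpec& out, bool existsAndIsSharded) {
    assert(!out.collection.empty());  // precondition of this sketch
    if (existsAndIsSharded && !out.shardedRequested) {
        throw std::invalid_argument("output collection '" + out.collection +
                                    "' is sharded; specify sharded: true");
    }
}
```

Running a check like this before any temporary collection is created would mean a rejected command leaves nothing behind to clean up.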

Comment by Ian Whalen (Inactive) [ 20/Sep/18 ]

Next step is still for Charlie to talk to Esha.
