[SERVER-36918] Searching on a collection that is rebuilt using $out will sometimes raise an exception Created: 29/Aug/18 Updated: 27/Oct/23 Resolved: 06/Sep/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 4.0.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mark [X] | Assignee: | Kyle Suarez |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Description |
|
Similar to my (closed) bug reported here: Running a regular "find" query will raise an exception if the collection is being rebuilt using $out. I hope I'm not exhausting you with my bug reports; this is affecting our systems and I truly think it is wrong behavior. Writing retry code isn't particularly fun. |
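For readers wondering what the retry code mentioned above might look like, here is a minimal sketch in shell-style JavaScript. It is not taken from the reporter's application: `readWithRetry`, `isCursorKilled`, and the list of error codes are all hypothetical, and the codes in particular are assumptions that should be checked against what your driver and server version actually report.

```javascript
// Error codes assumed to indicate a cursor killed by a collection drop.
// These are assumptions; verify them for your server and driver version.
const RETRYABLE_CODES = new Set([
  43,  // CursorNotFound (assumed)
  175, // QueryPlanKilled (assumed)
]);

function isCursorKilled(err) {
  return err != null && RETRYABLE_CODES.has(err.code);
}

// Runs readFn and re-runs it from scratch if its cursor was killed.
// readFn should fully drain its cursor (for example with toArray()),
// so that a retry never returns a partially read result.
function readWithRetry(readFn, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return readFn();
    } catch (err) {
      if (!isCursorKilled(err)) throw err; // unrelated failure: do not mask it
      lastError = err;
    }
  }
  throw lastError; // every attempt hit a killed cursor
}

// Hypothetical usage in the shell ("report" is an invented collection name):
//   readWithRetry(() => db.getCollection("report").find({}).toArray());
```

The helper retries the whole read rather than resuming the cursor, since a killed cursor cannot be continued.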
| Comments |
| Comment by Kyle Suarez [ 06/Sep/18 ] |
|
Hello Segal, After looking more into your issue in I have spoken to the Query Team about the behavior of cursors when their underlying collection is dropped. However, this is not something we are planning to change: when a collection is dropped, cursors operating on that collection will be killed. I have two suggestions for you, then, regarding your original issue in this ticket:
I'm sorry if this isn't exactly the suggestion you were looking for, but changing cursor behavior when collections are dropped is not something I can see us doing. |
| Comment by Mark [X] [ 04/Sep/18 ] |
|
About my read-only view, certainly: I have opened a bug, https://jira.mongodb.org/browse/SERVER-36983. It's alarmingly simple, so I hope I'm not missing something. The second solution you have provided really looks like the case I have described. However, it seems that views with a completely empty pipeline behave somewhat better than those that do have pipelines, although not entirely... (see
I will run more benchmarks to see if we could use your second proposal; it's quite an original idea and I fancy those. Thanks! |
| Comment by Kyle Suarez [ 04/Sep/18 ] |
|
Hi Segal, I'm sorry for the delay in responding; I went away for a long Labor Day weekend. It sounds like you're using the $out to manually materialize a view. I think there are two next steps we can take here: First, do you mind sharing details about your original read-only view (with sensitive details redacted)? If there is an optimization we are missing for our non-materialized views that makes them unacceptably slow, we could also try pursuing a fix for that in a separate ticket to make the feature more broadly usable. Second, perhaps there is a way to work around the behavior with existing features in your current MongoDB version. If it would be acceptable for your use case to use aggregations to read from the collection, you could use a combination of views and the collMod command:
Does that make sense? Note that all reads from the view would be aggregations. Let me know if that helps, or if something was unclear. |
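The individual steps of this suggestion are not preserved in this export. As one possible reading of "a combination of views and the collMod command" (an assumption, not necessarily what was proposed), each rebuild could write to a fresh collection with $out and then repoint a pass-through view at it, so the collection being read is never dropped mid-query. The function name, the generation-suffix naming scheme, and all collection names below are hypothetical.

```javascript
// Sketch only. `db` is the mongo shell database handle; the view named
// viewName is assumed to exist already, e.g. created once with
// db.createView(viewName, firstTarget, []).
function refreshViaView(db, sourceName, viewName, pipeline, generation) {
  // 1. Materialize the aggregation into a brand-new collection, so the
  //    collection currently backing the view is left untouched.
  const target = viewName + "_gen_" + generation;
  db.getCollection(sourceName).aggregate(pipeline.concat([{ $out: target }]));

  // 2. Repoint the view at the new collection. collMod accepts viewOn and
  //    pipeline for views; an empty pipeline makes the view a pass-through.
  const result = db.runCommand({
    collMod: viewName,
    viewOn: target,
    pipeline: [],
  });

  return { target: target, result: result };
}
```

Cleanup of older generations is deliberately left out of the sketch: dropping a previous target while readers still hold cursors on it would presumably kill those cursors again, so any cleanup would need to be delayed.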
| Comment by Mark [X] [ 29/Aug/18 ] |
|
Hey Kyle, thanks for the quick reply! If this stops the exceptions it will be great and we could "work with it" - maybe - however it could be better. We have a set of data and a view on it, but the view is quite complicated and its performance is unacceptable: for most queries it takes the same amount of time to fetch all documents as it does to fetch one, giving us >5 seconds per fetch for queries that should take ~1ms if they were executed on the base collection instead of the view. What we are trying to achieve is collection-like performance while having something that is more similar to a view. Our data is modified at mostly predetermined times, so we decided to aggregate over the data to build a collection that looks exactly like the view we had. This gave us excellent performance.
So the process we are looking for is "replacing" the collection with the results of the aggregation query, which is exactly what $out does. If we only replace by _id as you have mentioned, we still have to delete some of the documents, because some of the changes remove documents. That leaves a short window during which the collection is in an inaccurate state (containing documents that shouldn't be there). Effectively, our process uses $out because it seemed "transactional" according to the documentation: every query on the $out collection would refer to either the old set of documents or the new set. Any intermediate state causes accuracy issues in our system, which is unacceptable.
We could work with your solution by introducing fake "empty" documents to mimic a deletion within the same aggregation transaction, or you could introduce a "deleteDocuments" mechanism. However, the best solution for us would be something that mimics the $out operator's behavior while getting rid of the exception.
Again, thank you for the attention! |
| Comment by Kyle Suarez [ 29/Aug/18 ] |
|
Hi Segal, I sympathize with you, as I realize that robust retry logic can be difficult to build in an application. However, in the upcoming MongoDB 4.2 release, we are working on many improvements to the $out aggregation stage, and I'd like to see if these new features would help satisfy your use case. Presently, in MongoDB 4.0, the $out stage will perform a destructive database catalog operation on completion: it will drop the target collection, which kills all active cursors on that collection, and then replace it with a new collection via the rename command. In MongoDB 4.2, we have completed Taking your example and writing it in MongoDB Shell JavaScript, you could use the new $out like so:
The mode option tells $out how to perform the writes. In "replaceDocuments" mode, $out will insert documents into the target collection. If that target collection already contains a document with the same uniqueKey (in this example, the same _id), then $out will replace that document. You can see another example written by my colleague here. Would this new feature satisfy your use case? If you'd like, you can try this out yourself by visiting our download center and downloading a development release. As a heads up, this feature is still under active development, so there are still bugs to be fixed and functionality to implement. Regards, |
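The shell example referred to in this comment is not preserved in this export. The sketch below only illustrates the shape of the stage as described in the text: `mode`, `"replaceDocuments"`, and `uniqueKey` come from the comment, while the `to` field name, the `{ _id: 1 }` form of `uniqueKey`, the helper name, the `$match` stage, and the collection name are assumptions. The syntax was still under development at the time and may not match any released version.

```javascript
// Builds an $out stage in the development-release style described above.
// `to` is an assumed field name; `mode` and `uniqueKey` come from the comment.
function buildReplaceDocumentsOut(targetCollection) {
  return {
    $out: {
      to: targetCollection,     // assumed: name of the target collection
      mode: "replaceDocuments", // insert, or replace on a uniqueKey match
      uniqueKey: { _id: 1 },    // assumed form; match documents on _id
    },
  };
}

// Hypothetical pipeline: the $match stage and "report" are invented here.
const pipeline = [
  { $match: { active: true } },
  buildReplaceDocumentsOut("report"),
];

// In the shell this would be run as, for example:
//   db.getCollection("source").aggregate(pipeline);
```

In the released MongoDB 4.2, this functionality shipped as the separate $merge stage rather than as options on $out, so check the documentation for your server version before relying on this form.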