[SERVER-37283] View graph cycle on expressive lookup secondary read Created: 24/Sep/18 Updated: 29/Oct/23 Resolved: 11/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 4.0.0 |
| Fix Version/s: | 4.1.8 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | James Wahlin | Assignee: | Charlie Swanson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: |
v4.0
|
| Steps To Reproduce: | The following patch is based on this commit and will make this issue easier to reproduce:
Run resmoke with the following arguments (this may take a few runs to trigger):
The following patch includes an additional change which will trigger an invariant if resolveInvolvedNamespaces() encounters an invalid ViewCatalog mid-resolution. This confirms that the MODE_IS lock is not protecting the ViewCatalog from change:
|
| Sprint: | Query 2018-12-17, Query 2018-12-31, Query 2019-01-14, Query 2019-01-28, Query 2019-02-11, Query 2019-02-25 |
| Participants: |
| Linked BF Score: | 58 |
| Description |
|
It is possible for the view_catalog_cycle_lookup.js FSM test to fail with MaxSubPipelineDepthExceeded when run in a suite that performs secondary reads. This happens when the aggregate command generates an invalid pipeline containing a view cycle, which can occur if the view catalog changes while pipeline namespaces are being resolved. When the aggregate command calls resolveInvolvedNamespaces(), it holds a MODE_IS database lock, which is meant to protect against ViewCatalog changes because collMod and collection drop/create require a database MODE_X lock. On secondaries, however, view catalog changes are replicated as inserts to the system.views collection and take a MODE_IX lock rather than MODE_X. This allows a view definition to change mid-resolution and can result in an invalid view graph, one that may contain a cycle. |
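The lock interaction described above can be illustrated with the standard multi-granularity lock compatibility matrix. This is a minimal sketch, not the server's lock manager: the `COMPATIBLE` table and the `can_coexist` helper are hypothetical names introduced here for illustration, and the mode names are shortened from MODE_IS, MODE_IX, MODE_S and MODE_X.

```python
# Toy model of intent-lock compatibility (hypothetical helper, not server code).
# For each held mode, the set of modes another operation may acquire concurrently.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def can_coexist(held: str, requested: str) -> bool:
    """Return True if `requested` can be granted while `held` is held."""
    return requested in COMPATIBLE[held]


# A reader holding IS blocks a writer that needs X (the primary's collMod path)...
print(can_coexist("IS", "X"))
# ...but does not block a writer that only needs IX (the secondary's insert path).
print(can_coexist("IS", "IX"))
```

Under this model the reader's MODE_IS lock only excludes writers that request MODE_X, which is why a replicated insert taking MODE_IX can proceed while namespaces are still being resolved.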
| Comments |
| Comment by Githook User [ 11/Feb/19 ] |
|
Author: {'name': 'Charlie Swanson', 'email': 'charlie.swanson@mongodb.com', 'username': 'cswanson310'} Message: Readers of the view catalog depend on a MODE_IS DB lock preventing |
| Comment by Charlie Swanson [ 11/Dec/18 ] |
|
This got quite complicated quite quickly. Pausing on this for now in favor of other work. |
| Comment by Martin Neupauer [ 09/Nov/18 ] |
|
The issue lies in resolveInvolvedNamespaces in run_aggregate.cpp. There is a while loop (https://github.com/mongodb/mongo/blob/1c2b3f3ad137758d6cc6275a61841b0836095d6b/src/mongo/db/commands/run_aggregate.cpp#L226) that repeatedly calls viewCatalog->resolveView and viewCatalog->lookup. Another approach is to make sure we take the same database/collection locks on secondaries as we do on primaries, since this problem does not exist on primaries. |
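To see why repeated lookups against a changing catalog can produce a cycle, here is a toy model of the race. It is a sketch under stated assumptions and not the code in run_aggregate.cpp: the catalog is a plain dict mapping a view name to the namespace it is defined on, and `resolve_involved`, `has_cycle`, and the `between_lookups` hook are hypothetical names introduced only for this illustration.

```python
# Toy model of multi-step view resolution (hypothetical, not server code).
def resolve_involved(catalog, start, between_lookups=None):
    """Collect view definitions reachable from `start`, one lookup at a time."""
    resolved = {}
    pending = [start]
    while pending:
        ns = pending.pop()
        if ns in resolved or ns not in catalog:
            continue  # already resolved, or a plain collection
        resolved[ns] = catalog[ns]  # one lookup against the live catalog
        pending.append(catalog[ns])
        if between_lookups is not None:
            between_lookups()  # simulate a concurrent writer, once
            between_lookups = None
    return resolved


def has_cycle(graph):
    """Return True if following view -> target edges ever revisits a view."""
    for start in graph:
        seen = set()
        ns = start
        while ns in graph:
            if ns in seen:
                return True
            seen.add(ns)
            ns = graph[ns]
    return False


catalog = {"A": "B", "B": "coll"}  # A -> B -> coll, acyclic


def writer():
    # Two individually valid changes; the catalog is acyclic after each one.
    catalog["A"] = "coll"
    catalog["B"] = "A"


mixed = resolve_involved(catalog, "A", between_lookups=writer)
print(mixed, has_cycle(mixed), has_cycle(catalog))
```

In this model the reader combines the old definition of A with the new definition of B, so the graph it assembles contains a cycle even though the catalog itself never did. Resolving against a copy taken before the writer runs, or excluding the writer for the whole loop, avoids the mixed state.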
| Comment by Ian Whalen (Inactive) [ 06/Nov/18 ] |
|
martin.neupauer, have you come up with any additional info you can add here in the past few BF Fridays? The ticket has been in investigating with no update for ~1 month, so I'd like to get it moving forward. |
| Comment by David Storch [ 04/Oct/18 ] |
|
@martin.neupauer to use BF Friday to investigate how to fix this. |
| Comment by James Wahlin [ 25/Sep/18 ] |
|
Updated. The only released version this affects is 4.0. Prior to |
| Comment by David Storch [ 25/Sep/18 ] |
|
james.wahlin, can you fill out the "affects versions" field? It sounds like this issue has been present since views were first implemented in 3.4? |