[SERVER-24791] Test a views passthrough suite with jscore Created: 24/Jun/16  Updated: 05/Apr/17  Resolved: 18/Nov/16

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Kyle Suarez Assignee: Kyle Suarez
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Major Change
Sprint: Query 2016-11-21
Participants:

 Description   

We should write a passthrough test suite that runs the jscore tests with collection read operations replaced by reads on an identity view built on top of the underlying collection.

The only failures should be accepted discrepancies between query results and aggregation results (mostly operators that are allowed in the query language but prohibited in aggregation).
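
For context, an "identity view" here just means a view with an empty pipeline, so reads against it should return exactly the same documents as the underlying collection. A minimal shell sketch (the collection name is hypothetical):

db.coll.insert({_id: 1});
db.runCommand({create: "coll_identity_view", viewOn: "coll", pipeline: []});
// Reads through the identity view should match reads on the collection itself.
assert.eq(db.coll.find().itcount(), db.coll_identity_view.find().itcount());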



 Comments   
Comment by Kyle Suarez [ 08/Nov/16 ]

Here's my latest attempt at a passthrough:

diff --git a/buildscripts/resmokeconfig/suites/views_query_passthrough.yml b/buildscripts/resmokeconfig/suites/views_query_passthrough.yml
new file mode 100644
index 0000000..ba7636d
--- /dev/null
+++ b/buildscripts/resmokeconfig/suites/views_query_passthrough.yml
@@ -0,0 +1,20 @@
+selector:
+  js_test:
+    roots:
+    - jstests/core/*.js
+
+executor:
+  js_test:
+    config:
+      shell_options:
+        eval: "load('jstests/libs/override_methods/implicitly_run_queries_on_views.js');"
+        readMode: commands
+    hooks:
+    - class: ValidateCollections
+    - class: CleanEveryN
+      n: 20
+    fixture:
+      class: MongoDFixture
+      mongod_options:
+        set_parameters:
+          enableTestCommands: 1
diff --git a/etc/evergreen.yml b/etc/evergreen.yml
index 8d25280..5315ca9 100644
--- a/etc/evergreen.yml
+++ b/etc/evergreen.yml
@@ -2602,6 +2602,24 @@ tasks:
       resmoke_args: --suites=views_rs --storageEngine=wiredTiger
       run_multiple_jobs: true
 
+- <<: *task_template
+  name: views_query_passthrough
+  commands:
+  - func: "do setup"
+  - func: "run tests"
+    vars:
+      resmoke_args: --suites=views_query_passthrough --storageEngine=mmapv1
+      run_multiple_jobs: true
+
+- <<: *task_template
+  name: views_query_passthrough_WT
+  commands:
+  - func: "do setup"
+  - func: "run tests"
+    vars:
+      resmoke_args: --suites=views_query_passthrough --storageEngine=wiredTiger
+      run_multiple_jobs: true
+
 - name: push
   patchable: false
   depends_on:
@@ -4749,6 +4767,8 @@ buildvariants:
   - name: views_WT
   - name: views_rs
   - name: views_rs_WT
+  - name: views_query_passthrough
+  - name: views_query_passthrough_WT
 
 - name: enterprise-windows-64
   display_name: "* Enterprise Windows"
@@ -5515,6 +5535,8 @@ buildvariants:
   - name: views_WT
   - name: views_rs
   - name: views_rs_WT
+  - name: views_query_passthrough
+  - name: views_query_passthrough_WT
   - name: push
 
 - name: enterprise-rhel-62-64-bit-coverage
@@ -8026,6 +8048,8 @@ buildvariants:
   - name: views
   - name: views_WT
   - name: views_rs_WT
+  - name: views_query_passthrough
+  - name: views_query_passthrough_WT
 
 - name: ubuntu1604-asan
   display_name: ~ ASAN SSL Ubuntu 16.04
@@ -8198,7 +8222,10 @@ buildvariants:
   - name: unittests
   - name: views
   - name: views_WT
+  - name: views_rs
   - name: views_rs_WT
+  - name: views_query_passthrough
+  - name: views_query_passthrough_WT
 
 - name: enterprise-ubuntu-dynamic-1604-64-bit
   display_name: "* Shared Library Enterprise Ubuntu 16.04"
diff --git a/jstests/libs/override_methods/implicitly_run_queries_on_views.js b/jstests/libs/override_methods/implicitly_run_queries_on_views.js
new file mode 100644
index 0000000..8f06459
--- /dev/null
+++ b/jstests/libs/override_methods/implicitly_run_queries_on_views.js
@@ -0,0 +1,64 @@
+/**
+ * Loading this file overrides Mongo.prototype.runCommand() with a function that replaces a
+ * find, count, distinct or aggregate on a collection with the same operation on an identity view
+ * that is built on the original namespace.
+ */
+(function() {
+    "use strict";
+
+    // Save a reference to the original runCommand method in the IIFE's scope. This scoping allows
+    // the original method to be called by the override below.
+    var originalRunCommand = Mongo.prototype.runCommand;
+
+    function isQueryCommand(cmdObj) {
+        let queryCommands = ["aggregate", "find", "count", "distinct"];
+        for (let command of queryCommands) {
+            if (cmdObj.hasOwnProperty(command)) {
+                return command;
+            }
+        }
+        return false;
+    }
+
+    Mongo.prototype.runCommand = function(dbName, cmdObj, options) {
+        // Check to see if this is a command we want to intercept.
+        let command = "";
+        if (typeof cmdObj !== "object" || cmdObj === null ||
+            (command = isQueryCommand(cmdObj)) === false) {
+            print("Not applying view transformation to invalid or non-query command");
+            return originalRunCommand.apply(this, arguments);
+        }
+
+        // Don't create a view on system collections or identity views that we ourselves have
+        // created.
+        const collName = cmdObj[command];
+        if (collName === "oplog.rs" || collName.startsWith("system")) {
+            print("Not applying view transformation to system collection " + collName);
+            return originalRunCommand.apply(this, arguments);
+        }
+        if (collName.endsWith("_identity_view")) {
+            print("Not applying view transformation to existing identity view " + collName);
+            return originalRunCommand.apply(this, arguments);
+        }
+
+        // We've determined that this is probably a "regular" collection or view, so we create an
+        // identity view (that is, a no-op view) on top of it.
+        const viewName = collName + "_identity_view";
+        const createViewCmd = {create: viewName, viewOn: collName};
+        const createViewOptions = 0;
+        originalRunCommand.apply(this, [dbName, createViewCmd, createViewOptions]);
+
+        // Run the command against the view. If the command indicates that the operation is not
+        // supported on a view or in aggregation, log it and just run the original command.
+        cmdObj[command] = viewName;
+        let res = originalRunCommand.apply(this, arguments);
+        if (res.ok !== 1 && (res.code === ErrorCodes.InvalidPipelineOperator ||
+                             res.code === ErrorCodes.CommandNotSupportedOnView ||
+                             res.code === ErrorCodes.OptionNotSupportedOnView)) {
+            print("Caught views error " + tojson(res) + "; rerunning original command");
+            cmdObj[command] = collName;
+            return originalRunCommand.apply(this, arguments);
+        }
+        return res;
+    };
+}());

From the results of my patch build, I've observed the following failures when implicitly running queries on views:

  1. Unexpected entries in Top, system.profile, or listCollections due to creating extra views
  2. $where not allowed on a view (not supported in a $match stage)
  3. Use of geo operators (e.g. $near, $nearSphere) on a view (not supported in a $match stage)
  4. Unexpected array sort order
  5. Plan cache discrepancies
  6. sortKey not supported in $meta
  7. Text queries unsupported on a view (specifically, they fail because the view does not have a text index)
  8. $natural sort order not supported on a view (which fails with the error message "FieldPath field names may not start with '$'")

In my opinion, all of these are acceptable, though I'd like david.storch and max.hirschhorn to confirm. If there's anything actionable here, it's probably the last two items: the error messages are rather unhelpful and could perhaps be made views-specific.
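
To illustrate items 2 and 3, here is a hedged sketch of the failure mode (the collection and view names are made up, and the exact error code may vary by operator): a geo operator such as $near is rejected once the find on the view is rewritten into an aggregation $match stage.

db.geo_coll.drop();
db.geo_coll.createIndex({loc: "2dsphere"});
db.geo_coll.insert({loc: {type: "Point", coordinates: [0, 0]}});
assert.commandWorked(db.runCommand({create: "geo_coll_identity_view", viewOn: "geo_coll"}));

// This filter works against geo_coll directly, but fails when run through the view.
var res = db.runCommand({
    find: "geo_coll_identity_view",
    filter: {loc: {$near: {$geometry: {type: "Point", coordinates: [0, 0]}}}}
});
assert.commandFailed(res);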

One major problem with my patch is that it produced a hang in my patch build, which I can reproduce locally. I've done some analysis but can't figure out what's going on: there are no tests that started but did not finish, and the hang analyzer has the following output:

[2016/11/01 20:12:05.648] Thread 14 (Thread 0x7fab6cf4c700 (LWP 1709)):
[2016/11/01 20:12:05.648] #0  0x00007fab72f8a585 in sigwait () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.648] #1  0x00007fab77053512 in mongo::(anonymous namespace)::signalProcessingThread() ()
[2016/11/01 20:12:05.648] #2  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.649] #3  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.650] #4  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.650] Thread 13 (Thread 0x7fab6c54b700 (LWP 1710)):
[2016/11/01 20:12:05.650] #0  0x00007fab72f86a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.650] #1  0x00007fab76fc6bc5 in std::thread::_Impl<std::_Bind_simple<mongo::BackgroundThreadClockSource::_startTimerThread()::{lambda()#1} ()> >::_M_run() ()
[2016/11/01 20:12:05.650] #2  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.650] #3  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.651] #4  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.651] Thread 12 (Thread 0x7fab6bb4a700 (LWP 1711)):
[2016/11/01 20:12:05.651] #0  0x00007fab72f8668c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.651] #1  0x00007fab77ac2c4c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
[2016/11/01 20:12:05.651] #2  0x00007fab76cf2b21 in mongo::FileAllocator::run(mongo::FileAllocator*) ()
[2016/11/01 20:12:05.651] #3  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.652] #4  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.652] #5  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.652] Thread 11 (Thread 0x7fab6b149700 (LWP 1712)):
[2016/11/01 20:12:05.652] #0  0x00007fab72f8a00d in nanosleep () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.652] #1  0x00007fab770644b3 in mongo::sleepmillis(long long) ()
[2016/11/01 20:12:05.653] #2  0x00007fab76cbc454 in mongo::DataFileSync::run() ()
[2016/11/01 20:12:05.653] #3  0x00007fab76fc49fd in mongo::BackgroundJob::jobBody() ()
[2016/11/01 20:12:05.653] #4  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.653] #5  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.653] #6  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.654] Thread 10 (Thread 0x7fab6a748700 (LWP 1713)):
[2016/11/01 20:12:05.654] #0  0x00007fab72f86a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.654] #1  0x00007fab76cc255e in mongo::dur::durThread(mongo::ClockSource*, long) ()
[2016/11/01 20:12:05.654] #2  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.654] #3  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.654] #4  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.655] Thread 9 (Thread 0x7fab69d47700 (LWP 1714)):
[2016/11/01 20:12:05.655] #0  0x00007fab72f8668c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.655] #1  0x00007fab77ac2c4c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
[2016/11/01 20:12:05.655] #2  0x00007fab76cd3bbb in mongo::dur::JournalWriter::_journalWriterThread() ()
[2016/11/01 20:12:05.655] #3  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.655] #4  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.656] #5  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.656] Thread 8 (Thread 0x7fab68945700 (LWP 1716)):
[2016/11/01 20:12:05.656] #0  0x00007fab72f8668c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.656] #1  0x00007fab77ac2c4c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
[2016/11/01 20:12:05.656] #2  0x00007fab76f63760 in mongo::DeadlineMonitor<mongo::mozjs::MozJSImplScope>::deadlineMonitorThread() ()
[2016/11/01 20:12:05.657] #3  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.657] #4  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.657] #5  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.657] Thread 7 (Thread 0x7fab69346700 (LWP 1724)):
[2016/11/01 20:12:05.657] #0  0x00007fab72f86a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.657] #1  0x00007fab7661639e in mongo::FTDCController::doLoop() ()
[2016/11/01 20:12:05.658] #2  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.658] #3  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.658] #4  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.658] Thread 6 (Thread 0x7fab5df44700 (LWP 1725)):
[2016/11/01 20:12:05.658] #0  0x00007fab72f86a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.659] #1  0x00007fab769d5a9c in mongo::RangeDeleter::doWork() ()
[2016/11/01 20:12:05.659] #2  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.659] #3  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.659] #4  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.659] Thread 5 (Thread 0x7fab5d543700 (LWP 1726)):
[2016/11/01 20:12:05.659] #0  0x00007fab72f8a00d in nanosleep () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.660] #1  0x00007fab770643eb in mongo::sleepsecs(int) ()
[2016/11/01 20:12:05.660] #2  0x00007fab76d76378 in mongo::TTLMonitor::run() ()
[2016/11/01 20:12:05.660] #3  0x00007fab76fc49fd in mongo::BackgroundJob::jobBody() ()
[2016/11/01 20:12:05.660] #4  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.660] #5  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.660] #6  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.661] Thread 4 (Thread 0x7fab52b42700 (LWP 1734)):
[2016/11/01 20:12:05.661] #0  0x00007fab72f8a00d in nanosleep () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.661] #1  0x00007fab770643eb in mongo::sleepsecs(int) ()
[2016/11/01 20:12:05.661] #2  0x00007fab76455d1a in mongo::ClientCursorMonitor::run() ()
[2016/11/01 20:12:05.661] #3  0x00007fab76fc49fd in mongo::BackgroundJob::jobBody() ()
[2016/11/01 20:12:05.661] #4  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.662] #5  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.662] #6  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.662] Thread 3 (Thread 0x7fab52141700 (LWP 1735)):
[2016/11/01 20:12:05.662] #0  0x00007fab72f86a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.662] #1  0x00007fab76fc5b77 in mongo::(anonymous namespace)::PeriodicTaskRunner::run() ()
[2016/11/01 20:12:05.663] #2  0x00007fab76fc49fd in mongo::BackgroundJob::jobBody() ()
[2016/11/01 20:12:05.663] #3  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.663] #4  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.663] #5  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.663] Thread 2 (Thread 0x7fab51740700 (LWP 1736)):
[2016/11/01 20:12:05.663] #0  0x00007fab72cc8523 in select () from /lib64/libc.so.6
[2016/11/01 20:12:05.664] #1  0x00007fab76ff49fc in mongo::Listener::initAndListen() ()
[2016/11/01 20:12:05.664] #2  0x00007fab77ac5840 in execute_native_thread_routine ()
[2016/11/01 20:12:05.664] #3  0x00007fab72f82aa1 in start_thread () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.664] #4  0x00007fab72ccfaad in clone () from /lib64/libc.so.6
[2016/11/01 20:12:05.664] Thread 1 (Thread 0x7fab759b1d60 (LWP 1702)):
[2016/11/01 20:12:05.664] #0  0x00007fab72f8668c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2016/11/01 20:12:05.665] #1  0x00007fab77ac2c4c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
[2016/11/01 20:12:05.665] #2  0x00007fab76fd2bbf in mongo::waitForShutdown() ()
[2016/11/01 20:12:05.665] #3  0x00007fab762219d0 in mongo::(anonymous namespace)::_initAndListen(int) ()
[2016/11/01 20:12:05.665] #4  0x00007fab7623f924 in main ()

Nothing in this hang analyzer output suggests to me that mongod is deadlocked, so perhaps my Mongo.prototype.runCommand() override is doing something illegal? In the interest of not spending too much time on something we don't plan to commit, I've set this aside for other work. I can continue investigating/fixing if either of you decides it's worth it.

Comment by Kyle Suarez [ 06/Sep/16 ]

I'm going on vacation, so I'm throwing this back on the backlog. An initial commit with my work can be found at https://github.com/ksuarz/mongo/tree/views-passthrough.

The work-in-progress code contains overrides for the query methods but doesn't handle error cases. I would suggest that the next approach to try is overriding Mongo.prototype.runCommand, checking for a views-related error, and handling it by rerunning the original query unmodified.

Comment by Geert Bosch [ 06/Sep/16 ]

A JavaScript-based solution does indeed seem preferable, if only because it avoids adding test-only code to the server.

Comment by Kyle Suarez [ 06/Sep/16 ]

I have a few high-level ideas as to how to approach this:

  1. Add an override to jstests/libs/override_methods that overrides DBCollection.prototype.{find, aggregate, count, distinct}. This would create an identity view on the target namespace and return a DBQuery on the view instead, with a hidden member variable that records the original namespace. Then we override the DBQuery methods that actually run the command so they check for failures caused by limitations in the aggregation system. If the command fails with error codes like InvalidPipelineOperator or OptionNotSupportedOnView, we log the error but continue by changing the query back to its original, non-view version and running that instead.
  2. We add a test-only setParameter to the server that automatically translates finds, counts and distincts into aggregations. We would still need to implement JavaScript overrides that handle commands failing with one of the above error codes.

Option 1, the purely JavaScript implementation, seems like the better approach, though I'm not yet sure it's feasible: I haven't gotten a POC working that handles failure cases elegantly.
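
To make Option 1 more concrete, here is a rough, untested sketch of what the DBCollection.prototype.find() override might look like (the "_identity_view" suffix and the _originalCollection property are illustrative only, not existing conventions):

(function() {
    "use strict";

    var originalFind = DBCollection.prototype.find;

    DBCollection.prototype.find = function(query, fields, limit, skip, batchSize, options) {
        // Create (best-effort) an identity view over the target namespace.
        var viewName = this.getName() + "_identity_view";
        this.getDB().runCommand({create: viewName, viewOn: this.getName()});

        // Return a DBQuery against the view, remembering the original collection
        // so a DBQuery-level override could retry against it on a views error.
        var viewColl = this.getDB().getCollection(viewName);
        var cursor = originalFind.apply(viewColl, arguments);
        cursor._originalCollection = this;
        return cursor;
    };
}());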

CC geert.bosch
