[SERVER-21935] SpiderMonkey shell takes 20+ minutes to run ~10MB JS test Created: 17/Dec/15  Updated: 06/Dec/22  Resolved: 03/Dec/21

Status: Closed
Project: Core Server
Component/s: JavaScript, Shell
Affects Version/s: 3.1.7
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Kamran K. Assignee: Backlog - Server Tooling and Methods (STM) (Inactive)
Resolution: Won't Fix Votes: 0
Labels: fuzzer-blacklist, move-stm
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File 0.0.64-27692afcd0-ent_d7bdd5d639-qa_dac003fe66-1450372090006-094.js    
Issue Links:
Depends
Related
Assigned Teams:
Server Tooling & Methods
Operating System: ALL
Participants:
Linked BF Score: 15

 Description   

The fuzzer created a ~10MB JS test (attached). The 3.0.8 shell takes ~12 seconds to run the test. The 3.2.0 shell takes 20+ mins and causes Evergreen tasks to time out.

Both runs used a mongod from master.

Some perf output:

Samples: 795K of event 'cpu-clock', Event count (approx.): 198826000000                  
 39.40%  mongo  mongo                [.] (anonymous namespace)::BytecodeParser::parse()
 32.32%  mongo  mongo                [.] JSScript::mainOffset() const
 13.02%  mongo  mongo                [.] JS::Rooted<JSScript*>::operator->() const
  7.28%  mongo  mongo                [.] JS::Rooted<JSScript*>::get() const



 Comments   
Comment by Brooke Miller [ 03/Dec/21 ]

STM doesn't have bandwidth to pick this up, so we're closing as won't fix. However, if other teams have capacity and would like to pursue this, please feel free to pick this up.

Comment by Mira Carey [ 04/May/18 ]

Re-opening to re-investigate

Comment by Eric Milkie [ 23/Jun/17 ]

Closing as Won't Fix, or Gone Away? Does this problem no longer cause Evergreen task timeouts? Is the test simply gone? Did we end up increasing the timeouts?

Comment by Eric Milkie [ 27/Jun/16 ]

Is SpiderMonkey throwing a C++ exception to handle the error casee, where V8 is not? Throwing and catching exceptions can really slow things down.

Comment by Max Hirschhorn [ 26/Jun/16 ]

This ticket hasn't really gotten any attention and I think it's worth identifying what the underlying issue is. I collected some perf samples and found that js::ReportIsNotFunction() is where we are spending a lot of our time in js::Invoke().

96.00%    mongo  libmozjs.so                  [.] (anonymous namespace)::BytecodeParser::parse()
          |
          --- (anonymous namespace)::BytecodeParser::parse()
             |
              --96.00%-- DecompileExpressionFromStack(JSContext*, int, int, JS::Handle<JS::Value>, char**)
                        js::DecompileValueGenerator(JSContext*, int, JS::Handle<JS::Value>, JS::Handle<JSString*>, int)
                        |
                        |--90.18%-- js::ReportValueErrorFlags(JSContext*, unsigned int, unsigned int, int, JS::Handle<JS::Value>, JS::Handle<JSString*>, char const*, c
                        |          |
                        |           --89.77%-- js::ReportIsNotFunction(JSContext*, JS::Handle<JS::Value>, int, js::MaybeConstruct)
                        |                     js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct)
                        |                     Interpret(JSContext*, js::RunState&)
                        |                     |
                        |                      --89.43%-- js::RunScript(JSContext*, js::RunState&)
                        |                                js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::ExecuteType, js::Abstrac
                        |                                js::Execute(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value*)
                        |                                ExecuteScript(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSScript*>, JS::Value*)
                        |                                JS_ExecuteScript(JSContext*, JS::Handle<JSScript*>, JS::MutableHandle<JS::Value>)
                        |                                mongo::mozjs::MozJSImplScope::exec(mongo::StringData, std::__cxx11::basic_string<char, std::char_traits<char>,
                        |                                std::_Function_handler<void (), mongo::mozjs::MozJSProxyScope::exec(mongo::StringData, std::__cxx11::basic_str
                        |                                mongo::mozjs::MozJSProxyScope::implThread(void*)
                        |                                nspr::Thread::ThreadRoutine(void*)
                        |                                execute_native_thread_routine

I'm not familiar enough with SpiderMonkey to speak to the expense of that function, but it was easy enough to determine the sheer number of times we are calling it by setting a breakpoint in gdb. The 0.0.64-27692afcd0-ent_d7bdd5d639-qa_dac003fe66-1450372090006-094.js test attached to this ticket attempts to call an insert_doc() function 6000 times (of the ~13500 top-level statements). The identifier insert_doc is either not defined or isn't a function throughout the entire test. Applying the following changes to the test to define insert_doc as a no-op function makes it so that the test finishes in around 4 minutes (down from 20 or so minutes).

diff --git a/0.0.64-27692afcd0-ent_d7bdd5d639-qa_dac003fe66-1450372090006-094.js b/0.0.64-27692afcd0-ent_d7bdd5d639-qa_dac003fe66-1450372090006-094.js
index 9a467b0..7cffc57 100644
--- a/0.0.64-27692afcd0-ent_d7bdd5d639-qa_dac003fe66-1450372090006-094.js
+++ b/0.0.64-27692afcd0-ent_d7bdd5d639-qa_dac003fe66-1450372090006-094.js
@@ -400,6 +400,7 @@ var randomTable = [0.8497028675465178,0.4927160027118276,0.3886717403044413,0.88
 // End of preamble.
 
 /* eslint-enable */
+var insert_doc = Function.prototype;
 
 var _______________________________________________________________________________;
 _______________________________________________________________________________;
@@ -10512,6 +10513,7 @@ try {
 }
 var $endTime = Date.now();
 print('Top-level statement 750 completed in', $endTime - $startTime, 'ms');
+insert_doc = Function.prototype;
 _______________________________________________________________________________;
 _______________________________________________________________________________;
 _______________________________________________________________________________;
@@ -37517,6 +37519,7 @@ try {
 }
 var $endTime = Date.now();
 print('Top-level statement 2765 completed in', $endTime - $startTime, 'ms');
+insert_doc = Function.prototype;
 _______________________________________________________________________________;
 _______________________________________________________________________________;
 _______________________________________________________________________________;

Gathering perf samples with the aforementioned changes to the test reveals that we also spend a significant amount of time in ReportIsNullOrUndefined() inside the js::ToObjectSlow() case of js::GetProperty(). I didn't investigate what statements within the tests are causing this to happen, but trying to access properties on null and undefined values seems exactly like what the fuzzer could be doing.

85.35%    mongo  libmozjs.so                  [.] (anonymous namespace)::BytecodeParser::parse()
          |
          --- (anonymous namespace)::BytecodeParser::parse()
              DecompileExpressionFromStack(JSContext*, int, int, JS::Handle<JS::Value>, char**)
              js::DecompileValueGenerator(JSContext*, int, JS::Handle<JS::Value>, JS::Handle<JSString*>, int)
             |
             |--62.46%-- js::ReportIsNullOrUndefined(JSContext*, int, JS::Handle<JS::Value>, JS::Handle<JSString*>)
             |          js::ToObjectSlow(JSContext*, JS::Handle<JS::Value>, bool)
             |          |
             |          |--59.89%-- js::GetProperty(JSContext*, JS::Handle<JS::Value>, JS::Handle<js::PropertyName*>, JS::MutableHandle<JS::Value>)
             |          |          Interpret(JSContext*, js::RunState&)
             |          |          |
             |          |           --59.14%-- js::RunScript(JSContext*, js::RunState&)
             |          |                     js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::ExecuteType, js::AbstractFramePtr,
             |          |                     js::Execute(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value*)
             |          |                     ExecuteScript(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSScript*>, JS::Value*)
             |          |                     JS_ExecuteScript(JSContext*, JS::Handle<JSScript*>, JS::MutableHandle<JS::Value>)
             |          |                     mongo::mozjs::MozJSImplScope::exec(mongo::StringData, std::__cxx11::basic_string<char, std::char_traits<char>, std::alloc
             |          |                     std::_Function_handler<void (), mongo::mozjs::MozJSProxyScope::exec(mongo::StringData, std::__cxx11::basic_string<char, s
             |          |                     mongo::mozjs::MozJSProxyScope::implThread(void*)
             |          |                     nspr::Thread::ThreadRoutine(void*)
             |          |                     execute_native_thread_routine

I'm still puzzled by the gap between V8's and SpiderMonkey's performance when running the test. Is it really the case that V8 has simply optimized reporting these error cases?

Comment by Kamran K. [ 17/Dec/15 ]

The failure: https://evergreen.mongodb.com/task/mongodb_mongo_v3.2_enterprise_rhel_57_64_bit_jstestfuzz_replication_WT_27692afcd08165ca8087bff7dd3fae754acfcb84_15_12_16_22_16_18

Generated at Thu Feb 08 03:58:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.