[SERVER-4191] killOp cannot interrupt v8 javascript loop Created: 02/Nov/11  Updated: 11/Jul/16  Resolved: 11/Nov/11

Status: Closed
Project: Core Server
Component/s: JavaScript
Affects Version/s: None
Fix Version/s: 2.1.0

Type: Bug Priority: Major - P3
Reporter: Greg Studer Assignee: Antoine Girbal
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 10.10 x86_64, libv8-2.2.18


Issue Links:
Related
related to SERVER-3481 interrupting --eval while(true){ ; } ... Closed
Operating System: ALL
Participants:

 Description   

On my machine, killop.js does not end queries with a "while( 1 )

{ ; }

" where clause - the shell hangs for a long period of time. Seems like the root of the problem is the v8::Locker l; mutex in V8ScriptEngine::interrupt(). As far as I can tell, this never yields when there's other javascript executing.

If this is expected behavior (or difficult to fix), the test case should probably be modified - adding a sleep( 1 ) to the loop allows the operation to be killed.

Observed this generally through many versions of mongodb.



 Comments   
Comment by auto [ 11/Nov/11 ]

Author:

{u'login': u'agirbal', u'name': u'agirbal', u'email': u'antoine@10gen.com'}

Message: SERVER-4191: cleanup
Branch: master
https://github.com/mongodb/mongo/commit/a59266c47fc921480789070fa76448159f7cdda3

Comment by auto [ 11/Nov/11 ]

Author:

{u'login': u'agirbal', u'name': u'agirbal', u'email': u'antoine@10gen.com'}

Message: SERVER-4191: fixed v8 interruption by using Isolates
Branch: master
https://github.com/mongodb/mongo/commit/49c58aa9baccde3ac82ed82f9c5c29286d2a2889

Comment by Antoine Girbal [ 10/Nov/11 ]

I switch the v8 code to use isolates to try out.
This seems to fix this issue entirely, since we can terminate a given isolate instead of relying on getting the right thread id.
Using isolate would also fix the other issue where an out of memory affects the whole engine.

Comment by Antoine Girbal [ 10/Nov/11 ]

This was the problem in question, raised by aaron a while back
http://code.google.com/p/v8/issues/detail?id=708
I pinged the v8 dev about it.
Otherwise will try to think of workaround.

Comment by Antoine Girbal [ 10/Nov/11 ]

actually looks like the issue is somewhere else.
There is a mixup between operation id and thread id.
There was a comment about it not being reliable in our code..

Comment by Antoine Girbal [ 10/Nov/11 ]

Issue may be due to a "bug" in v8.
I verified that we properly call terminateExecution() on the thread, but it does not stop.
v8 kills a thread by raising a stack overflow.
The problem is that if the code is very simple it may be optimized to not have a stack guard check and it will not terminate.
The same issue is reported here: http://www.mail-archive.com/v8-users@googlegroups.com/msg01756.html
There was no final answer so will ask v8 guys.
Also it sounds like the "crankshaft" optimization in v8 makes it worse.

In the end, this may not be a blocker, because few people will run empty loops.
If you have a bit of code in there, then it gets terminated properly.
But difficult to predict.

Comment by Antoine Girbal [ 09/Nov/11 ]

ok thanks just tried again on buildbot and it does fail frequently, so I am able to test.

Comment by Greg Studer [ 09/Nov/11 ]

Actually the first command seems to terminate after a few secs, the second does not.

Comment by Greg Studer [ 09/Nov/11 ]

Attached a sample run on my machine - the first command terminates after two minutes or so, the second never terminates after 7-8 mins so I ctrl-c'd. When cleaning up, the running operation still seemed to hang the mongo instance, another ctrl-c was necessary. Still seems like a timing issue, though the preemption did allow the first command to terminate. Tested twice more, the first command didn't terminate and I gave up.

running /home/greg/Workspaces/Mongo Workspace/mongo/mongod --port 32000 --dbpath /data/db/sconsTests/ --nopreallocj
Wed Nov  9 13:16:07 [initandlisten] MongoDB starting : pid=22160 port=32000 dbpath=/data/db/sconsTests/ 64-bit host=ubuntu
Wed Nov  9 13:16:07 [initandlisten] 
Wed Nov  9 13:16:07 [initandlisten] ** NOTE: This is a development version (2.1.0-pre-) of MongoDB.
Wed Nov  9 13:16:07 [initandlisten] **       Not recommended for production.
Wed Nov  9 13:16:07 [initandlisten] 
Wed Nov  9 13:16:07 [initandlisten] db version v2.1.0-pre-, pdfile version 4.5
Wed Nov  9 13:16:07 [initandlisten] git version: 5b782c8ced503d78f9783d556f92d86941ee5f91
Wed Nov  9 13:16:07 [initandlisten] build info: Linux ubuntu 2.6.35-27-generic #48-Ubuntu SMP Tue Feb 22 20:25:46 UTC 2011 x86_64 BOOST_LIB_VERSION=1_42
Wed Nov  9 13:16:07 [initandlisten] options: { dbpath: "/data/db/sconsTests/", nopreallocj: true, port: 32000 }
Wed Nov  9 13:16:07 [initandlisten] journal dir=/data/db/sconsTests/journal
Wed Nov  9 13:16:07 [initandlisten] recover : no journal files present, no recovery needed
Wed Nov  9 13:16:07 [initandlisten] waiting for connections on port 32000
Wed Nov  9 13:16:07 [websvr] admin web console waiting for connections on port 33000
Wed Nov  9 13:16:08 [initandlisten] connection accepted from 127.0.0.1:47076 #1 (1 connection now open)
starting test : killop.js 
 *******************************************
         Test : killop.js ...
Wed Nov  9 13:16:08 [conn1] end connection 127.0.0.1:47076 (0 connections now open)
['/home/greg/Workspaces/Mongo Workspace/mongo/mongo', '--port', '32000', '/home/greg/Workspaces/Mongo Workspace/mongo/jstests/killop.js', '--eval', 'TestData = new Object();TestData.testPath = "/home/greg/Workspaces/Mongo Workspace/mongo/jstests/killop.js";TestData.testFile = "killop.js";TestData.testName = "killop";TestData.noJournal = false;TestData.noJournalPrealloc = true;']
MongoDB shell version: 2.1.0-pre-
fork() available!
connecting to: 127.0.0.1:32000/test
Wed Nov  9 13:16:08 [initandlisten] connection accepted from 127.0.0.1:47077 #2 (1 connection now open)
true
Wed Nov  9 13:16:08 [conn2] CMD: drop test.jstests_killop
Wed Nov  9 13:16:08 [FileAllocator] allocating new datafile /data/db/sconsTests/test.ns, filling with zeroes...
Wed Nov  9 13:16:08 [FileAllocator] creating directory /data/db/sconsTests/_tmp
Wed Nov  9 13:16:08 [FileAllocator] done allocating datafile /data/db/sconsTests/test.ns, size: 16MB,  took 0.12 secs
Wed Nov  9 13:16:08 [FileAllocator] allocating new datafile /data/db/sconsTests/test.0, filling with zeroes...
Wed Nov  9 13:16:09 [FileAllocator] done allocating datafile /data/db/sconsTests/test.0, size: 64MB,  took 0.905 secs
Wed Nov  9 13:16:09 [FileAllocator] allocating new datafile /data/db/sconsTests/test.1, filling with zeroes...
Wed Nov  9 13:16:09 [conn2] build index test.jstests_killop { _id: 1 }
Wed Nov  9 13:16:09 [conn2] build index done 0 records 0 secs
Wed Nov  9 13:16:09 [conn2] insert test.jstests_killop 1033ms
shell: started program /home/greg/Workspaces/Mongo Workspace/mongo/mongo --eval db.jstests_killop.count( { $where: function() { while( 1 ) { ; } } } ) 127.0.0.1:32000
shell: started program /home/greg/Workspaces/Mongo Workspace/mongo/mongo --eval db.jstests_killop.count( { $where: function() { while( 1 ) { ; } } } ) 127.0.0.1:32000
sh22173| MongoDB shell version: 2.1.0-pre-
sh22175| MongoDB shell version: 2.1.0-pre-
Wed Nov  9 13:16:09 [initandlisten] connection accepted from 127.0.0.1:47078 #3 (2 connections now open)
Wed Nov  9 13:16:09 [initandlisten] connection accepted from 127.0.0.1:47079 #4 (3 connections now open)
Wed Nov  9 13:16:09 [conn2] going to kill op: op: 8.0
Wed Nov  9 13:16:10 [FileAllocator] done allocating datafile /data/db/sconsTests/test.1, size: 128MB,  took 1.432 secs
Wed Nov  9 13:16:43 [conn3] error in invoke: interrupted
Wed Nov  9 13:16:43 [conn2] going to kill op: op: 7.0
Wed Nov  9 13:16:43 [conn3] Count with ns: test.jstests_killop and query: { $where: function () { while( 1 ) { ; } } } failed with exception: exception: 10072 unknown error in invocation of $where function
Wed Nov  9 13:17:07 [clientcursormon] mem (MB) res:36 virt:902 mapped:80
Wed Nov  9 13:22:07 [clientcursormon] mem (MB) res:36 virt:902 mapped:80
^CWed Nov  9 13:25:32 got kill or ctrl c or hup signal 2 (Interrupt), will terminate after current cmd ends
Wed Nov  9 13:25:32 [initandlisten] connection accepted from 127.0.0.1:58803 #5 (4 connections now open)
Wed Nov  9 13:25:32 [initandlisten] connection accepted from 127.0.0.1:58804 #6 (5 connections now open)
^C0 tests succeeded
1 tests didn't get run
Traceback (most recent call last):
  File "buildscripts/smoke.py", line 531, in <module>
    main()
  File "buildscripts/smoke.py", line 525, in main
    run_tests(tests)
  File "buildscripts/smoke.py", line 346, in run_tests
    check_db_hashes(master, slave)
  File "buildscripts/smoke.py", line 95, in __exit__
    self.stop()
  File "buildscripts/smoke.py", line 183, in stop
    self.proc.wait()
  File "/usr/lib/python2.6/subprocess.py", line 1182, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/usr/lib/python2.6/subprocess.py", line 455, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt

Comment by Antoine Girbal [ 09/Nov/11 ]

just logged to buildbot and tested killop.js there.
Ran many times with no failure or blocking.
Greg, could you test again with the new preemption setting thx

Comment by auto [ 07/Nov/11 ]

Author:

{u'login': u'agirbal', u'name': u'agirbal', u'email': u'antoine@10gen.com'}

Message: - SERVER-4191: try with lower preemption threshold
Branch: master
https://github.com/mongodb/mongo/commit/d373fcc43e26595e72b913aba6359af5793b4d6a

Comment by Antoine Girbal [ 05/Nov/11 ]

yes killop.js always passes for me, at least I ran it 100 times in a row with no error.
If I reduce preemption time, then it does not block for some seconds anymore.
Could you try following change in scripting/engine_v8.cpp

  • v8::Locker::StartPreemption( 50 );
    + v8::Locker::StartPreemption( 1 );
Comment by Greg Studer [ 04/Nov/11 ]

@antoine - btw, happy to test any patches you create, if you can't reproduce on your machine.

Comment by Eliot Horowitz (Inactive) [ 04/Nov/11 ]

The v8 builder is inf. looping - so something is wrong.

Comment by Greg Studer [ 04/Nov/11 ]

Yeah, never kills for me currently, but I vaguely recall it working sometimes. I'm on a pretty powerful desktop at work, just fyi. I assume the killop.js test case works for you?

Comment by Antoine Girbal [ 03/Nov/11 ]

If I start:
> db.foo.find({ $where: "while( 1 )

{ ; }

" })

I am able to kill it fine.
When running killop.js, it always passes for me.
It does take a few seconds though.
Is that what you are seeing, or does it never kill?

The delay is probably due to the v8 scheduler, we could shorten preemption if we feel like it's too long.

Comment by Antoine Girbal [ 03/Nov/11 ]

in theory the interrupt() only grabs a v8::Locker which is preemptive.
So the interrupt should be able to go through.
let me test

Generated at Thu Feb 08 03:05:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.