[SERVER-9695] Couldn't kill $where op in features3.js on RHEL 32 Created: 15/May/13  Updated: 11/Jul/16  Resolved: 17/May/13

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 2.4.4, 2.5.0

Type: Bug Priority: Major - P3
Reporter: Ian Whalen (Inactive) Assignee: Ben Becker
Resolution: Done Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

http://buildlogs.mongodb.org/Nightly%20Linux%20RHEL%2032-bit/builds/497/test/sharding/features3.js

assert.soon failed: function () {
    // Get all the current operations
    mine = getMine(true);  // SERVER-8794: print all operations
    // get current ops, but only print out operations before we see that a $where op has started
    // mine = getMine(curOpState == 0 && i > 20);
    i++;
    // Wait for the queries to start
    if (curOpState == 0 && mine.length > 0) {
        // queries started
        curOpState = 1;
        // kill all $where
        mine.forEach(function(z) {
            printjson(db.getSisterDB("admin").killOp(z.opid));
        });
        killTime = new Date();
    }
    // Wait for killed queries to end
    else if (curOpState == 1 && mine.length == 0) {
        // Queries ended
        curOpState = 2;
        return true;
    }
}, msg:Couldn't kill the $where operations.
Error: Printing Stack Trace
    at printStackTrace (src/mongo/shell/utils.js:37:15)
    at doassert (src/mongo/shell/assert.js:6:5)
    at Function.assert.soon (src/mongo/shell/assert.js:110:60)
    at /data/buildslaves/Linux_RHEL_32bit_Nightly/mongo/jstests/sharding/features3.js:99:8
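The failure message above comes from the shell's `assert.soon`, which keeps calling the supplied function until it returns a truthy value and otherwise fails with `msg`. Below is a minimal sketch of that polling contract. It assumes plain Node.js rather than the mongo shell. `assertSoon` and `sleepMs` are hypothetical helpers, not the shell's actual implementation, and the timeout and interval defaults are arbitrary.

```javascript
// Synchronous sleep for the polling loop; Atomics.wait is permitted on Node's main thread.
function sleepMs(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Hypothetical stand-in for assert.soon(func, msg, timeout, interval):
// poll `check` until it returns something truthy, or throw once `timeoutMs` has elapsed.
function assertSoon(check, msg, timeoutMs = 1000, intervalMs = 10) {
  const start = Date.now();
  for (;;) {
    if (check()) return;
    if (Date.now() - start > timeoutMs) {
      // Same shape as the log line: the callback's source, then ", msg:" and the message.
      throw new Error("assert.soon failed: " + check + ", msg:" + msg);
    }
    sleepMs(intervalMs);
  }
}
```

The callback in features3.js returns true only from state 1 once `mine` is empty. If a $where op is never killed, the callback keeps returning `undefined` until the timeout fires, which produces the message in the log.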



 Comments   
Comment by auto [ 22/May/13 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-9695 fix racy test
Branch: v2.4
https://github.com/mongodb/mongo/commit/c781250a897fe37bd806593c45c3b8dbe424358d

Comment by auto [ 17/May/13 ]

Author:

{u'date': u'2013-05-17T13:30:51Z', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-9695 fix racy test
Branch: master
https://github.com/mongodb/mongo/commit/3595a3902afc0ca14db1f92ad3f78a23c0d63092

Comment by Eric Milkie [ 17/May/13 ]

I spent some time looking at this. In a properly executing test, there are two $where ops to be killed. In the failing case, only one of the two is being killed.

This is because in the assert.soon loop, we attempt to do too many things. We attempt to list all the processes, search for the $where ops, kill them, and then wait for their demise. All in one loop!
Unfortunately, the loop appears to have a bug: we attempt to kill ops only the first time we see any $where ops, without waiting to see whether we have all of them. So if we happen to see only the first one on our first scan, we kill only that one and the other one lives forever.
This could probably be fixed by attempting to kill ops on every loop iteration in which we see $where ops. We could change the filter to exclude ops that already have killPending set.
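Below is a sketch of the difference between the original loop and the suggested fix. It assumes Node.js and uses a scripted in-memory stand-in for `db.currentOp()` and `killOp()`. `makeFakeServer`, `runOriginal`, `runFixed` and the `expectedOps` count are hypothetical names. This is not the code from the fix commits. In the failing scenario, the second $where op only becomes visible on the third scan.

```javascript
// Hypothetical in-memory stand-in for db.currentOp() and killOp(); not a real API.
// Each op becomes visible on scan `visibleAt`; once killed it is still reported
// (with killPending: true) on the next scan and is gone after that.
function makeFakeServer(ops) {
  let scan = 0;
  const state = ops.map((o) => ({ ...o, killPending: false, killedAt: null }));
  return {
    // Like getMine() in features3.js: the $where ops visible on this scan.
    currentOps() {
      scan++;
      return state
        .filter((o) => scan >= o.visibleAt && (o.killedAt === null || scan <= o.killedAt + 1))
        .map((o) => ({ opid: o.opid, killPending: o.killPending }));
    },
    killOp(opid) {
      const op = state.find((o) => o.opid === opid);
      op.killPending = true;
      op.killedAt = scan;
    },
  };
}

// Mirrors the original assert.soon callback: kill only on the first non-empty scan.
function runOriginal(server, maxPolls) {
  let curOpState = 0;
  for (let i = 0; i < maxPolls; i++) {
    const mine = server.currentOps();
    if (curOpState === 0 && mine.length > 0) {
      curOpState = 1;
      mine.forEach((z) => server.killOp(z.opid));
    } else if (curOpState === 1 && mine.length === 0) {
      return true;
    }
  }
  return false; // assert.soon would report "Couldn't kill the $where operations."
}

// Suggested fix: on every scan, kill any op without killPending set, and finish
// only after all expected ops have been killed and none remain.
function runFixed(server, expectedOps, maxPolls) {
  const killed = new Set();
  for (let i = 0; i < maxPolls; i++) {
    const mine = server.currentOps();
    mine
      .filter((z) => !z.killPending)
      .forEach((z) => {
        server.killOp(z.opid);
        killed.add(z.opid);
      });
    if (killed.size === expectedOps && mine.length === 0) return true;
  }
  return false;
}
```

Use a fresh server built from `[{ opid: 1, visibleAt: 1 }, { opid: 2, visibleAt: 3 }]` for each run. With that input, `runOriginal` kills only op 1 and returns false, which corresponds to the assert.soon timeout. `runFixed(server, 2, 20)` kills both ops and returns true. The `expectedOps` count is an assumption added here so the fixed loop does not report success before the second op has appeared.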

Comment by Ben Becker [ 16/May/13 ]

ian@10gen.com, I would suggest removing the $where portion of the test (assuming the intended behavior has been confirmed).

Comment by Ian Whalen (Inactive) [ 16/May/13 ]

Ben, are you thinking we should kill the whole test, or change it somehow? Can you suggest how to proceed here?

Comment by Ben Becker [ 16/May/13 ]

Tad, yes, I think the $where/killOp part of the test is invalid.

Comment by Tad Marshall [ 16/May/13 ]

Is this an invalid test then? ... testing something that can't be expected to always work?

Comment by Ben Becker [ 16/May/13 ]

I don't think there's anything that can be done here. The operation (correctly) didn't yield before the timeout, so it never checked for a pending kill request.

The only way to 'fix' this would be to move the check for a pending kill request outside of the yield logic. While this would be more logical, it may introduce a new concurrency bottleneck (in checkForInterrupt()), and it would only fix this contrived test case.
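Below is a toy model of this point. It assumes Node.js. In this model, `killOp` only sets a flag, and the operation notices the flag only when it checks for interrupts. `runOp` and its parameters are hypothetical and do not mirror mongod's yield or checkForInterrupt() code. The unit counts are arbitrary.

```javascript
// Hypothetical model: an op performs `workUnits` units of work inside the test's
// timeout window. A kill request arrives at unit `killAt`. The op checks for it only
// at yield points (every `yieldEvery` units) or, if `checkEveryUnit`, after every unit.
// Returns where the op stopped (null = never noticed the kill) and how many checks ran.
function runOp({ workUnits, yieldEvery, killAt, checkEveryUnit = false }) {
  let killPending = false;
  let checks = 0;
  for (let unit = 1; unit <= workUnits; unit++) {
    if (unit === killAt) killPending = true; // the kill request only sets a flag
    const atYieldPoint = unit % yieldEvery === 0;
    if (checkEveryUnit || atYieldPoint) {
      checks++; // each interrupt check has a cost
      if (killPending) return { stoppedAt: unit, checks };
    }
  }
  return { stoppedAt: null, checks };
}
```

With `yieldEvery` larger than the window (`workUnits: 100, yieldEvery: 1000, killAt: 10`), the kill is never seen. Setting `checkEveryUnit: true` stops the op at unit 10 but performs 10 checks instead of 0. Those extra checks are a rough stand-in for the additional checkForInterrupt() traffic described above.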

Comment by Ian Whalen (Inactive) [ 16/May/13 ]

benjamin.becker, any thoughts or progress on this? It seems to be a 2.5.0 blocker.

Generated at Thu Feb 08 03:21:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.