[SERVER-19143] race in setting OpDebug ns can cause invalid BSON to be returned from currentOp command Created: 25/Jun/15  Updated: 09/Dec/16  Resolved: 29/Jun/15

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 3.1.4
Fix Version/s: 3.1.5

Type: Bug Priority: Critical - P2
Reporter: Adam Midvidy Assignee: Adam Midvidy
Resolution: Done Votes: 0
Labels: UT, dnsf
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-8970 race in setting OpDebug ns can cause ... Closed
Related
is related to SERVER-13584 Fill out currentOp "ns" field before ... Closed
Backwards Compatibility: Minor Change
Operating System: ALL
Steps To Reproduce:

run evalc.js in a loop, or run basic or basicPlus.js

Sprint: Platform 5 06/26/16
Participants:
Linked BF Score: 0

 Description   

The currentOp command calls CurOp::reportState to report the state of an operation.

CurOp::reportState will fill in the "ns" field of its passed BSONObjBuilder with it's _ns field OR the ns field of its OpDebug member if _ns is not set.

builder->append("ns", !_ns.empty() ? _ns : _debug.ns);

Unfortunately, access to the OpDebug field is not synchronized. As such, the "ns" field of an operation object returned in the currentOp command can contain garbage data. When the garbage data does not end with a null byte, this can resullt in invalid BSON being returned to the client.

As part of OP_COMMAND work, the shell will now validate BSON in command responses from the server, which uncovered this issue.

For example - here are the raw bytes from a problematic currentOp response:

"\xb2\x02\x00\x00\x04inprog\x00\x99\x02\x00\x00\x030\x00q\x01\x00\x00\x02desc\x00\x07\x00\x00\x00conn54\x00\x02threadId\x00\r\x00\x00\x007ff2a6112700\x00\x10connectionId\x006\x00\x00\x00\x02client\x00\x10\x00\x$
0\x00127.0.0.1:58533\x00\x08active\x00\x01\x10opid\x00a\xed\x03\x00\x02op\x00\x06\x00\x00\x00query\x00\x02ns\x00\x01\x00\x00\x00t\x03query\x00\x05\x00\x00\x00\x00\x10numYields\x00\x00\x00\x00\x00\x03locks\x00\x$
3\x00\x00\x00\x02Global\x00\x02\x00\x00\x00W\x00\x00\x08waitingForLock\x00\x00\x03lockStats\x00\x97\x00\x00\x00\x03Global\x00.\x00\x00\x00\x03acquireCount\x00\x1b\x00\x00\x00\x12r\x00\x03\x00\x00\x00\x00\x00\x00
\x00\x12W\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03Database\x00#\x00\x00\x00\x03acquireCount\x00\x10\x00\x00\x00\x12r\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03Collection\x00#\x00\x00\x00\x03acquireCo
unt\x00\x10\x00\x00\x00\x12r\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x031\x00\x1d\x01\x00\x00\x02desc\x00\x07\x00\x00\x00conn55\x00\x02threadId\x00\r\x00\x00\x007ff2a6238700\x00\x10connectionId\x007\
x00\x00\x00\x02client\x00\x10\x00\x00\x00127.0.0.1:58534\x00\x08active\x00\x01\x10opid\x000\xee\x03\x00\x10secs_running\x00\x00\x00\x00\x00\x12microsecs_running\x00'\x00\x00\x00\x00\x00\x00\x00\x02op\x00\x08\x00
\x00\x00command\x00\x02ns\x00\x0b\x00\x00\x00admin.$cmd\x00\x03query\x00\x18\x00\x00\x00\x01currentOp\x00\x00\x00\x00\x00\x00\x00\xf0?\x00\x10numYields\x00\x00\x00\x00\x00\x03locks\x00\x05\x00\x00\x00\x00\x08wai
tingForLock\x00\x00\x03lockStats\x00\x05\x00\x00\x00\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\xf0?\x00"

Problematic part (offsets 148 through 155)

ns\x00\x01\x00\x00\x00t\x03query

at offset 154, the content of an 'ns' field has a 't' (0x74) instead of a trailing null before the start of the next element.



 Comments   
Comment by Adam Midvidy [ 29/Jun/15 ]

The fix for this issue has resulted in a minor behavior change in the profiling output for commands that have the _ns field of their associated CurOp set when acquiring locks.

Profiling output for an 'distinct' command (with patch):

{
        "op" : "command",
        "ns" : "test.test",
        "command" : {
                "distinct" : "test",
                "key" : "x"
        },
......

Profiling output for an 'distinct' command (without patch):

{
        "op" : "command",
        "ns" : "test.$cmd",
        "command" : {
                "distinct" : "test",
                "key" : "x"
        },
      ......
}

Note the change in the 'ns' field.

Comment by Githook User [ 29/Jun/15 ]

Author:

{u'username': u'amidvidy', u'name': u'Adam Midvidy', u'email': u'amidvidy@gmail.com'}

Message: SERVER-19143 remove ns from OpDebug and synchronize access to NS in curop
Branch: master
https://github.com/mongodb/mongo/commit/a05fe115d44c58b6e81dcd60f254d15cfa3ab243

Comment by Githook User [ 26/Jun/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: Revert "SERVER-19143 remove ns from OpDebug and synchronize access to NS in curOp"

This reverts commit 780745072104ee4d0514f09e71765ac99d380bdc.
Branch: master
https://github.com/mongodb/mongo/commit/9ad1597c2d2ead08a74bd8f53a8458d0f7cb3987

Comment by Githook User [ 26/Jun/15 ]

Author:

{u'username': u'amidvidy', u'name': u'Adam Midvidy', u'email': u'amidvidy@gmail.com'}

Message: SERVER-19143 remove ns from OpDebug and synchronize access to NS in curOp
Branch: master
https://github.com/mongodb/mongo/commit/780745072104ee4d0514f09e71765ac99d380bdc

Generated at Thu Feb 08 03:49:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.