[SERVER-11294] Assertion: 16105 stack trace in dbstats command Created: 21/Oct/13  Updated: 11/Jul/16  Resolved: 25/Oct/13

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 2.5.4
Fix Version/s: 2.5.4

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongodb.log     File mongodb.log     File mongodb.log.gz    
Operating System: Linux
Participants:

 Description   

2013-10-21T13:45:42.352-0400 [initandlisten] connection accepted from 127.0.0.1:37312 #1017 (87 connections now open)
2013-10-21T13:45:57.825-0400 [conn1016] lock status: r recursive:1 otherCount:0 nestableCount:-1 which:local
2013-10-21T13:45:57.825-0400 [conn1016] Assertion: 16105:expected to be write locked for local.$freelist
2013-10-21T13:45:57.831-0400 [conn1016] local 0xdcc403 0xd84c65 0xd6cb81 0xd6cc2c 0x924bfe 0xc4cb25 0xc4cc39 0xc4ccc6 0x92a851 0x941b90 0x93528f 0x935efb 0x936e2e 0xaaf8ea 0xab487d 0xa39c83 0x78b9b2 0xd9486d 0x7fd9cdffce9a 0x7fd9cd517ccd 
 mongod(_ZN5mongo15printStackTraceERSo+0x23) [0xdcc403]
 mongod(_ZN5mongo10logContextEPKc+0x1a5) [0xd84c65]
 mongod(_ZN5mongo11msgassertedEiPKc+0x91) [0xd6cb81]
 mongod() [0xd6cc2c]
 mongod(_ZN5mongo4Lock17assertWriteLockedERKNS_10StringDataE+0x8e) [0x924bfe]
 mongod(_ZN5mongo14NamespaceIndex6add_nsERKNS_9NamespaceEPKNS_16NamespaceDetailsE+0x45) [0xc4cb25]
 mongod(_ZN5mongo14NamespaceIndex6add_nsERKNS_10StringDataEPKNS_16NamespaceDetailsE+0x99) [0xc4cc39]
 mongod(_ZN5mongo14NamespaceIndex6add_nsERKNS_10StringDataERKNS_7DiskLocEb+0x46) [0xc4ccc6]
 mongod(_ZN5mongo8Database19_initExtentFreeListEv+0x111) [0x92a851]
 mongod(_ZN5mongo7DBStats3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0xa50) [0x941b90]
 mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x2f) [0x93528f]
 mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x93b) [0x935efb]
 mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x23e) [0x936e2e]
 mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x3a) [0xaaf8ea]
 mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd4d) [0xab487d]
 mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xba3) [0xa39c83]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x92) [0x78b9b2]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x4cd) [0xd9486d]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fd9cdffce9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fd9cd517ccd]

I unfortunately don't have a quick reproducible test case for kernel team use. Fortunately, it is relatively easily reproducible by members of the MMS team(1). I believe the MMS Monitoring agent is issuing the offending query/command to a standalone MongoD. I'm sure any context from the kernel team from the stack trace will help narrow down the details necessary to create a small reproducible test case. I'm also not positive that there's any actual degraded/undefined behavior that accompanies the stack trace.

(1) (Hard way) to reproduce:

  • Running off `v20131021` branch
  • Start a clean mongod standalone on port 27017
    • Running `2.5.4-pre`, specifically commit: `55b49f25f683e7ff86eae40937b0d09d96cfd40d`
    • Only running with `--dbpath` set.
  • run `ant init.dev` to populate the `test1` group and the default hosts
  • run `ant mms.server.local`
  • run `ant mms.agent` (or run a current version of the python code directly)
  • observe the 27017 mongod print out stack traces

Unfortunately, the mms agent does not seem to notice a problem and seemingly operates normally.



 Comments   
Comment by Daniel Gottlieb (Inactive) [ 25/Oct/13 ]

Looks good!

Comment by Eliot Horowitz (Inactive) [ 25/Oct/13 ]

Should be fixed, please try again.

Comment by auto [ 25/Oct/13 ]

Author:

{u'username': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'}

Message: SERVER-11294: create extent by default line we did in 2.4
Branch: master
https://github.com/mongodb/mongo/commit/5e58c72061b11949db2d15254af99a9c34b242ce

Comment by Eliot Horowitz (Inactive) [ 24/Oct/13 ]

Have a fix locally, behind one other thing

Comment by Daniel Gottlieb (Inactive) [ 23/Oct/13 ]

Reran tests on the current master HEAD: `2b16aca77cbd4486d780e21636b7043733e12765`. I aborted the full test plan due to receiving seemingly different errors at the end of the mms server coming up (thus not caused by the monitoring agent).

Stacktrace:

2013-10-23T15:44:21.385-0400 [conn18] command mmsdbconfig.$cmd command: { drop: "tmp.convertToCapped.automationAgentLogs" } ntoreturn:1 keyUpdates:0  reslen:62 0ms
2013-10-23T15:44:21.385-0400 [conn18] runQuery called mmsdbconfig.$cmd { cloneCollectionAsCapped: "automationAgentLogs", toCollection: "tmp.convertToCapped.automationAgentLogs", size: 1048576.0, temp: true }
2013-10-23T15:44:21.385-0400 [conn18] run command mmsdbconfig.$cmd { cloneCollectionAsCapped: "automationAgentLogs", toCollection: "tmp.convertToCapped.automationAgentLogs", size: 1048576.0, temp: true }
2013-10-23T15:44:21.385-0400 [conn18] mmsdbconfig Assertion failure dl.a() != -1 src/mongo/db/pdfile.h 409
2013-10-23T15:44:21.392-0400 [conn18] mmsdbconfig 0xdd3033 0xd8b895 0xd73201 0x75b73e 0x94cbcd 0x93e1df 0x93ef68 0x93fd6b 0xab096a 0xab58fd 0xa3b683 0xa3d269 0x7f54f3 0x7d62a2 0xa349ac 0x7ce126 0x7ce82a 0x7d8f85 0x9455ff 0x93e1df 
 mongod(_ZN5mongo15printStackTraceERSo+0x23) [0xdd3033]
 mongod(_ZN5mongo10logContextEPKc+0x1a5) [0xd8b895]
 mongod(_ZN5mongo12verifyFailedEPKcS1_j+0x101) [0xd73201]
 mongod() [0x75b73e]
 mongod(_ZN5mongo26CmdCloneCollectionAsCapped3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0xbbd) [0x94cbcd]
 mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x2f) [0x93e1df]
 mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xa58) [0x93ef68]
 mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x24b) [0x93fd6b]
 mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x3a) [0xab096a]
 mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd4d) [0xab58fd]
 mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xba3) [0xa3b683]
 mongod(_ZN5mongo14DBDirectClient4callERNS_7MessageES2_bPSs+0x99) [0xa3d269]
 mongod(_ZN5mongo14DBClientCursor4initEv+0xa3) [0x7f54f3]
 mongod(_ZN5mongo12DBClientBase5queryERKSsNS_5QueryEiiPKNS_7BSONObjEii+0x262) [0x7d62a2]
 mongod(_ZN5mongo14DBDirectClient5queryERKSsNS_5QueryEiiPKNS_7BSONObjEii+0x4c) [0xa349ac]
 mongod(_ZN5mongo17DBClientInterface5findNERSt6vectorINS_7BSONObjESaIS2_EERKSsNS_5QueryEiiPKS2_i+0x86) [0x7ce126]
 mongod(_ZN5mongo17DBClientInterface7findOneERKSsRKNS_5QueryEPKNS_7BSONObjEi+0x6a) [0x7ce82a]
 mongod(_ZN5mongo20DBClientWithCommands10runCommandERKSsRKNS_7BSONObjERS3_i+0x75) [0x7d8f85]
 mongod(_ZN5mongo18CmdConvertToCapped3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x60f) [0x9455ff]
 mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x2f) [0x93e1df]
2013-10-23T15:44:21.392-0400 [conn18] command mmsdbconfig.$cmd command: { cloneCollectionAsCapped: "automationAgentLogs", toCollection: "tmp.convertToCapped.automationAgentLogs", size: 1048576.0, temp: true } ntoreturn:1 keyUpdates:0  reslen:106 6ms
2013-10-23T15:44:21.392-0400 [conn18] command mmsdbconfig.$cmd command: { convertToCapped: "automationAgentLogs", size: 1048576 } ntoreturn:1 keyUpdates:0 locks(micros) W:7099 reslen:160 7ms
2013-10-23T15:44:21.405-0400 [journal] journal WRITETODATAFILES 0.163ms
2013-10-23T15:44:22.020-0400 [journal] journal REMAPPRIVATEVIEW done startedAt: 6 n:48 2ms
2013-10-23T15:44:22.107-0400 [conn21] runQuery called mmsdbqueues.$cmd { findandmodify: "data.incrementQueue", remove: true }
2013-10-23T15:44:22.107-0400 [conn21] run command mmsdbqueues.$cmd { findandmodify: "data.incrementQueue", remove: true }

Full logs with -vv is being attached.

Comment by Scott Hernandez (Inactive) [ 21/Oct/13 ]

Cool, thanks Dan. We can try with the new code later too but the log level is more informative now, as you can see. Also, all local dbstats show the same stack trace in the log file.

2013-10-21T17:12:13.365-0400 [conn73] run command local.$cmd { dbstats: 1 }
2013-10-21T17:12:13.369-0400 [conn73] lock status: r recursive:1 otherCount:0 nestableCount:-1 which:local
2013-10-21T17:12:13.369-0400 [conn73] Assertion: 16105:expected to be write locked for local.$freelist
2013-10-21T17:12:13.375-0400 [conn73] local 0xdcc403 0xd84c65 0xd6cb81 0xd6cc2c 0x924bfe 0xc4cb25 0xc4cc39 0xc4ccc6 0x92a851 0x941b90 0x93528f 0x935efb 0x936e2e 0xaaf8ea 0xab487d 0xa39c83 0x78b9b2 0xd9486d 0x7f6e2becae9a 0x7f6e2b3e5ccd 
 mongod(_ZN5mongo15printStackTraceERSo+0x23) [0xdcc403]
 mongod(_ZN5mongo10logContextEPKc+0x1a5) [0xd84c65]
 mongod(_ZN5mongo11msgassertedEiPKc+0x91) [0xd6cb81]
 mongod() [0xd6cc2c]
 mongod(_ZN5mongo4Lock17assertWriteLockedERKNS_10StringDataE+0x8e) [0x924bfe]
 mongod(_ZN5mongo14NamespaceIndex6add_nsERKNS_9NamespaceEPKNS_16NamespaceDetailsE+0x45) [0xc4cb25]
 mongod(_ZN5mongo14NamespaceIndex6add_nsERKNS_10StringDataEPKNS_16NamespaceDetailsE+0x99) [0xc4cc39]
 mongod(_ZN5mongo14NamespaceIndex6add_nsERKNS_10StringDataERKNS_7DiskLocEb+0x46) [0xc4ccc6]
 mongod(_ZN5mongo8Database19_initExtentFreeListEv+0x111) [0x92a851]
 mongod(_ZN5mongo7DBStats3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0xa50) [0x941b90]
 mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x2f) [0x93528f]
 mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x93b) [0x935efb]
 mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x23e) [0x936e2e]
 mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x3a) [0xaaf8ea]
 mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd4d) [0xab487d]
 mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xba3) [0xa39c83]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x92) [0x78b9b2]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x4cd) [0xd9486d]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f6e2becae9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f6e2b3e5ccd]
2013-10-21T17:12:13.375-0400 [conn73] command local.$cmd command: { dbstats: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:9927 reslen:307 9ms
2013-10-21T17:12:13.376-0400 [conn73] runQuery called mmsdbminutes-even.$cmd { dbstats: 1 }

Comment by Scott Hernandez (Inactive) [ 21/Oct/13 ]

Dan, log level and profile level are two different things:

Command line flag for log level (-v*): http://docs.mongodb.org/manual/reference/program/mongod/#cmdoption--verbose
Runtime log level: http://docs.mongodb.org/manual/reference/parameters/#param.logLevel

DB Profiling (system.profile collection): http://docs.mongodb.org/manual/reference/method/db.setProfilingLevel/#db.setProfilingLevel
Note, this can also be set as startup options, but is not log file related.

Comment by Daniel Gottlieb (Inactive) [ 21/Oct/13 ]

What Cailin said.

Running with profiling set to 2...

Edit: The agent is not showing signs of receiving an error.

Comment by Scott Hernandez (Inactive) [ 21/Oct/13 ]

Do you know if there is an error returned to the agent? Also, if you can provide logs at level 2 that would be helpful, or more logs around that event.

A quick test of running dbstats across new databases and the local db 1000 times in a loop did not turn up the message for me.

Comment by Scott Hernandez (Inactive) [ 21/Oct/13 ]

Looks like this line DBStats::run():

                DataFileHeader* header = d->getFile( 0 )->getHeader();

Generated at Thu Feb 08 03:25:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.