[SERVER-24314] mongod fatal assertion after "hazard pointer table full" message Created: 31/May/16  Updated: 14/Jul/16  Resolved: 10/Jun/16

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.2.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andrey Hohutkin Assignee: Kelsey Schubert
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive diagnostic.data.zip    
Operating System: ALL
Participants:

 Description   

2016-05-31T09:34:37.836+0000 I COMMAND  [conn216] command appmahal.activities command: aggregate { aggregate: "activities", pipeline: [ { $match: { $or: [ { userId: null, action: { $in: [ "PushRecommendationRich", "newsItemInvite", "arti
clePush", "customApp", "customPromotion" ] }, time: { $lt: new Date(1464687274143) } } ] } }, { $group: { _id: { userId: "$userId", articleId: "$articleId", app: "$app", action: "$action", userIds: "$userIds", extra: "$extra", userObjectId: "$userObjectId" }, time: { $last: "$time" }, id: { $last: "$_id" }, value: { $last: "$value" } } }, { $match: { value: true } }, { $sort: { time: -1 } }, { $limit: 20 }, { $project: { _id: 0, id: 1, userId: "$_id.userId", app: "$_id.app", articleId: "$_id.articleId", action: "$_id.action", usersIds: "$_id.userIds", userObjectId: "$_id.userObjectId", time: "$time", extra: "$_id.extra" } } ] } keyUpdates:0 writeConflicts:0 numYields:1085 reslen:50219 locks:{ Global: {
 acquireCount: { r: 2520 } }, Database: { acquireCount: { r: 1260 } }, Collection: { acquireCount: { r: 1260 } } } protocol:op_query 3692ms
2016-05-31T09:34:43.507+0000 I ACCESS   [conn66] Successfully authenticated as principal admin on admin
2016-05-31T09:34:46.030+0000 E STORAGE  [conn282] WiredTiger (0) [1464687286:30384][25367:0x7f1387e09700], file:index-52-6724719125319965842.wt, WT_CURSOR.search_near: session 0x2ddb500: hazard pointer table full
2016-05-31T09:34:46.048+0000 I -        [conn282] Invariant failure: ret resulted in status UnknownError: 12: Cannot allocate memory at src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp 824
2016-05-31T09:34:46.199+0000 I CONTROL  [conn282]
 0x1315982 0x12b3798 0x12a067d 0x1078d6d 0x107d740 0xbfb0c8 0xbfb793 0xbff969 0xbef5b1 0xe2b435 0xe2baf9 0xd54a09 0xd550c8 0xd5d131 0xd5dfd0 0xc092b7 0xc0995f 0xe2b435 0xe2baf9 0xb75611 0xbc5f03 0xbc6d74 0xb22dc0 0xcd20e5 0xcd4976 0x9b78
cc 0x12c2e7d 0x7f13aa233182 0x7f13a9f6047d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F15982","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EB3798","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"EA067D","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"400000",
"o":"C78D6D"},{"b":"400000","o":"C7D740"},{"b":"400000","o":"7FB0C8","s":"_ZN5mongo9IndexScan13initIndexScanEv"},{"b":"400000","o":"7FB793","s":"_ZN5mongo9IndexScan4workEPm"},{"b":"400000","o":"7FF969","s":"_ZN5mongo14MergeSortStage4work
EPm"},{"b":"400000","o":"7EF5B1","s":"_ZN5mongo10FetchStage4workEPm"},{"b":"400000","o":"A2B435","s":"_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE"},{"b":"400000","o":"A2BAF9","s":"_ZN5mongo12PlanEx
ecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE"},{"b":"400000","o":"954A09","s":"_ZN5mongo20DocumentSourceCursor9loadBatchEv"},{"b":"400000","o":"9550C8","s":"_ZN5mongo20DocumentSourceCursor7getNextEv"},{"b":"400000","o":"95D131","s":"_ZN5mo
ngo19DocumentSourceGroup8populateEv"},{"b":"400000","o":"95DFD0","s":"_ZN5mongo19DocumentSourceGroup7getNextEv"},{"b":"400000","o":"8092B7","s":"_ZN5mongo18PipelineProxyStage11getNextBsonEv"},{"b":"400000","o":"80995F","s":"_ZN5mongo18Pi
pelineProxyStage4workEPm"},{"b":"400000","o":"A2B435","s":"_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE"},{"b":"400000","o":"A2BAF9","s":"_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE"
},{"b":"400000","o":"775611","s":"_ZN5mongo15PipelineCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderE"},{"b":"400000","o":"7C5F03","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterface
EPNS3_21ReplyBuilderInterfaceE"},{"b":"400000","o":"7C6D74","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"400000","o":"722DC0","s":"_ZN5mongo11runCommands
EPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"400000","o":"8D20E5"},{"b":"400000","o":"8D4976","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostA
ndPortE"},{"b":"400000","o":"5B78CC","s":"_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE"},{"b":"400000","o":"EC2E7D","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7F13AA22B000","o":"8182"
},{"b":"7F13A9E66000","o":"FA47D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.6", "gitVersion" : "05552b562c7a0b3143a729aaa0838e558dc49b25", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-46-generi
c", "version" : "#79-Ubuntu SMP Tue Mar 10 20:06:50 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "B35B0D2B9CB934315BA086AB5C88C919B9B5BA50" }, { "b" : "7FFF218F5000", "elfType" : 3, "buildId
" : "88E7559031E488BC215236F4181BD1FAF9A458F0" }, { "b" : "7F13AB14D000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "E21720F2804EF30440F2B39CD409252C26F58F73" }, { "b" : "7F13AAD71000", "path" : "/lib/x
86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "9BC22F9457E3D7E9CF8DDC135C0DAC8F7742135D" }, { "b" : "7F13AAB69000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "B376100CAB1EAC4E5DE066EACFC282
BF7C0B54F3" }, { "b" : "7F13AA965000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "67699FFDA9FD2A552032E0652A242E82D65AA10D" }, { "b" : "7F13AA65F000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" :
3, "buildId" : "EF3F6DFFA1FBE48436EC6F45CD3AABA157064BB4" }, { "b" : "7F13AA449000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7F13AA22B000", "path"
: "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "AF06068681750736E0524DF17D5A86CB2C3F765C" }, { "b" : "7F13A9E66000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "5382058B69031CAA9B9996C11
061CD164C9398FF" }, { "b" : "7F13AB3AC000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "2A816C3EBBA4E12813FBD34B06FBD25BC892A67F" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1315982]
 mongod(_ZN5mongo10logContextEPKc+0x138) [0x12b3798]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xAD) [0x12a067d]
 mongod(+0xC78D6D) [0x1078d6d]
 mongod(+0xC7D740) [0x107d740]
 mongod(_ZN5mongo9IndexScan13initIndexScanEv+0x2B8) [0xbfb0c8]
 mongod(_ZN5mongo9IndexScan4workEPm+0x1F3) [0xbfb793]
 mongod(_ZN5mongo14MergeSortStage4workEPm+0x69) [0xbff969]
 mongod(_ZN5mongo10FetchStage4workEPm+0x161) [0xbef5b1]
 mongod(_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x275) [0xe2b435]
 mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x39) [0xe2baf9]
 mongod(_ZN5mongo20DocumentSourceCursor9loadBatchEv+0x1C9) [0xd54a09]
 mongod(_ZN5mongo20DocumentSourceCursor7getNextEv+0xF8) [0xd550c8]
 mongod(_ZN5mongo19DocumentSourceGroup8populateEv+0xC1) [0xd5d131]
 mongod(_ZN5mongo19DocumentSourceGroup7getNextEv+0x270) [0xd5dfd0]
 mongod(_ZN5mongo18PipelineProxyStage11getNextBsonEv+0x47) [0xc092b7]
 mongod(_ZN5mongo18PipelineProxyStage4workEPm+0x10F) [0xc0995f]
 mongod(_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x275) [0xe2b435]
 mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x39) [0xe2baf9]
 mongod(_ZN5mongo15PipelineCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderE+0x771) [0xb75611]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x463) [0xbc5f03]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0x404) [0xbc6d74]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x1F0) [0xb22dc0]
 mongod(+0x8D20E5) [0xcd20e5]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x696) [0xcd4976]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xEC) [0x9b78cc]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x26D) [0x12c2e7d]
 libpthread.so.0(+0x8182) [0x7f13aa233182]
 libc.so.6(clone+0x6D) [0x7f13a9f6047d]
-----  END BACKTRACE  -----

Config option specified:
internalQueryMaxScansToExplode=2000



 Comments   
Comment by Kelsey Schubert [ 10/Jun/16 ]

Hi andrey.hohutkin@gmail.com,

We've been discussing this behavior internally. As you correctly identify, this issue is the result of setting internalQueryMaxScansToExplode=2000. Increasing this value from its default may cause your queries to exceed your system's memory resources.

From our investigation, I do not see anything to indicate a bug in the MongoDB server. Unfortunately, this behavior appears to be result of running into physical limitations. Therefore, I would like to make amend my previous recommendation: rather than pursuing this approach run 10,000 sub-scans by modifying an internal WiredTiger setting, I would instead suggest you consider setting internalQueryMaxScansToExplode to smaller value to resolve this issue.

For MongoDB-related support discussion please post on the mongodb-users group or Stack Overflow with the mongodb tag. A question about optimizing MongoDB for your access patterns would involve more discussion would be best posted on the mongodb-users group.

Kind regards,
Thomas

Comment by Andrey Hohutkin [ 06/Jun/16 ]

Hello, again!
Please, put an attention that a setting internalQueryMaxScansToExplode=2000 is critical point in this issue.
And I put it to configuration to check how stable it works because my technical requirements are to increase it to 10000 at production.
I have a query with 10000 unique combinations of 500 user ids and 20 their events (and I limit others) and I have to make sure that a query is covered by index scan only.
Without this setting a query will scan database with significant performance slowdown.

Comment by Alexander Gorrod [ 06/Jun/16 ]

ramon.fernandez Yes - that's right. david.storch thanks for the explanation.

anonymous.user Would you mind updating the ticket? I believe suggesting a change to the internalQueryMaxScansToExplode is better advice than asking the user to adjust an internal, undocumented WiredTiger setting.

Comment by Andrey Hohutkin [ 01/Jun/16 ]

No problem. Thanks.

Comment by Kelsey Schubert [ 01/Jun/16 ]

Hi andrey.hohutkin@gmail.com,

Thank you for reporting this issue. So we can continue to investigate, would you please archive (tar or zip) the $dbpath/diagnostic.data directory and attach it to this ticket?

If you continue to encounter this issue, I would suggest that you increase the maximum number of hazard pointers from its default value of 1000. To do so, restart your mongod with --wiredTigerEngineConfigString="hazard_max=10000". This configuration will cause a greater memory overhead, but should resolve the issues you are observing.

Kind regards,
Thomas

Generated at Thu Feb 08 04:06:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.