[SERVER-21201] Increase logging severity for Assertion: 28556:Didn't find RecordId in WiredTigerRecordStore Created: 29/Oct/15  Updated: 06/Dec/22  Resolved: 12/Apr/19

Status: Closed
Project: Core Server
Component/s: Logging
Affects Version/s: 3.0.5
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Mike Zraly Assignee: Backlog - Storage Execution Team
Resolution: Duplicate Votes: 2
Labels: WTplaybook
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File SERVER-21201.log    
Issue Links:
Related
is related to SERVER-22457 Assertion: 28556:Didn't find RecordId... Closed
is related to SERVER-27038 assertion messages should have correc... Closed
Assigned Teams:
Storage Execution
Participants:
Case:

 Description   

We are experiencing an error where we can find some records with a query on the secondary but not on the primary. We are still working on a reproducible test case for this and will file a bug once we have once.

However.

When a record can't be found the only indication we have in mongodb.log is an assertion log with severity I(nformational):

2015-10-29T01:21:45.869Z I - [conn225391] Assertion: 28556:Didn't find RecordId in WiredTigerRecordStore
 
with an associated stack trace:
 
2015-10-29T01:21:45.877Z I CONTROL [conn225391] 
 0xf77ca9 0xf16b61 0xefbdbf 0xd8ff0c 0x92cb70 0xa1cb0b 0xa2dcd4 0xa2e5c7 0xbeaae2 0xbeb14c 0xbeb47f 0xbbccff 0xbb8736 0xaceb60 0x829dcd 0xf2ac4b 0x7fde2d045df5 0x7fde2baf8bfd 
----- BEGIN BACKTRACE ----- 
{"backtrace":[{"b":"400000","o":"B77CA9"},{"b":"400000","o":"B16B61"},{"b":"400000","o":"AFBDBF"},{"b":"400000","o":"98FF0C"},{"b":"400000","o":"52CB70"},{"b":"400000","o":"61CB0B"},{"b":"400000","o":"62D 
CD4"},{"b":"400000","o":"62E5C7"},{"b":"400000","o":"7EAAE2"},{"b":"400000","o":"7EB14C"},{"b":"400000","o":"7EB47F"},{"b":"400000","o":"7BCCFF"},{"b":"400000","o":"7B8736"},{"b":"400000","o":"6CEB60"},{" 
b":"400000","o":"429DCD"},{"b":"400000","o":"B2AC4B"},{"b":"7FDE2D03E000","o":"7DF5"},{"b":"7FDE2BA02000","o":"F6BFD"}],"processInfo":{ "mongodbVersion" : "3.0.5", "gitVersion" : "8bc4ae20708dbb493cb09338 
d9e7be6698e4a3a3", "uname" : { "sysname" : "Linux", "release" : "4.1.7-15.23.amzn1.x86_64", "version" : "#1 SMP Mon Sep 14 23:20:33 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "4 
00000", "buildId" : "E8A046532FE2C5A8474250BA8E5B72C6C84B9D68" }, { "b" : "7FFF5D5FC000", "elfType" : 3, "buildId" : "AF79256455DAB4D8F13725CDA5E71975F69FED3F" }, { "b" : "7FDE2D03E000", "path" : "/lib64/ 
libpthread.so.0", "elfType" : 3, "buildId" : "21054340E69FD5F7DDC9DECEF7951CF3F3D2E78E" }, { "b" : "7FDE2CDD1000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "22480480235F3B1C6C2E5E595 
3949728676D3796" }, { "b" : "7FDE2C9EC000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F1C146B78505646930DD9003AA2B3622C5226D1B" }, { "b" : "7FDE2C7E4000", "path" : "/lib64/librt.so.1" 
, "elfType" : 3, "buildId" : "8A6EB7521ABE0518C2803504B52F18412D69C713" }, { "b" : "7FDE2C5E0000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "7030B651A4F5D6F32A8D83F3541990E342C50E6C" }, { 
"b" : "7FDE2C2DC000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "DD6383EEAC49E9BAA9E3D1080AE932F42CF8A385" }, { "b" : "7FDE2BFDA000", "path" : "/lib64/libm.so.6", "elfType" : 3, "bu
ildId" : "EB09196C8A859427FD964381CDAC80EC0C4C4296" }, { "b" : "7FDE2BDC4000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "C52958E393BDF8E8D090F36DE0F4E620D8736FBF" }, { "b" : "7FDE2BA020 
00", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "0EB7F7FA93C9C3E017F7257AE97415730345767B" }, { "b" : "7FDE2D25A000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "F451C 
8F279B6860B00F835764FFBA0D54879E1C6" }, { "b" : "7FDE2B7B6000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "FF843C37C38E5BFFD57F7BCCAE05FDADC6390C8F" }, { "b" : "7FDE2B4D3000", "pat 
h" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "0BB150CC29DB5B0E039879EFC00152A75E3B00B9" }, { "b" : "7FDE2B2D0000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "5C01209C5AE1 
B1714F19B07EB58F2A1274B69DC8" }, { "b" : "7FDE2B09E000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "1485992B0E5CDBA0A34817FC8C6A4C45E82CD1A9" }, { "b" : "7FDE2AE88000", "path" : "/lib 
64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7FDE2AC79000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "A75A81EC50E9E0164A12B59D9 
987AD61AC7576C8" }, { "b" : "7FDE2AA76000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7FDE2A85C000", "path" : "/lib64/libresolv. 
so.2", "elfType" : 3, "buildId" : "F586C9A0950DFD60125B2F56B0EC6D690128B1FC" }, { "b" : "7FDE2A63B000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "C16D759FEE22609796EA66A648C297C1E 
B9191D1" } ] }} 
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf77ca9] 
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf16b61] 
 mongod(_ZN5mongo11msgassertedEiPKc+0xAF) [0xefbdbf] 
 mongod(_ZNK5mongo21WiredTigerRecordStore7dataForEPNS_16OperationContextERKNS_8RecordIdE+0xDC) [0xd8ff0c] 
 mongod(_ZNK5mongo10Collection6docForEPNS_16OperationContextERKNS_8RecordIdE+0x20) [0x92cb70] 
 mongod(_ZN5mongo10FetchStage4workEPm+0x2BB) [0xa1cb0b] 
 mongod(_ZN5mongo14MultiPlanStage12workAllPlansEmPNS_15PlanYieldPolicyE+0xE4) [0xa2dcd4] 
 mongod(_ZN5mongo14MultiPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE+0xC7) [0xa2e5c7] 
 mongod(_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyE+0x72) [0xbeaae2] 
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextEPNS_10WorkingSetEPNS_9PlanStageEPNS_13QuerySolutionEPNS_14CanonicalQueryEPKNS_10CollectionERKSsNS0_11YieldPolicyEPPS0_+0x7C) [0xbeb14c] 
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextEPNS_10WorkingSetEPNS_9PlanStageEPNS_13QuerySolutionEPNS_14CanonicalQueryEPKNS_10CollectionENS0_11YieldPolicyEPPS0_+0x7F) [0xbeb47f] 
 mongod(_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionEPNS_14CanonicalQueryENS_12PlanExecutor11YieldPolicyEPPS6_m+0xCF) [0xbbccff] 
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x666) [0xbb8736] 
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB10) [0xaceb60] 
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x829dcd] 
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf2ac4b] 
 libpthread.so.0(+0x7DF5) [0x7fde2d045df5] 
 libc.so.6(clone+0x6D) [0x7fde2baf8bfd] 
----- END BACKTRACE -----

There were no messages with Warning or Error severity to report the failure to find the record. This failure can lead to loss of data, and warrants higher severity, and probably a dedicated error message.



 Comments   
Comment by Eric Milkie [ 12/Apr/19 ]

The message became a fatal error in SERVER-37867.

Comment by Ramon Fernandez Marina [ 03/Feb/16 ]

Thanks for the update mzraly. In your case the assertion is triggered by a distinct() command; do you still use distinct() in your application? If yes then I suspect an interaction with an index build, but this is just a wild guess.

Anyway, I'm going to send this ticket to the right team to consider the improvement to the log level. Feel free to post here again if you see the assertion reappear.

nifan, I'm going to open a separate ticket on your behalf – if you're still seeing this issue there are some options we can explore.

Regards,
Ramón.

Comment by Mike Zraly [ 03/Feb/16 ]

Sorry for the much-delayed reply. We have not reproduced this issue since we upgraded to 3.0.7. I'm afraid I don't have access to the old logs anymore, but I believe with 60% certainty that the secondary where we found the documents we did not see on the primary has once been the primary. We do delete documents though so it's possible that we didn't lose a copy of the record but rather a delete. I'm sorry I don't have any more details to provide.

Comment by nifan [ 15/Dec/15 ]

Sorry we are running:

# apt-cache policy mongodb-org-server
mongodb-org-server:
  Installed: 3.0.7
  Candidate: 3.0.7
  Version table:
 *** 3.0.7 0
        500 http://repo.mongodb.org/apt/debian/ wheezy/mongodb-org/3.0/main amd64 Packages
        100 /var/lib/dpkg/status
...

Comment by Kelsey Schubert [ 15/Dec/15 ]

Hi nifan,

It would help if you could provide us with a few more details.

  • With which versions of MongoDB are you experiencing this behavior?
  • Can you provide the same full logs for the node that includes the message you have posted?

Thank you,
Thomas

Comment by nifan [ 14/Dec/15 ]

We seem to have the same problem.

2015-12-14T10:26:04.776+0100 I -        [conn267438] Assertion: 28556:Didn't find RecordId in WiredTigerRecordStore
2015-12-14T10:26:04.783+0100 I CONTROL  [conn267438]
 0xf785c9 0xf17401 0xefc65f 0xd9075c 0x92cbe0 0xa1d42b 0xbeb084 0xbeb434 0xbb99b4 0xacf8d0 0x82a0bd 0xf2b46b 0x7f379b8b2b50 0x7f379a3f895d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B785C9"},{"b":"400000","o":"B17401"},{"b":"400000","o":"AFC65F"},{"b":"400000","o":"99075C"},{"b":"400000","o":"52CBE0"},{"b":"400000","o":"61D42B"},{"b":"400000","o":"7EB084"},{"b":"400000","o":"7EB434"},{"b":"400000","o":"7B99B4"},{"b":"400000","o":"6CF8D0"},{"b":"400000","o":"42A0BD"},{"b":"400000","o":"B2B46B"},{"b":"7F379B8AC000","o":"6B50"},{"b":"7F379A31D000","o":"DB95D"}],"processInfo":{ "mongodbVersion" : "3.0.7", "gitVersion" : "6ce7cbe8c6b899552dadd907604559806aa2e9bd", "uname" : { "sysname" : "Linux", "release" : "3.2.0-4-amd64", "version" : "#1 SMP Debian 3.2.68-1+deb7u6", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "0520DC2A91D540A37F4E7859EB3FFED4590CBFCD" }, { "b" : "7FFDD34EC000", "elfType" : 3, "buildId" : "9DFFE775123E377BD7ED77E6ADF00E11CE8EF9FE" }, { "b" : "7F379B8AC000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "FEF281218797AD6AE726DD5FCEDECADD9E9F51DC" }, { "b" : "7F379B64B000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "B3CDB575971D8B696AC6A6F8872620CAC9D2D273" }, { "b" : "7F379B253000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "E75BB3082373F77A79E8030BDC6C6A280284E759" }, { "b" : "7F379B04B000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "F58D5DE3E7A2989E915422BA4203FE53DBA449A0" }, { "b" : "7F379AE47000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "5D1CA3A3D93ED5B6C6462FFA03E787FDBE4013A3" }, { "b" : "7F379AB40000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "8711429397A5AF8B6269B867D830EDF6E0225B8D" }, { "b" : "7F379A8BE000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "7F58D6664571941C86B2D969701A572AD4D7BF1D" }, { "b" : "7F379A6A8000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "F980B1188708F8D8B5C35D185444AF4CB939AA1E" }, { "b" : "7F379A31D000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "A745EBA2C16BA80AE1EF1A7A7B70740C2CF1B363" }, { "b" : "7F379BAC8000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9B23F2A44CC8CA6175CBD8D64584B1C7EA5FD18C" }, { "b" : "7F379A106000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "1EFEB71FD4999C2307570D673A724EA4E1D85267" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf785c9]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf17401]
 mongod(_ZN5mongo11msgassertedEiPKc+0xAF) [0xefc65f]
 mongod(_ZNK5mongo21WiredTigerRecordStore7dataForEPNS_16OperationContextERKNS_8RecordIdE+0xDC) [0xd9075c]
 mongod(_ZNK5mongo10Collection6docForEPNS_16OperationContextERKNS_8RecordIdE+0x20) [0x92cbe0]
 mongod(_ZN5mongo10FetchStage4workEPm+0x2BB) [0xa1d42b]
 mongod(_ZN5mongo12PlanExecutor18getNextSnapshottedEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0xA4) [0xbeb084]
 mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x34) [0xbeb434]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0xA74) [0xbb99b4]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB10) [0xacf8d0]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x82a0bd]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf2b46b]
 libpthread.so.0(+0x6B50) [0x7f379b8b2b50]
 libc.so.6(clone+0x6D) [0x7f379a3f895d]
-----  END BACKTRACE  -----
2015-12-14T10:26:04.783+0100 I QUERY    [conn267438] assertion 28556 Didn't find RecordId in WiredTigerRecordStore ns:audiences.audiences query:{ next_update: { $lte: new Date(1450085164768) } }
2015-12-14T10:26:04.783+0100 I QUERY    [conn267438]  ntoskip:0 ntoreturn:1000

Comment by Ramon Fernandez Marina [ 04/Dec/15 ]

mzraly, we're discussing internally how you may have arrived to this situation. Have you been able to reproduce this again? Is there any other information about your deployment that you can provide that can help the investigation? For example, the redacted logs do not allow us to see which node became primary when, so when you mention you found documents on a secondary that were not there on a primary, was that secondary node a primary before?

Thanks,
Ramón.

Comment by Mike Zraly [ 06/Nov/15 ]

Redacted log file containing several examples of Assertion 28556. It looks like there were some early cases where a query was made to the instance before background index creation for the appropriate query index completed. However there are many instances of this assertion after the index creation completed for all collections.

Comment by Ramon Fernandez Marina [ 29/Oct/15 ]

Thanks for your report mzraly. Could you please upload full logs for this node that include the messages you included in the description? This will help us put this particular assertion in context.

Thanks,
Ramón.

Generated at Thu Feb 08 03:56:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.