[SERVER-28316] Related to SERVER-27753 - Custom build crashes. Created: 15/Mar/17  Updated: 31/May/17  Resolved: 27/Apr/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.11
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Luke Morfitt Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: crash, custom, mongodb
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates WT-3218 unexpected checkpoint ordering failures Closed
Related
related to SERVER-27700 WT secondary performance drops to nea... Closed
related to SERVER-27753 wiredTiger cache 95% performance issues Closed
Operating System: ALL
Participants:

 Description   

Hello,

We have seen two crash's of the custom build provided on SERVER-27753 Whilst this is not a current production issue I thought it was prudent to report this issue with the custom build before it makes it into a production release.

Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [WTJournalFlusher] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [WTJournalFlusher] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [thread1] Got signal: 6 (Aborted).#012#012 0x132d7e2 0x132c939 0x132d142 0x7fd5d779b340 0x7fd5d73fccc9 0x7fd5d74000d8 0x12b3ac2 0x10ada13 0x969c51 0x969d26 0x969eea 0x968106 0x19bf357 0x19def83 0x1a785d1 0x1a7fc12 0x19f6715 0x1aaa399 0x1aaa6b5 0x96a128 0x1aaacd3 0x1aac5cb 0x1a99d7d 0x1a1529d 0x7fd5d7793182 0x7fd5d74c047d#012



 Comments   
Comment by Luke Morfitt [ 16/Mar/17 ]

Thank you for the update.

On a side node the custom build took around 3 hours to start on one of the clusters(has 18,000 databases) vs 3.2.11 which usually starts in 45-60 mins.

Will test out 3.4rc when it is available and provide feedback.

Thank you for your help,
Luke

Comment by Ramon Fernandez Marina [ 16/Mar/17 ]

lukemorfitt, the current plan is to roll these changes in a 3.4.3-rc2 release candidate that you'll be able to download and test soon. This will provide you with the best path to transition to using a stable release (3.4.3).

Comment by Luke Morfitt [ 16/Mar/17 ]

Hello Michael Cahill,

Please can you provide a new build for ubuntu 14.04. The change in cache performance has significant benefits for us and we would like to try out a new custom build.

Kind regards,
Luke

Comment by Michael Cahill (Inactive) [ 16/Mar/17 ]

lukemorfitt, thank you for this bug report. We had seen this issue in WT-3218 in testing but until we heard from you, we didn't realize it was introduced by one of the changes in the patch build.

There is no need to upload further information.

We fixed WT-3218 today and will roll that fix into future versions of MongoDB. If you are keen to run further tests with a patch build that includes the fix, please let me know.

Comment by Kelsey Schubert [ 15/Mar/17 ]

Hi lukemorfitt,

Thanks for uploading the diagnostic.data and log files. So we can continue to investigate, would you please upload the following files from gsp-mew?

  • 302667.45game.45live/index/4434--6007633730952334874.wt
  • WiredTiger.wt
  • WiredTiger.turtle

This will enable us to examine what checkpoints are in the file and the meta-data files will let us know what WiredTiger expects to be in the file.

Thank you,
Thomas

Comment by Luke Morfitt [ 15/Mar/17 ]

Hello.

I've uploaded two tar balls from different clusters which have experienced these issues.

gsp-aeu001 has a small number of databases (relative to the other) and was accidently promoted to master. We was using it as a backup node with mongo dump.

gsp-mew is our analytics cluster so has a very large number of databases and collections. We added the custom build to try and perform a large export of data, which (on 3.2.11) will grind to a halt when the cache becomes full.

Kind regards,
Luke

Comment by Kelsey Schubert [ 15/Mar/17 ]

Hi lukemorfitt,

Thank you for reporting this behavior. I've created a secure upload portal for the complete log files and diagnostic.data.

Thanks again,
Thomas

Comment by Luke Morfitt [ 15/Mar/17 ]

These are the logs from the other failed server.

Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [initandlisten] connection accepted from 127.0.0.1:34226 #116669 (79 connections now open)
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [thread1] WiredTiger (22) [1489560244:523869][37607:0x7fd5d2bbc700], file:302667.45game.45live/index/4434--6007633730952334874.wt, WT_SESSION.checkpoint: 302667.45game.45live/index/4434--6007633730952334874.wt: unexpected checkpoint ordering: Invalid argument
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [thread1] WiredTiger (-31804) [1489560244:523957][37607:0x7fd5d2bbc700], file:302667.45game.45live/index/4434--6007633730952334874.wt, WT_SESSION.checkpoint: the process must exit and restart: WT_PANIC: WiredTiger library panic
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [thread1] Fatal Assertion 28558
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 14] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 2] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 3] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 12] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [thread1] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 3] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 2] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 5] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 12] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 4] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 6] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 5] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 8] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 7] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 8] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 14] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 7] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 0] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 1] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 10] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 1] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 6] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 4] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 13] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 10] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 0] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 15] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 15] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 11] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 11] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 13] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 9] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [repl writer worker 9] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [WTJournalFlusher] Fatal Assertion 28559
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [WTJournalFlusher] #012#012***aborting after fassert() failure#012#012
Mar 15 06:44:04 gsp-mew-ma03 mongod.27017[37607]: [thread1] Got signal: 6 (Aborted).#012#012 0x132d7e2 0x132c939 0x132d142 0x7fd5d779b340 0x7fd5d73fccc9 0x7fd5d74000d8 0x12b3ac2 0x10ada13 0x969c51 0x969d26 0x969eea 0x968106 0x19bf357 0x19def83 0x1a785d1 0x1a7fc12 0x19f6715 0x1aaa399 0x1aaa6b5 0x96a128 0x1aaacd3 0x1aac5cb 0x1a99d7d 0x1a1529d 0x7fd5d7793182 0x7fd5d74c047d#012----- BEGIN BACKTRACE -----#012{"backtrace":[{"b":"400000","o":"F2D7E2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F2C939"},{"b":"400000","o":"F2D142"},{"b":"7FD5D778B000","o":"10340"},{"b":"7FD5D73C6000","o":"36CC9","s":"gsignal"},{"b":"7FD5D73C6000","o":"3A0D8","s":"abort"},{"b":"400000","o":"EB3AC2","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"CADA13"},{"b":"400000","o":"569C51"},{"b":"400000","o":"569D26","s":"__wt_err"},{"b":"400000","o":"569EEA","s":"__wt_panic"},{"b":"400000","o":"568106"},{"b":"400000","o":"15BF357"},{"b":"400000","o":"15DEF83"},{"b":"400000","o":"16785D1"},{"b":"400000","o":"167FC12"},{"b":"400000","o":"15F6715"},{"b":"400000","o":"16AA399"},{"b":"400000","o":"16AA6B5"},{"b":"400000","o":"56A128"},{"b":"400000","o":"16AACD3"},{"b":"400000","o":"16AC5CB"},{"b":"400000","o":"1699D7D"},{"b":"400000","o":"161529D"},{"b":"7FD5D778B000","o":"8182"},{"b":"7FD5D73C6000","o":"FA47D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.12-29-g95d4a0a-patch-58aa738b3ff12233d00001b2", "gitVersion" : "95d4a0a472e436dad9380db917145f121d8c3720", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.16.0-60-generic", "version" : "#80~14.04.1-Ubuntu SMP Wed Jan 20 13:37:48 UTC 2016", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "D48E4C166021CD433043C476470DB400ED98C7C7" }, { "b" : "7FFC80FE3000", "elfType" : 3, "buildId" : "E909F3BEA8B99B483153329FF4F90B288A068655" }, { "b" : "7FD5D86AD000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "74864DB9D5F69D39A67E4755012FB6573C469B3D" }, { "b" : "7FD5D82D1000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "AAE7CFF8351B730830BDBCE0DCABBE06574B7144" }, { "b" : "7FD5D80C9000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "B376100CAB1EAC4E5DE066EACFC282BF7C0B54F3" }, { "b" : "7FD5D7EC5000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "67699FFDA9FD2A552032E0652A242E82D65AA10D" }, { "b" : "7FD5D7BBF000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "EF3F6DFFA1FBE48436EC6F45CD3AABA157064BB4" }, { "b" : "7FD5D79A9000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7FD5D778B000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "AF06068681750736E0524DF17D5A86CB2C3F765C" }, { "b" : "7FD5D73C6000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "5382058B69031CAA9B9996C11061CD164C9398FF" }, { "b" : "7FD5D890C000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "2A816C3EBBA4E12813FBD34B06FBD25BC892A67F" } ] }}#012 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x132d7e2]#012 mongod(+0xF2C939) [0x132c939]#012 mongod(+0xF2D142) [0x132d142]#012 libpthread.so.0(+0x10340) [0x7fd5d779b340]#012 libc.so.6(gsignal+0x39) [0x7fd5d73fccc9]#012 libc.so.6(abort+0x148) [0x7fd5d74000d8]#012 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x12b3ac2]#012 mongod(+0xCADA13) [0x10ada13]#012 mongod(+0x569C51) [0x969c51]#012 mongod(__wt_err+0x8D) [0x969d26]#012 mongod(__wt_panic+0x24) [0x969eea]#012 mongod(+0x568106) [0x968106]#012 mongod(+0x15BF357) [0x19bf357]#012 mongod(+0x15DEF83) [0x19def83]#012 mongod(+0x16785D1) [0x1a785d1]#012 mongod(+0x167FC12) [0x1a7fc12]#012 mongod(+0x15F6715) [0x19f6715]#012 mongod(+0x16AA399) [0x1aaa399]#012 mongod(+0x16AA6B5) [0x1aaa6b5]#012 mongod(+0x56A128) [0x96a128]#012 mongod(+0x16AACD3) [0x1aaacd3]#012 mongod(+0x16AC5CB) [0x1aac5cb]#012 mongod(+0x1699D7D) [0x1a99d7d]#012 mongod(+0x161529D) [0x1a1529d]#012 libpthread.so.0(+0x8182) [0x7fd5d7793182]#012 libc.so.6(clone+0x6D) [0x7fd5d74c047d]#012-----  END BACKTRACE  -----

Comment by Luke Morfitt [ 15/Mar/17 ]

The issue is for the custom build version " mongo-mongodb_mongo_v3.2_ubuntu1404_95d4a0a472e436dad9380db917145f121d8c3720_17_02_20_04_43_56"

Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [rsBackgroundSync] rollback 5 d:1640 u:2233
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [rsBackgroundSync] rollback 6
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [rsBackgroundSync] rollback done
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [SyncSourceFeedback] Socket say send() errno:9 Bad file descriptor a.b.c.26:27017
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [rsBackgroundSync] rollback finished
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [ReplicationExecutor] transition to RECOVERING
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [ReplicationExecutor] syncing from: server.name:27017
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [SyncSourceFeedback] SyncSourceFeedback error sending update: socket exception [SEND_ERROR] for a.b.c.26:27017
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [SyncSourceFeedback] updateUpstream failed: Location9001: socket exception [SEND_ERROR] for a.b.c.26:27017, will retry
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [ReplicationExecutor] syncing from: server.name:27017
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [SyncSourceFeedback] setting syncSourceFeedback to server.name:27017
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [NetworkInterfaceASIO-BGSync-0] Connecting to server.name:27017
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [NetworkInterfaceASIO-BGSync-0] Successfully connected to server.name:27017
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [initandlisten] connection accepted from a.b.c.60:49914 #347 (7 connections now open)
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [conn347] end connection a.b.c.60:49914 (6 connections now open)
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [initandlisten] connection accepted from a.b.c.60:49915 #348 (7 connections now open)
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [initandlisten] connection accepted from a.b.c.60:49916 #349 (8 connections now open)
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [repl writer worker 10] writer worker caught exception:  :: caused by :: 16755 Can't extract geo keys: { REDACTED }
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [repl writer worker 10] Fatal Assertion 16360
Mar 13 17:26:08 gsp-aeu001-mo04 mongod.27017[27428]: [repl writer worker 10] #012#012***aborting after fassert() failure#012#012

Please provide a secure upload location and I will send full log files and diagnostic data for the servers.

Thanks,
Luke

Generated at Thu Feb 08 04:17:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.