[SERVER-16117] Segfault in prefetch worker when replicating an insert that creates a database implicitly Created: 12/Nov/14  Updated: 11/Jul/16  Resolved: 14/Nov/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.8.0-rc0
Fix Version/s: 2.8.0-rc1

Type: Bug Priority: Major - P3
Reporter: Charlie Swanson Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File git.diff     Text File testOutput.txt    
Issue Links:
Depends
Operating System: ALL
Steps To Reproduce:

I attached the diff from my branch. It adds a test and changes one other file so that the upgrade actually takes effect (otherwise the node restarts as 2.6 again). Apply that one change and the test, then run the test.

 Description   

When I run the attached test, it produces the attached log. The log shows a segfault after a 2.6 node is restarted as a 2.8 node.

segfault:

m31001| 2014-11-12T17:15:19.812-0500 I REPLSETS [ReplicationExecutor] transition to RECOVERING
 m31001| 2014-11-12T17:15:19.813-0500 I REPLSETS [ReplicationExecutor] transition to SECONDARY
 m31001| 2014-11-12T17:15:19.813-0500 F -        [repl prefetch worker 15] Invalid access at address: 0
 m31001| 2014-11-12T17:15:19.816-0500 F -        [repl prefetch worker 15] Got signal: 11 (Segmentation fault: 11).
 m31001| 
 m31001|  0x10799012a 0x10798fbb0 0x7fff8f8f1f1a 0x108317350 0x1074efbcf 0x107680494 0x107922a12 0x1079d1571 0x7fff8981f2fc 0x7fff8981f279 0x7fff8981d4b1
 m31001| ----- BEGIN BACKTRACE -----
 m31001| {"backtrace":[{"b":"107086000","o":"90A12A"},{"b":"107086000","o":"909BB0"},{"b":"7FFF8F8ED000","o":"4F1A"},{"b":"107086000","o":"1291350"},{"b":"107086000","o":"469BCF"},{"b":"107086000","o":"5FA494"},{"b":"107086000","o":"89CA12"},{"b":"107086000","o":"94B571"},{"b":"7FFF8981C000","o":"32FC"},{"b":"7FFF8981C000","o":"3279"},{"b":"7FFF8981C000","o":"14B1"}],"processInfo":{ "mongodbVersion" : "2.7.9-pre-", "gitVersion" : "04869a65202864514e6066ef77798a39c7c4920d", "uname" : { "sysname" : "Darwin", "release" : "14.0.0", "version" : "Darwin Kernel Version 14.0.0: Fri Sep 19 00:26:44 PDT 2014; root:xnu-2782.1.97~2/RELEASE_X86_64", "machine" : "x86_64" }, "somap" : [ { "path" : "/Users/charlie/github/mongo/mongod", "machType" : 2, "b" : "107086000", "buildId" : "E1D66906875A3E53BE2B8B8BE4E26687" }, { "path" : "/usr/lib/libSystem.B.dylib", "machType" : 6, "b" : "7FFF96B38000", "buildId" : "DA954461EC6A3DF085516FC810627627" }, { "path" : "/usr/lib/libstdc++.6.dylib", "machType" : 6, "b" : "7FFF9677E000", "buildId" : "803F6AC887DC3E249E80729B551F6FFF" }, { "path" : "/usr/lib/system/libcache.dylib", "machType" : 6, "b" : "7FFF89B38000", "buildId" : "45E9A2E799C436B2BEE30C4E11614AD1" }, { "path" : "/usr/lib/system/libcommonCrypto.dylib", "machType" : 6, "b" : "7FFF922AA000", "buildId" : "D381EBC669D831D380845A80A32CB748" }, { "path" : "/usr/lib/system/libcompiler_rt.dylib", "machType" : 6, "b" : "7FFF941AD000", "buildId" : "BF8FC133EE103DA69B9092039E28678F" }, { "path" : "/usr/lib/system/libcopyfile.dylib", "machType" : 6, "b" : "7FFF95623000", "buildId" : "0C68D3A6ACDD3EF3991ACC82C32AB836" }, { "path" : "/usr/lib/system/libcorecrypto.dylib", "machType" : 6, "b" : "7FFF91A37000", "buildId" : "E178980139853949B7366B3378873301" }, { "path" : "/usr/lib/system/libdispatch.dylib", "machType" : 6, "b" : "7FFF941C4000", "buildId" : "502CF32B669B3709886208188225E4F0" }, { "path" : "/usr/lib/system/libdyld.dylib", "machType" : 6, "b" : "7FFF8AF81000", "buildId" : 
"19FAF435C16533749DEFD7BBA7D61DB6" }, { "path" : "/usr/lib/system/libkeymgr.dylib", "machType" : 6, "b" : "7FFF8D912000", "buildId" : "77845842DE703CC5BD01C3D14227CED5" }, { "path" : "/usr/lib/system/liblaunch.dylib", "machType" : 6, "b" : "7FFF9562E000", "buildId" : "8A9889248BE735FEBF7D322E90EFE49E" }, { "path" : "/usr/lib/system/libmacho.dylib", "machType" : 6, "b" : "7FFF96B32000", "buildId" : "126CA2EDDE91308F8881B9DAEC3C63B6" }, { "path" : "/usr/lib/system/libquarantine.dylib", "machType" : 6, "b" : "7FFF8D1B3000", "buildId" : "DC0416272D92361CBABFA869A5C72293" }, { "path" : "/usr/lib/system/libremovefile.dylib", "machType" : 6, "b" : "7FFF9417C000", "buildId" : "3485B5F46CE83C628DFD8736ED6E8531" }, { "path" : "/usr/lib/system/libsystem_asl.dylib", "machType" : 6, "b" : "7FFF8F8D6000", "buildId" : "F153AC5B0542356E88C820A62CA704E2" }, { "path" : "/usr/lib/system/libsystem_blocks.dylib", "machType" : 6, "b" : "7FFF8D6DA000", "buildId" : "9615D10AFCA73BE4AA1A1B195DACE1A1" }, { "path" : "/usr/lib/system/libsystem_c.dylib", "machType" : 6, "b" : "7FFF97F65000", "buildId" : "C185E86274243210B5286B822577A4B8" }, { "path" : "/usr/lib/system/libsystem_configuration.dylib", "machType" : 6, "b" : "7FFF8FB41000", "buildId" : "9FBA1CE497D0347EA44393ED94512E92" }, { "path" : "/usr/lib/system/libsystem_coreservices.dylib", "machType" : 6, "b" : "7FFF912F5000", "buildId" : "41B7C5785A5331C8A96FC73E030B0938" }, { "path" : "/usr/lib/system/libsystem_coretls.dylib", "machType" : 6, "b" : "7FFF8998D000", "buildId" : "EBBF7EF680D83F8F825CB412BD6D22C0" }, { "path" : "/usr/lib/system/libsystem_dnssd.dylib", "machType" : 6, "b" : "7FFF98DAD000", "buildId" : "62B70ECAE40D3C63896E7F00EC386DDB" }, { "path" : "/usr/lib/system/libsystem_info.dylib", "machType" : 6, "b" : "7FFF936B2000", "buildId" : "B85A85D585303A93B0C34DEC41F79478" }, { "path" : "/usr/lib/system/libsystem_kernel.dylib", "machType" : 6, "b" : "7FFF93107000", "buildId" : "93E0E0A975B63904BB4E4BC7C05F4B6B" }, { "path" : 
"/usr/lib/system/libsystem_m.dylib", "machType" : 6, "b" : "7FFF8C851000", "buildId" : "1E12AB456D9636D0A226F24D9FB0D9D6" }, { "path" : "/usr/lib/system/libsystem_malloc.dylib", "machType" : 6, "b" : "7FFF922C7000", "buildId" : "19BCC25757173502A71F95D65AFA861B" }, { "path" : "/usr/lib/system/libsystem_network.dylib", "machType" : 6, "b" : "7FFF9116E000", "buildId" : "C0B2313D47BE38A9BEE62634A4F5E14B" }, { "path" : "/usr/lib/system/libsystem_networkextension.dylib", "machType" : 6, "b" : "7FFF92CA1000", "buildId" : "29AB225BD7FB30ED960065D44B9A9442" }, { "path" : "/usr/lib/system/libsystem_notify.dylib", "machType" : 6, "b" : "7FFF8BDAE000", "buildId" : "61147800F3203DAA850CBADF33855F29" }, { "path" : "/usr/lib/system/libsystem_platform.dylib", "machType" : 6, "b" : "7FFF8F8ED000", "buildId" : "64E34079D7123D669CE2418624A5C040" }, { "path" : "/usr/lib/system/libsystem_pthread.dylib", "machType" : 6, "b" : "7FFF8981C000", "buildId" : "26B1897F0CD330F3B55A37CB45062D73" }, { "path" : "/usr/lib/system/libsystem_sandbox.dylib", "machType" : 6, "b" : "7FFF901BC000", "buildId" : "DB9962EF889831CC9B87E01F8CE74C9D" }, { "path" : "/usr/lib/system/libsystem_secinit.dylib", "machType" : 6, "b" : "7FFF98C79000", "buildId" : "581DAD0F6B633A48B63B917AF799ABAA" }, { "path" : "/usr/lib/system/libsystem_stats.dylib", "machType" : 6, "b" : "7FFF94333000", "buildId" : "1DB0443659743F1686CC5FF5F390339C" }, { "path" : "/usr/lib/system/libsystem_trace.dylib", "machType" : 6, "b" : "7FFF97521000", "buildId" : "A9E6B7D8C3273742AC5486C94218B1DF" }, { "path" : "/usr/lib/system/libunc.dylib", "machType" : 6, "b" : "7FFF91A0E000", "buildId" : "5676F7EAC1DF329FB006D2C3022B7D70" }, { "path" : "/usr/lib/system/libunwind.dylib", "machType" : 6, "b" : "7FFF8999F000", "buildId" : "BE7E51A0B6EA3A549CCA9D88F683A6D6" }, { "path" : "/usr/lib/system/libxpc.dylib", "machType" : 6, "b" : "7FFF952A7000", "buildId" : "9437C02EA07B38C891CB299FAA63083D" }, { "path" : "/usr/lib/libobjc.A.dylib", "machType" : 6, 
"b" : "7FFF93125000", "buildId" : "3B60CD9074A23A5D9686B0772159792A" }, { "path" : "/usr/lib/libauto.dylib", "machType" : 6, "b" : "7FFF97888000", "buildId" : "A260789BD4D8316A9490254767B8A5F1" }, { "path" : "/usr/lib/libc++abi.dylib", "machType" : 6, "b" : "7FFF8FE7A000", "buildId" : "88A22A0F87C63002BFBAAC0F2808B8B9" }, { "path" : "/usr/lib/libc++.1.dylib", "machType" : 6, "b" : "7FFF90122000", "buildId" : "1B9530FD989B3174BB1CBDC159501710" }, { "path" : "/usr/lib/libDiagnosticMessagesClient.dylib", "machType" : 6, "b" : "7FFF8AF96000", "buildId" : "2EE8E4365CDC34C599595BA218D507FB" } ] }}
 m31001|  mongod(_ZN5mongo15printStackTraceERSo+0x3A) [0x10799012a]
 m31001|  mongod(_ZN5mongo12_GLOBAL__N_124abruptQuitWithAddrSignalEiP9__siginfoPv+0x240) [0x10798fbb0]
 m31001|  libsystem_platform.dylib(_sigtramp+0x1A) [0x7fff8f8f1f1a]
 m31001|  mongod(_ZN8tcmalloc6Static14central_cache_E+0x2180) [0x108317350]
 m31001|  mongod(_ZN5mongo4repl28prefetchPagesForReplicatedOpEPNS_16OperationContextEPNS_8DatabaseERKNS_7BSONObjE+0x13F) [0x1074efbcf]
 m31001|  mongod(_ZN5mongo4repl8SyncTail10prefetchOpERKNS_7BSONObjE+0xC4) [0x107680494]
 m31001|  mongod(_ZN5mongo10threadpool6Worker4loopERKSs+0x92) [0x107922a12]
 m31001|  mongod(_ZN5boost12_GLOBAL__N_112thread_proxyEPv+0xB1) [0x1079d1571]
 m31001|  libsystem_pthread.dylib(_pthread_body+0x83) [0x7fff8981f2fc]
 m31001|  libsystem_pthread.dylib(_pthread_body+0x0) [0x7fff8981f279]
 m31001|  libsystem_pthread.dylib(thread_start+0xD) [0x7fff8981d4b1]
 m31001| -----  END BACKTRACE  -----
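
The raw frame addresses printed at the top of the trace can be recovered from the JSON backtrace: each frame's "b" (load base of the image) plus "o" (offset) gives the absolute address, and "b" also identifies the image in "somap". A minimal Python sketch, using only the first three frames and two somap entries copied from the log above; the helper name `resolve_frames` is made up for illustration, and it assumes both fields are hexadecimal strings:

```python
# Frames and images copied from the log above (truncated to a few entries).
backtrace = [
    {"b": "107086000", "o": "90A12A"},
    {"b": "107086000", "o": "909BB0"},
    {"b": "7FFF8F8ED000", "o": "4F1A"},
]
somap = [
    {"path": "/Users/charlie/github/mongo/mongod", "b": "107086000"},
    {"path": "/usr/lib/system/libsystem_platform.dylib", "b": "7FFF8F8ED000"},
]


def resolve_frames(backtrace, somap):
    """Return (absolute_address, image_path) for each frame.

    Hypothetical helper: assumes "b" and "o" are hexadecimal strings and
    that a frame belongs to the image whose base equals the frame's "b".
    """
    images = {int(entry["b"], 16): entry["path"] for entry in somap}
    resolved = []
    for frame in backtrace:
        base = int(frame["b"], 16)
        address = base + int(frame["o"], 16)
        resolved.append((hex(address), images.get(base, "<unknown>")))
    return resolved


for address, path in resolve_frames(backtrace, somap):
    print(address, path)
```

The computed addresses can be compared against the symbolized frames in the trace, for example the `printStackTrace` frame in mongod and the `_sigtramp` frame in libsystem_platform.dylib.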



 Comments   
Comment by Githook User [ 14/Nov/14 ]

Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-16117 do not prefetch a not yet created Database
Branch: master
https://github.com/mongodb/mongo/commit/d56a2b6bc566489899033275fd9dbb3ab5d4eb02

Comment by Spencer Brody (Inactive) [ 13/Nov/14 ]

Seems like implicit database creation is broken in 2.8 somehow. This is masked in a pure 2.8 replica set because now, when we create a collection implicitly via an insert, we log an explicit createCollection entry in the primary's oplog, so the secondary replicates that before the insert. 2.6 primaries don't do that, so the insert is the first time the secondary has seen anything about this new database/collection.
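
The difference described in this comment can be pictured with a toy oplog. This is a simplified Python sketch, not server code: the function names and version strings are hypothetical, and the entry shapes (op "c" on "<db>.$cmd" for a create command, op "i" for an insert) are a pared-down approximation of real oplog entries.

```python
def oplog_for_implicit_insert(primary_version, ns, doc):
    """Model the oplog entries a primary writes for an insert into a
    collection that does not exist yet (hypothetical, simplified)."""
    db_name, coll_name = ns.split(".", 1)
    entries = []
    if primary_version == "2.8":
        # A 2.8 primary logs the implicit collection creation explicitly.
        entries.append(
            {"op": "c", "ns": db_name + ".$cmd", "o": {"create": coll_name}}
        )
    # Both versions log the insert itself.
    entries.append({"op": "i", "ns": ns, "o": doc})
    return entries


def first_op_is_insert(entries):
    """True if a secondary first learns of the namespace from an insert."""
    return entries[0]["op"] == "i"


from_26 = oplog_for_implicit_insert("2.6", "newdb.coll", {"_id": 1})
from_28 = oplog_for_implicit_insert("2.8", "newdb.coll", {"_id": 1})
print(first_op_is_insert(from_26), first_op_is_insert(from_28))
```

In this model only the stream from the 2.6 primary starts with an insert, which is the case where a 2.8 secondary has no database to look up when it prefetches.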

Comment by Eric Milkie [ 13/Nov/14 ]

If you turn off prefetching, does that stop the crash? I would expect that the actual application of the op should implicitly create the database successfully with no crash.

Comment by Eric Milkie [ 13/Nov/14 ]

Why does the prefetch code crash instead of simply skipping the op on the nonexistent database, or creating the database? Ideally it should be handled there.

Comment by Matt Dannenberg [ 13/Nov/14 ]

If I create the database by inserting into another collection on the database prior to upgrading the node, the crash does not occur because the database exists.

2.8 creates an oplog entry for collection creation on an insert into a new collection, which will create the database.
2.6 does not. The collection and database creation is implicit.
I believe the solution is to have 2.8 notice the missing database (easy) and have it create it (not something I know how to do offhand, but probably similarly easy).
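
One way to picture "notice the missing database" is a guard at the top of the prefetch step. The sketch below is a Python model, not the server's C++ code: `Catalog`, `prefetch_op`, and `apply_op` are hypothetical names. It skips the prefetch rather than creating the database there, which is one of the options raised in this thread and is in the spirit of the commit message on this ticket ("do not prefetch a not yet created Database").

```python
class Catalog:
    """Toy database catalog: database name -> {collection name: [docs]}."""

    def __init__(self):
        self.databases = {}

    def get(self, db_name):
        # Returns None for a database that has not been created yet,
        # standing in for the null pointer behind the crash.
        return self.databases.get(db_name)


def prefetch_op(catalog, op):
    """Warm pages for an op; return False if the prefetch was skipped."""
    db_name, coll_name = op["ns"].split(".", 1)
    db = catalog.get(db_name)
    if db is None:
        # Guard: nothing to prefetch for a database that does not exist yet.
        return False
    db.get(coll_name, [])  # stand-in for touching the collection's pages
    return True


def apply_op(catalog, op):
    """Apply an insert, creating the database and collection implicitly."""
    db_name, coll_name = op["ns"].split(".", 1)
    db = catalog.databases.setdefault(db_name, {})
    db.setdefault(coll_name, []).append(op["o"])


catalog = Catalog()
op = {"op": "i", "ns": "newdb.coll", "o": {"_id": 1}}
skipped = not prefetch_op(catalog, op)  # database missing: prefetch is skipped
apply_op(catalog, op)                   # applying the op creates newdb.coll
```

Leaving creation to `apply_op` keeps the prefetch step read-only in this model; creating the database during prefetch would be the alternative design.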

Comment by Matt Dannenberg [ 13/Nov/14 ]

The crash does not come from an invalid config or from the initial sync, but from the replication of the single write at the end of the file...
Investigating further...

Generated at Thu Feb 08 03:40:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.