[SERVER-22335] Do not prepare getmore when un-needed in bgsync fetcher Created: 28/Jan/16  Updated: 19/Nov/16  Resolved: 29/Jan/16

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: None
Fix Version/s: 3.2.3, 3.3.2

Type: Bug Priority: Critical - P2
Reporter: Timothy Olsen (Inactive) Assignee: Scott Hernandez (Inactive)
Resolution: Done Votes: 0
Labels: code-only, csrsupgrade
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File run9007     HTML File run9008     HTML File run9009    
Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Repl F (01/29/16)
Participants:

 Description   

During upgrade from WiredTiger SCCC config servers to CSRS, I get a backtrace on the new CSRS second and third config servers. This happens about 4 seconds after reenabling the balancer. I am using MongoDB version 3.2.1-95-g4a3c6e6

Attached are the logs for the 3 config servers. run9007 is the first config server, run9008 the second, and run9009 the third. The balancer was restored around the time 11:19:01. The backtraces for run9008 and run9009 occur around the time 11:19:05.



 Comments   
Comment by Githook User [ 29/Jan/16 ]

Author:

{u'username': u'scotthernandez', u'name': u'Scott Hernandez', u'email': u'scotthernandez@gmail.com'}

Message: SERVER-22335: only prepare getmore when needed in bgsync fetcher
Branch: master
https://github.com/mongodb/mongo/commit/1175510702d070e4eeed436d81c52dec855f664e

Comment by Githook User [ 29/Jan/16 ]

Author:

{u'username': u'scotthernandez', u'name': u'Scott Hernandez', u'email': u'scotthernandez@gmail.com'}

Message: SERVER-22335: only prepare getmore when needed in bgsync fetcher

(cherry picked from commit 1175510702d070e4eeed436d81c52dec855f664e)
Branch: v3.2
https://github.com/mongodb/mongo/commit/8f4c9f4f690407f5676da8c7089a7512a1d4d014

Comment by Scott Hernandez (Inactive) [ 29/Jan/16 ]

I believe the failure is that the getmore returns with a cursorId of 0, indicating there is no more, and not sending a BSONObjectBuilder pointer, which the invariant hits. This was not a case the code was built to support so we need to test for this case and allow it – also, invariant is wrong here I believe either way since the code can recovery by issuing a new query/fetcher.

Comment by Scott Hernandez (Inactive) [ 28/Jan/16 ]

Out until Monday, pass back if

Comment by Spencer Brody (Inactive) [ 28/Jan/16 ]

Looks like a failure in bgsync, Scott can you take a look?

Comment by Spencer Brody (Inactive) [ 28/Jan/16 ]

2016-01-28T11:19:04.914-0500 I REPL     [ReplicationExecutor] not becoming primary, we received insufficient votes
2016-01-28T11:19:05.605-0500 I REPL     [ReplicationExecutor] syncing from: cfg-9007-alias.lvh.me:9007
2016-01-28T11:19:05.606-0500 I REPL     [SyncSourceFeedback] setting syncSourceFeedback to cfg-9007-alias.lvh.me:9007
2016-01-28T11:19:05.609-0500 I ASIO     [NetworkInterfaceASIO-BGSync-0] Successfully connected to cfg-9007-alias.lvh.me:9007
2016-01-28T11:19:05.611-0500 I -        [rsBackgroundSync-0] Invariant failure bob src/mongo/db/repl/bgsync.cpp 639
2016-01-28T11:19:05.611-0500 I -        [rsBackgroundSync-0] 
 
***aborting after invariant() failure
 
 
2016-01-28T11:19:05.623-0500 F -        [rsBackgroundSync-0] warning: log line attempted (10k) over max size (10k), printing beginning and end ... Got signal: 6 (Abort trap: 6).
 
 0x105f7b089 0x105f7aa10 0x7fff9bf72eaa 0x7fff88dcfa36 0x7fff8c2f36e7 0x105f1fed9 0x105b46625 0x105b48ac5 0x105720ca9 0x105d982fa 0x105d9715d 0x105d97493 0x105f29d6c 0x105f2a99b 0x105f2a54d 0x105f2b4c8 0x7fff9dc33c13 0x7fff9dc33b90 0x7fff9dc31375
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"10569D000","o":"8DE089"},{"b":"10569D000","o":"8DDA10"},{"b":"7FFF9BF6E000","o":"4EAA"},{"b":"7FFF88DCD000","o":"2A36"},{"b":"7FFF8C295000","o":"5E6E7"},{"b":"10569D000","o":"882ED9"},{"b":"10569D000","o":"4A9625"},{"b":"10569D000","o":"4ABAC5"},{"b":"10569D000","o":"83CA9"},{"b":"10569D000","o":"6FB2FA"},{"b":"10569D000","o":"6FA15D"},{"b":"10569D000","o":"6FA493"},{"b":"10569D000","o":"88CD6C"},{"b":"10569D000","o":"88D99B"},{"b":"10569D000","o":"88D54D"},{"b":"10569D000","o":"88E4C8"},{"b":"7FFF9DC30000","o":"3C13"},{"b":"7FFF9DC30000","o":"3B90"},{"b":"7FFF9DC30000","o":"1375"}],"processInfo":{ "mongodbVersion" : "3.2.1-95-g4a3c6e6", "gitVersion" : "4a3c6e62882269432e8df8c19675bde716f38d50", "compiledModules" : [], "uname" : { "sysname" : "Darwin", "release" : "15.3.0", "version" : "Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64", "machine" : "x86_64" }, "somap" : [ { "path" : "/tmp/mms-automation/test/versions/mongodb-osx-x86_64-3.2.1-95-g4a3c6e6/bin/mongod", "machType" : 2, "b" : "10569D000", "vmaddr" : "100000000", "buildId" : "1F7B2B9179413D908968E4C4D5BEDDCF" }, { "path" : "/usr/lib/libSystem.B.dylib", "machType" : 6, "b" : "7FFF9BDE0000", "vmaddr" : "7FFF93B5F000", "buildId" : "5A4257EF31453BB387A40D2404A9462D" }, { "path" : "/usr/lib/libc++.1.dylib", "machType" : 6, "b" : "7FFF9AB72000", "vmaddr" : "7FFF928F1000", "buildId" : "8FC3D139805534989AC56467CB7F4D14" }, { "path" : "/usr/lib/system/libcache.dylib", "machType" : 6, "b" : "7FFF8A07E000", "vmaddr" : "7FFF81DFD000", "buildId" : "6B245C0AF3EA383BA5425B0D0456A41B" }, { "path" : "/usr/lib/system/libcommonCrypto.dylib", "machType" : 6, "b" : "7FFF971FA000", "vmaddr" : "7FFF8EF79000", "buildId" : "766BC3F541F33315BABC72718A98EA92" }, { "path" : "/usr/lib/system/libcompiler_rt.dylib", "machType" : 6, "b" : "7FFF958D3000", "vmaddr" : "7FFF8D652000", "buildId" : "D3C4AB4023B43BC68C385B8758D14E80" }, { "path" : "/usr/lib/system/libcopyfile.dylib", "machType" : 6, "b" : "7FFF915F9000", "vmaddr" : "7FFF89378000", "buildId" : "F51332690B22388CA57C079667B6291E" }, { "path" : "/usr/lib/system/libcorecrypto.dylib", "machType" : 6, "b" : "7FFF97739000", "vmaddr" : "7FFF8F4B8000", "buildId" : "C6BD205F4ECE37EEBCABA76F39CDCFFA" }, { "path" : "/usr/lib/system/libdispatch.dylib", "machType" : 6, "b" : "7FFF973FC000", "vmaddr" : "7FFF8F17B000", "buildId" : "324C91892AF33356847F6F4CE1C6E901" }, { "path" : "/usr/lib/system/libdyld.dylib", "machType" : 6, "b" : "7FFF8EB35000", "vmaddr" : "7FFF868B4000", "buildId" : "AA629043C6F632FE8007E3478E99ACA7" }, { "path" : "/usr/lib/system/libkeymgr.dylib", "machType" : 6, "b" : "7FFF8E646000", "vmaddr" : "7FFF863C5000", "buildId" : "09397E0160663179A50C2CE666FDA929" }, { "path" : "/usr/lib/system/liblaunch.dylib", "machType" : 6, "b" : "7FFF8D80C000", "vmaddr" : "7FFF8558B000", "buildId" : "EDF719D6D2BB38DD8C944272BEFDA2CD" }, { "path" : "/usr/lib/system/libmacho.dylib", "machType" : 6, "b" : "7FFF96171000", "vmaddr" : "7FFF8DEF0000", "buildId" : "CB745E1F48853F96B38B2093DF488FD5" }, { "path" : "/us .......... em/libxpc.dylib", "machType" : 6, "b" : "7FFF9B538000", "vmaddr" : "7FFF932B7000", "buildId" : "61AB46109304354C9E9BD57198AE9866" }, { "path" : "/usr/lib/libobjc.A.dylib", "machType" : 6, "b" : "7FFF9A6A7000", "vmaddr" : "7FFF92426000", "buildId" : "9F45830DF1D53CDF94611A5477ED7D1E" }, { "path" : "/usr/lib/libauto.dylib", "machType" : 6, "b" : "7FFF9D57F000", "vmaddr" : "7FFF952FE000", "buildId" : "999E610F41FC32A3ADCA5EC049B65DFB" }, { "path" : "/usr/lib/libc++abi.dylib", "machType" : 6, "b" : "7FFF8A6FF000", "vmaddr" : "7FFF8247E000", "buildId" : "DCCC81773D0935BC97842A04FEC4C71B" }, { "path" : "/usr/lib/libDiagnosticMessagesClient.dylib", "machType" : 6, "b" : "7FFF96D06000", "vmaddr" : "7FFF8EA85000", "buildId" : "4243B6B421E9355B9C5A95A216233B96" } ] }}
 mongod(_ZN5mongo15printStackTraceERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEE+0x39) [0x105f7b089]
 mongod(_ZN5mongo12_GLOBAL__N_110abruptQuitEi+0x90) [0x105f7aa10]
 libsystem_platform.dylib(_sigtramp+0x1A) [0x7fff9bf72eaa]
 libsystem_malloc.dylib(szone_malloc_should_clear+0x445) [0x7fff88dcfa36]
 libsystem_c.dylib(abort+0x81) [0x7fff8c2f36e7]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0x2E9) [0x105f1fed9]
 mongod(_ZN5mongo4repl14BackgroundSync16_fetcherCallbackERKNS_10StatusWithINS_7Fetcher13QueryResponseEEEPNS_14BSONObjBuilderERKNS_11HostAndPortENS0_6OpTimeExNSt3__16chrono8durationIxNSE_5ratioILl1ELl1000EEEEEPNS_6StatusE+0x1E75) [0x105b46625]
 mongod(_ZNSt3__110__function6__funcINS_6__bindIMN5mongo4repl14BackgroundSyncEFvRKNS3_10StatusWithINS3_7Fetcher13QueryResponseEEEPNS3_14BSONObjBuilderERKNS3_11HostAndPortENS4_6OpTimeExNS_6chrono8durationIxNS_5ratioILl1ELl1000EEEEEPNS3_6StatusEEJPS5_RNS_12placeholders4__phILi1EEERNST_ILi3EEENS_17reference_wrapperISF_EERSH_RxRKSM_SO_EEENS_9allocatorIS14_EEFvSB_PNS7_10NextActionESD_EEclESB_OS18_OSD_+0x55) [0x105b48ac5]
 mongod(_ZN5mongo7Fetcher9_callbackERKNS_8executor12TaskExecutor25RemoteCommandCallbackArgsEPKc+0x2779) [0x105720ca9]
 mongod(_ZNSt3__110__function6__funcIZZN5mongo8executor22ThreadPoolTaskExecutor21scheduleRemoteCommandERKNS3_20RemoteCommandRequestERKNS_8functionIFvRKNS3_12TaskExecutor25RemoteCommandCallbackArgsEEEEENK3$_2clERKNS2_10StatusWithINS3_21RemoteCommandResponseEEEEUlRKNS9_12CallbackArgsEE_NS_9allocatorISQ_EEFvSP_EEclESP_+0x15A) [0x105d982fa]
 mongod(_ZN5mongo8executor22ThreadPoolTaskExecutor11runCallbackENSt3__110shared_ptrINS1_13CallbackStateEEE+0x13D) [0x105d9715d]
 mongod(_ZNSt3__110__function6__funcIZN5mongo8executor22ThreadPoolTaskExecutor23scheduleIntoPool_inlockEPNS_4listINS_10shared_ptrINS4_13CallbackStateEEENS_9allocatorIS8_EEEERKNS_15__list_iteratorIS8_PvEESH_NS_11unique_lockINS_5mutexEEEE3$_4NS9_ISL_EEFvvEEclEv+0x33) [0x105d97493]
 mongod(_ZN5mongo10ThreadPool10_doOneTaskEPNSt3__111unique_lockINS1_5mutexEEE+0x25C) [0x105f29d6c]
 mongod(_ZN5mongo10ThreadPool13_consumeTasksEv+0x1FB) [0x105f2a99b]
 mongod(_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEE+0x10D) [0x105f2a54d]
 mongod(_ZNSt3__114__thread_proxyINS_5tupleIJNS_6__bindIPFvPN5mongo10ThreadPoolERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEEEJS5_SD_EEEEEEEEPvSI_+0x68) [0x105f2b4c8]
 libsystem_pthread.dylib(_pthread_body+0x83) [0x7fff9dc33c13]
 libsystem_pthread.dylib(_pthread_body+0x0) [0x7fff9dc33b90]
 libsystem_pthread.dylib(thread_start+0xD) [0x7fff9dc31375]
-----  END BACKTRACE  -----

Comment by Ramon Fernandez Marina [ 28/Jan/16 ]

From the steps to reproduce:

This does not happen every time. But the time that it did occur, I followed the steps described in the CSRS Beta documentation. The backtrace occurred 4 seconds after reenabling the balancer.

Generated at Thu Feb 08 04:00:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.