[SERVER-19026] Potential double exception while establishing a cursor in mongos Created: 18/Jun/15  Updated: 07/Jan/16  Resolved: 29/Oct/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.4
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Randolph Tan
Assignee: Randolph Tan
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: crash.js (File), crash.patch (Text File)
Operating System: ALL
Sprint: Sharding 9 (09/18/15), Sharding A (10/09/15), Sharding B (10/30/15), Sharding C (11/20/15)

 Description   

2015-06-18T05:58:31.579+0200 F -        [conn697] DBException::toString(): 16380 Failed to call say, no good nodes in node9
Actual exception type: mongo::UserException
 
 0xa57a29 0xa57520 0x7f4b02304bd6 0x7f4b02304c03 0x7f4b0230555f 0x622b82 0x623019 0x5b4dae 0x632914 0x5b4dae 0x624116 0x634989 0x627155 0x978532 0x961fca 0x5b6928 0xa0848b 0x7f4b02fbf9d1 0x7f4b01af88fd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"657A29"},{"b":"400000","o":"657520"},{"b":"7F4B02248000","o":"BCBD6"},{"b":"7F4B02248000","o":"BCC03"},{"b":"7F4B02248000","o":"BD55F"},{"b":"400000","o":"222B82"},{"b":"400000","o":"223019"},{"b":"400000","o":"1B4DAE"},{"b":"400000","o":"232914"},{"b":"400000","o":"1B4DAE"},{"b":"400000","o":"224116"},{"b":"400000","o":"234989"},{"b":"400000","o":"227155"},{"b":"400000","o":"578532"},{"b":"400000","o":"561FCA"},{"b":"400000","o":"1B6928"},{"b":"400000","o":"60848B"},{"b":"7F4B02FB8000","o":"79D1"},{"b":"7F4B01A10000","o":"E88FD"}],"processInfo":{ "mongodbVersion" : "3.0.4", "gitVersion" : "0481c958daeb2969800511e7475dc66986fa9ed5", "uname" : { "sysname" : "Linux", "release" : "2.6.32-042stab106.4", "version" : "#1 SMP Fri Mar 27 15:19:28 MSK 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "8B2C75F8422F298334E7937FD0DBAF21A1F8179A" }, { "b" : "7FFF554F0000", "elfType" : 3, "buildId" : "40E853CD990256FEF45FCCA823F010EDC002660F" }, { "b" : "7F4B02FB8000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "B8DFF8E53D9F2B80C3C382E83EC17C828B536A39" }, { "b" : "7F4B02D48000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "DAF114120DA5C9DBEB1E5A704CE83ACB9B8B7B54" }, { "b" : "7F4B02960000", "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F523EAC46D068A8E0869CF93BCD84B414937993A" }, { "b" : "7F4B02758000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "583411D8786F86A1D6B8741C502831E6122445A7" }, { "b" : "7F4B02550000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "454F8FC6CC6502C6401E5F9E221564D80665D277" }, { "b" : "7F4B02248000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "F07F2E7CF4BFB393CC9BBE8CDC6463652E14DB07" }, { "b" : "7F4B01FC0000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "7D8E9374F4A4EA38A7C1E763F32240EA113E4208" }, { "b" : "7F4B01DA8000", "path" : "/lib64/libgcc_s.so.1", "elfType" 
: 3, "buildId" : "246C3BAB0AB093AFD59D34C8CBF29E786DE4BE97" }, { "b" : "7F4B01A10000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "E4EAB3C200B7D8444FF95AB01F6466924A6A5F5F" }, { "b" : "7F4B031D8000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F8E59B70E469F3A924A268911FF8FD0C37E7460" }, { "b" : "7F4B017C8000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "B7F7FF323B3A4A12310A6285412F01ACE8C74E47" }, { "b" : "7F4B014E0000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "7920917F74AFAD0B8CB197CABBE472AF39D94C34" }, { "b" : "7F4B012D8000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "8CE28F280150E62296240E70ECAC64E4A57AB826" }, { "b" : "7F4B010A8000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "05733977F4E41652B86070B27A0CFC2C1EA7719D" }, { "b" : "7F4B00E90000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "b" : "7F4B00C80000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "C8D01C2839F6950988CE32B4266A8F89C521ACB0" }, { "b" : "7F4B00A78000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "AF374BAFB7F5B139A0B431D3F06D82014AFF3251" }, { "b" : "7F4B00858000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "F8B68F301C19BF06AF56B4B06E0A69F89D2C1F8D" }, { "b" : "7F4B00638000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "E6798A06BEE17CF102BBA44FD512FF8B805CEAF1" } ] }}
 mongos(_ZN5mongo15printStackTraceERSo+0x29) [0xa57a29]
 mongos(+0x657520) [0xa57520]
 libstdc++.so.6(+0xBCBD6) [0x7f4b02304bd6]
 libstdc++.so.6(+0xBCC03) [0x7f4b02304c03]
 libstdc++.so.6(+0xBD55F) [0x7f4b0230555f]
 mongos(_ZN5mongo14DBClientCursorD1Ev+0x402) [0x622b82]
 mongos(_ZN5mongo14DBClientCursorD0Ev+0x9) [0x623019]
 mongos(_ZN5boost6detail15sp_counted_base7releaseEv+0x1E) [0x5b4dae]
 mongos(_ZN5boost6detail17sp_counted_impl_pIN5mongo23ParallelConnectionStateEE7disposeEv+0x34) [0x632914]
 mongos(_ZN5boost6detail15sp_counted_base7releaseEv+0x1E) [0x5b4dae]
 mongos(_ZN5mongo26ParallelConnectionMetadata7cleanupEb+0xB6) [0x624116]
 mongos(_ZNSt8_Rb_treeIN5mongo5ShardESt4pairIKS1_NS0_26ParallelConnectionMetadataEESt10_Select1stIS5_ESt4lessIS1_ESaIS5_EE8_M_eraseEPSt13_Rb_tree_nodeIS5_E+0x49) [0x634989]
 mongos(_ZN5mongo27ParallelSortClusteredCursorD1Ev+0xF5) [0x627155]
 mongos(_ZN5mongo8Strategy7queryOpERNS_7RequestE+0x1952) [0x978532]
 mongos(_ZN5mongo7Request7processEi+0x52A) [0x961fca]
 mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x58) [0x5b6928]
 mongos(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xa0848b]
 libpthread.so.0(+0x79D1) [0x7f4b02fbf9d1]
 libc.so.6(clone+0x6D) [0x7f4b01af88fd]
-----  END BACKTRACE  -----



 Comments   
Comment by Randolph Tan [ 29/Oct/15 ]

Current master uses cluster_find for queries and does not use DBClientCursor anymore.

Comment by Randolph Tan [ 28/Oct/15 ]

Attached crash.patch (should apply cleanly on the r3.0.4 tag) and crash.js. The problem is that the connection attached to the cursor is not guaranteed to stay alive for the cursor's entire lifetime. The crash patch forces this condition by destroying the connection in ParallelConnectionMetadata::cleanup instead of returning it to the pool (this simulates the behavior when the pool is at max capacity).
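
For readers unfamiliar with this failure mode, here is a minimal, self-contained sketch of the general pattern. It is not the contents of crash.patch and not mongos code: Connection, Cursor, and queryFailsWhileCursorOpen are hypothetical names invented for illustration. It assumes the crash comes from a cursor destructor that talks to its connection and throws while another exception is already propagating, which in C++ ends in std::terminate. The sketch shows two hedged mitigations: the cursor shares ownership of the connection so the connection cannot die first, and the destructor swallows cleanup errors so nothing escapes it.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pooled shard connection (not the real mongos class).
struct Connection {
    bool healthy = true;
    std::vector<std::string> sent;

    void say(const std::string& msg) {
        if (!healthy) {
            throw std::runtime_error("Failed to call say, no good nodes");
        }
        sent.push_back(msg);
    }
};

// Hypothetical cursor that notifies the server when it is destroyed.
class Cursor {
public:
    Cursor(std::shared_ptr<Connection> conn, bool* cleanupFailed)
        : _conn(std::move(conn)), _cleanupFailed(cleanupFailed) {
        assert(_conn && _cleanupFailed);
    }

    ~Cursor() {
        // Never let an exception escape a destructor: if one escapes while
        // another exception is unwinding the stack, std::terminate is called.
        try {
            _conn->say("killCursors");
        } catch (const std::exception&) {
            *_cleanupFailed = true;  // record the failure instead of throwing
        }
    }

private:
    // Shared ownership keeps the connection alive at least as long as the cursor.
    std::shared_ptr<Connection> _conn;
    bool* _cleanupFailed;
};

// Simulates a query that throws while a cursor is still open. Returns true if
// the original exception reached the handler, i.e. the process did not terminate.
bool queryFailsWhileCursorOpen(bool connHealthy, bool* cleanupFailed) {
    auto conn = std::make_shared<Connection>();
    conn->healthy = connHealthy;
    bool originalSeen = false;
    try {
        Cursor cursor(conn, cleanupFailed);
        throw std::runtime_error("query failed");  // unwinding runs ~Cursor()
    } catch (const std::runtime_error&) {
        originalSeen = true;
    }
    return originalSeen;
}
```

Calling queryFailsWhileCursorOpen(false, &flag) exercises the interesting case: the cursor's cleanup fails during unwinding, the failure is recorded in flag, and the original "query failed" exception still reaches its handler. Removing the try/catch in ~Cursor() would instead terminate the process in that case.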

I believe this code path will never be exercised in current master, because ParallelSortClusteredCursor is now used only in cases where it does not generate a cursor.

Comment by Andy Schwerin [ 08/Jul/15 ]

Repro script?
