[SERVER-10201] DistributedLockPinger::shouldKill not thread safe (sync6.js failure) Created: 13/Jul/13  Updated: 11/Jul/16  Resolved: 24/Jul/13

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 2.5.2

Type: Bug Priority: Major - P3
Reporter: Tad Marshall Assignee: Randolph Tan
Resolution: Done Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: Windows
Participants:

 Description   

Windows 64-bit 2008R2+ DEBUG Build #194 July 13 (rev 458cfbf633d)

http://buildbot.mongodb.org/builders/Windows%2064-bit%202008R2%2B%20DEBUG/builds/194/steps/test_11/logs/stdio
http://buildlogs.mongodb.org/Windows%2064-bit%202008R2%2B%20DEBUG/builds/194/test/sharding/sync6.js

Ends with:

 m30004| Sat Jul 13 15:59:13.145 [thread-7] about to acquire distributed lock 'TimeSkewFailNewTest_lock_8/AMAZONA-VVETJQF:30004:1373745544:18467:thread-7:41:
 m30004| { "state" : 1,
 m30004|   "who" : "AMAZONA-VVETJQF:30004:1373745544:18467:thread-7:41",
 m30004|   "process" : "AMAZONA-VVETJQF:30004:1373745544:18467:thread-7:41",
 m30004|   "when" : { "$date" : "Wed Jul 17 14:06:52 2013" },
 m30004|   "why" : "Testing distributed lock with skew.",
 m30004|   "ts" : { "$oid" : "51e1b190f8b27d377bf2e2e6" } }
 m30004| { "_id" : "TimeSkewFailNewTest_lock_8",
 m30004|   "state" : 0,
 m30004|   "ts" : { "$oid" : "51e1b18af8b27d377bf2e2e2" } }
Sat Jul 13 15:59:13.161 Socket recv() errno:10054 An existing connection was forcibly closed by the remote host. 127.0.0.1:30004
Sat Jul 13 15:59:13.161 SocketException: remote: 127.0.0.1:30004 error: 9001 socket exception [RECV_ERROR] server [127.0.0.1:30004] 
Sat Jul 13 15:59:13.161 DBClientCursor::init call() failed
Sat Jul 13 15:59:13.192 Error: error doing query: failed at src/mongo/shell/query.js:81
failed to load: D:\slave\Windows_64bit_2008R2+_DEBUG\mongo\jstests\sharding\sync6.js

Selected lines from log:

m30004| Sat Jul 13 15:59:08.403 [LockPinger] *** C runtime error: C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\xtree(321) : Assertion failed: map/set iterators incompatible
// ... snip ...
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  ...\src\mongo\util\stacktrace.cpp(169)                                        mongo::printStackTrace+0x5b
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  ...\src\mongo\util\stacktrace.cpp(284)                                        mongo::crtDebugCallback+0x137
m30004| Sat Jul 13 15:59:10.462 [thread-3] created new distributed lock for TimeSkewFailNewTest_lock_8 on localhost:30000,localhost:30001,localhost:30002 ( lock timeout : 4000, ping interval : 133, process : 1 )
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  f:\dd\vctools\crt_bld\self_64_amd64\crt\src\dbgrptt.c(605)                    _VCrtDbgReportW+0x812
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  f:\dd\vctools\crt_bld\self_64_amd64\crt\src\dbgrpt.c(242)                     _CrtDbgReportWV+0x43
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  f:\dd\vctools\crt_bld\self_64_amd64\crt\src\dbgrpt.c(258)                     _CrtDbgReportW+0x4d
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  f:\dd\vctools\crt_bld\self_64_amd64\crt\src\stdthrow.cpp(13)                  std::_Debug_message+0x34
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  c:\program files (x86)\microsoft visual studio 10.0\vc\include\xtree(322)     std::_Tree_const_iterator<std::_Tree_val<std::_Tset_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,0> > >::operator==+0x72
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  c:\program files (x86)\microsoft visual studio 10.0\vc\include\xtree(335)     std::_Tree_const_iterator<std::_Tree_val<std::_Tset_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,0> > >::operator!=+0x32
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  c:\program files (x86)\microsoft visual studio 10.0\vc\include\xutility(889)  std::_Distance2<std::_Tree_const_iterator<std::_Tree_val<std::_Tset_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,0> > >,unsigned __int64>+0x51
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  c:\program files (x86)\microsoft visual studio 10.0\vc\include\xutility(922)  std::_Distance<std::_Tree_const_iterator<std::_Tree_val<std::_Tset_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,0> > >,unsigned __int64>+0xc6
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  c:\program files (x86)\microsoft visual studio 10.0\vc\include\xtree(1445)    std::_Tree<std::_Tset_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,0> >::count+0xda
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  ...\src\mongo\client\distlock.cpp(304)                                        mongo::DistributedLockPinger::shouldKill+0x8c
m30004| Sat Jul 13 15:59:10.462 [thread-2] checking last ping for lock 'TimeSkewFailNewTest_lock_8/AMAZONA-VVETJQF:30004:1373745544:18467:thread-6:41' against process AMAZONA-VVETJQF:30004:1373745544:18467:thread-6:41 and ping Sun Jul 14 14:37:01 2013
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  ...\src\mongo\client\distlock.cpp(104)                                        mongo::DistributedLockPinger::_distLockPingThread+0x342
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  ...\src\mongo\client\distlock.cpp(236)                                        mongo::DistributedLockPinger::distLockPingThread+0xb0
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  ...\src\third_party\boost\boost\bind\mem_fn_template.hpp(506)                 boost::_mfi::mf4<void,mongo::DistributedLockPinger,mongo::ConnectionString,__int64,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const & __ptr64,unsigned __int64>::operator()+0xa0
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  ...\src\third_party\boost\boost\bind\bind.hpp(526)                            boost::_bi::list5<boost::_bi::value<mongo::DistributedLockPinger * __ptr64>,boost::_bi::value<mongo::ConnectionString>,boost::_bi::value<__int64>,boost::_bi::value<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,boost::_bi::value<unsigned __int64> >::operator()<boost::_mfi::mf4<void,mongo::DistributedLockPinger,mongo::ConnectionString,__int64,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const & __ptr64,unsigned __int64>,boost::_bi::list0>+0x173
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  ...\src\third_party\boost\boost\bind\bind_template.hpp(21)                    boost::_bi::bind_t<void,boost::_mfi::mf4<void,mongo::DistributedLockPinger,mongo::ConnectionString,__int64,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const & __ptr64,unsigned __int64>,boost::_bi::list5<boost::_bi::value<mongo::DistributedLockPinger * __ptr64>,boost::_bi::value<mongo::ConnectionString>,boost::_bi::value<__int64>,boost::_bi::value<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,boost::_bi::value<unsigned __int64> > >::operator()+0x61
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  ...\src\third_party\boost\boost\thread\detail\thread.hpp(63)                  boost::detail::thread_data<boost::_bi::bind_t<void,boost::_mfi::mf4<void,mongo::DistributedLockPinger,mongo::ConnectionString,__int64,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const & __ptr64,unsigned __int64>,boost::_bi::list5<boost::_bi::value<mongo::DistributedLockPinger * __ptr64>,boost::_bi::value<mongo::ConnectionString>,boost::_bi::value<__int64>,boost::_bi::value<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,boost::_bi::value<unsigned __int64> > > >::run+0x2f
m30004| Sat Jul 13 15:59:10.462 [LockPinger] mongod.exe  ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185)               boost::`anonymous namespace'::thread_start_function+0x4c

The DistributedLockPinger::shouldKill() routine does not lock the mutex before calling _kill.count(); this may explain how the set was seen to be corrupt.

This is the MSVC code that threw the error:
c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include\xtree lines 315 to 337

	bool operator==(const _Myiter& _Right) const
		{	// test for iterator equality
 #if _ITERATOR_DEBUG_LEVEL == 2
		if (this->_Getcont() == 0
			|| this->_Getcont() != _Right._Getcont())
			{	// report error
			_DEBUG_ERROR("map/set iterators incompatible");
			_SCL_SECURE_INVALID_ARGUMENT;
			}
 
 #elif _ITERATOR_DEBUG_LEVEL == 1
		_SCL_SECURE_VALIDATE(this->_Getcont() != 0
			&& this->_Getcont() == _Right._Getcont());
 #endif /* _ITERATOR_DEBUG_LEVEL */
 
		return (this->_Ptr == _Right._Ptr);
		}
 
	bool operator!=(const _Myiter& _Right) const
		{	// test for iterator inequality
		return (!(*this == _Right));
		}
	};



 Comments   
Comment by auto [ 24/Jul/13 ]

Author:

{u'username': u'renctan', u'name': u'Randolph Tan', u'email': u'randolph@10gen.com'}

Message: SERVER-10201 DistributedLockPinger::shouldKill not thread safe (sync6.js failure)
Branch: master
https://github.com/mongodb/mongo/commit/899c017dd024f487d0601ec7a9b5389b723adf9a

Comment by Randolph Tan [ 24/Jul/13 ]

Note: The variable DistributedLockPinger::shouldKill is accessing is not being modified anywhere other than tests.

Generated at Thu Feb 08 03:22:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.