|
We have a replica set with two nodes and one arbiter.
{
"_id" : "rsstuff",
"version" : 6,
"members" : [
{
"_id" : 0,
"host" : "LOG-MNG13:27017",
"priority" : 10
}
,
{
"_id" : 1,
"host" : "LOG-MNG23:27017"
}
,
{
"_id" : 2,
"host" : "LOG-MNG21:27018",
"arbiterOnly" : true
}
]
}
Last night, the replica have started to fail. I have reviewing the log (I'm going to attach it later), I could see the following:
1) Communication error (tiemout) from secondary to primary
Thu Jun 27 02:03:41.524 [initandlisten] connection accepted from 192.168.3.23:61211 #1879438 (200 connections now open)
Thu Jun 27 02:03:41.883 [rsSyncNotifier] Socket recv() timeout 172.29.106.76:27017
Thu Jun 27 02:03:41.883 [rsSyncNotifier] SocketException: remote: 172.29.106.76:27017 error: 9001 socket exception [3] server [172.29.106.76:27017]
Thu Jun 27 02:03:41.883 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: log-mng12:27017
2) Try to reconnect
Thu Jun 27 02:04:00.680 [rsSyncNotifier] replset setting oplog notifier to log-mng12:27017
...
Thu Jun 27 02:04:04.939 [rsBackgroundSync] replSet syncing to: log-mng12:27017
3) Then there was an error about size ("Size must be between 0 and 16793600(16MB)"). The aggregate.tmp.mr.xxx are temporary collection that they are created during mapreduce operations.
Thu Jun 27 02:04:05.126 [repl writer worker 2] build index aggregate.tmp.mr.Statistics_HotelsMeansEntryAvailability_52339_inc
{ _id: 1 }
Thu Jun 27 02:04:05.126 [repl writer worker 2] build index done. scanned 0 total records. 0.001 secs
Thu Jun 27 02:04:05.126 [repl writer worker 2] info: creating collection aggregate.tmp.mr.Statistics_HotelsMeansEntryAvailability_52339_inc on add index
Thu Jun 27 02:04:05.126 [repl writer worker 2] build index aggregate.tmp.mr.Statistics_HotelsMeansEntryAvailability_52339_inc
{ 0: 1 }
Thu Jun 27 02:04:05.126 [repl writer worker 2] build index done. scanned 0 total records. 0 secs
Thu Jun 27 02:04:05.126 [repl writer worker 1] build index aggregate.tmp.mr.Statistics_HotelsMeansEntryAvailability_52339
{ _id: 1 }
Thu Jun 27 02:04:05.126 [repl writer worker 1] build index done. scanned 0 total records. 0.001 secs
Thu Jun 27 02:04:05.126 [repl writer worker 1] info: indexing in foreground on this replica; was a background index build on the primary
Thu Jun 27 02:04:05.126 [repl writer worker 1] build index aggregate.tmp.mr.Statistics_HotelsMeansEntryAvailability_52339
{ _id.f: 1.0, _id.dc: 1.0, _id.dz: 1.0, _id.dh: 1.0, _id.arr: 1.0, _id.nights: 1.0, _id.na: 1.0, _id.nc: 1.0 }
Thu Jun 27 02:04:05.142 [repl writer worker 1] build index done. scanned 0 total records. 0 secs
Thu Jun 27 02:04:05.142 [repl writer worker 1] build index aggregate.tmp.mr.Statistics_HotelsBookingEngine_52338_inc
{ _id: 1 }
Thu Jun 27 02:04:05.142 [rsBackgroundSync] Assertion: 10334:BSONObj size: 1845624932 (0x6400026E) is invalid. Size must be between 0 and 16793600(16MB) First element: : ?type=115
Thu Jun 27 02:04:05.142 [repl writer worker 1] build index done. scanned 0 total records. 0.001 secs
Thu Jun 27 02:04:05.142 [repl writer worker 1] info: creating collection aggregate.tmp.mr.Statistics_HotelsBookingEngine_52338_inc on add index
Thu Jun 27 02:04:05.142 [repl writer worker 1] build index aggregate.tmp.mr.Statistics_HotelsBookingEngine_52338_inc
{ 0: 1 }
Thu Jun 27 02:04:05.142 [repl writer worker 1] build index done. scanned 0 total records. 0 secs
Thu Jun 27 02:04:05.158 [repl writer worker 7] build index aggregate.tmp.mr.Statistics_HotelsBookingEngine_52338
{ _id: 1 }
4) Then the log shows a dump
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\util\stacktrace.cpp(189) mongo::printStackTrace+0x3e
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\util\assert_util.cpp(159) mongo::msgasserted+0xc1
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\bson\bson-inl.h(219) mongo::BSONObj::_assertInvalid+0x3e6
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\client\dbclientcursor.cpp(238) mongo::DBClientCursor::next+0x153
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\client\dbclientcursor.h(78) mongo::DBClientCursor::nextSafe+0x43
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\db\oplogreader.h(127) mongo::OplogReader::nextSafe+0x44
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\db\repl\bgsync.cpp(351) mongo::replset::BackgroundSync::produce+0x457
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\db\repl\bgsync.cpp(241) mongo::replset::BackgroundSync::producerThread+0x98
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(180) boost::`anonymous namespace'::thread_start_function+0x21
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314) _callthreadstartex+0x17
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292) _threadstartex+0x7f
Thu Jun 27 02:04:05.672 [rsBackgroundSync] KERNEL32.DLL BaseThreadInitThunk+0x1a
Thu Jun 27 02:04:05.672 [rsBackgroundSync] replSet db exception in producer: 10334 BSONObj size: 1845624932 (0x6400026E) is invalid. Size must be between 0 and 16793600(16MB) First element: : ?type=115
5) Then, I could see "BSONElement: bad type -53"
Thu Jun 27 02:04:09.931 [initandlisten] connection accepted from 172.29.106.46:51733 #1879639 (201 connections now open)
Thu Jun 27 02:04:10.150 [rsSync] Assertion: 10320:BSONElement: bad type -53
Thu Jun 27 02:04:10.259 [conn1879163] end connection 172.29.106.52:16844 (200 connections now open)
Thu Jun 27 02:04:10.259 [initandlisten] connection accepted from 172.29.106.52:19619 #1879640 (201 connections now open)
Thu Jun 27 02:04:10.337 [initandlisten] connection accepted from 172.29.106.51:5754 #1879641 (202 connections now open)
Thu Jun 27 02:04:10.337 [conn1879641] end connection 172.29.106.51:5754 (201 connections now open)
Thu Jun 27 02:04:10.462 [initandlisten] connection accepted from 172.29.106.51:5756 #1879642 (202 connections now open)
Thu Jun 27 02:04:10.462 [conn1879642] end connection 172.29.106.51:5756 (201 connections now open)
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\util\stacktrace.cpp(189) mongo::printStackTrace+0x3e
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\util\assert_util.cpp(159) mongo::msgasserted+0xc1
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\bson\bson-inl.h(673) mongo::BSONElement::size+0x1b5
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\db\repl\rs_sync.cpp(484) mongo::replset::SyncTail::tryPopAndWaitForMore+0x161
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\db\repl\rs_sync.cpp(380) mongo::replset::SyncTail::oplogApplication+0x449
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\db\repl\rs_sync.cpp(723) mongo::ReplSetImpl::_syncThread+0x9c
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\db\repl\rs_sync.cpp(743) mongo::ReplSetImpl::syncThread+0xdc
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\db\repl\rs_sync.cpp(769) mongo::startSyncThread+0x8a
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(180) boost::`anonymous namespace'::thread_start_function+0x21
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314) _callthreadstartex+0x17
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292) _threadstartex+0x7f
Thu Jun 27 02:04:10.680 [rsSync] KERNEL32.DLL BaseThreadInitThunk+0x1a
Thu Jun 27 02:04:10.680 [rsSync] replSet syncThread: 10320 BSONElement: bad type -53
|