[SERVER-17903] When corruption detected, server continues to run and sync secondaries Created: 07/Apr/15  Updated: 26/May/15  Resolved: 26/May/15

Status: Closed
Project: Core Server
Component/s: Stability, Storage
Affects Version/s: 2.6.4
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Ger Hartnett
Assignee: Daniel Pasette (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-12061 Do not silently ignore read errors wh... Closed
Related
is related to SERVER-12061 Do not silently ignore read errors wh... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

The server hit the following corruption assertion several times and kept running. Secondaries later synced from this server, several failovers occurred, and the replica set ended up in an inconsistent state in which the primary contained fewer documents than one of the secondaries.

2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\util\stacktrace.cpp(169)                           mongo::printStackTrace+0x43
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\util\log.cpp(127)                                  mongo::logContext+0x97
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\util\assert_util.cpp(183)                          mongo::msgasserted+0xf7
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\util\assert_util.cpp(174)                          mongo::msgasserted+0x13
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\bson\bson-inl.h(219)                               mongo::BSONObj::_assertInvalid+0x46b
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\db\exec\fetch.cpp(111)                             mongo::FetchStage::work+0x1a2
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\db\query\plan_executor.cpp(91)                     mongo::PlanExecutor::getNext+0x15f
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\db\query\cached_plan_runner.cpp(71)                mongo::CachedPlanRunner::getNext+0x53
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\db\query\new_find.cpp(561)                         mongo::newRunQuery+0xb80
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\db\instance.cpp(269)                               mongo::receivedQuery+0x406
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\db\instance.cpp(437)                               mongo::assembleResponse+0x2f9
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\db\db.cpp(202)                                     mongo::MyMessageHandler::process+0x10c
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\mongo\util\net\message_server_port.cpp(210)              mongo::PortMessageServer::handleIncomingMsg+0x67f
2015-02-03T15:00:43.054+0000 [conn427] mongod.exe    ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185)  boost::`anonymous namespace'::thread_start_function+0x21
2015-02-03T15:00:43.054+0000 [conn427] MSVCR100.dll                                                                   endthreadex+0x43
2015-02-03T15:00:43.054+0000 [conn427] MSVCR100.dll                                                                   endthreadex+0xdf
2015-02-03T15:00:43.054+0000 [conn427] kernel32.dll                                                                   BaseThreadInitThunk+0xd
2015-02-03T15:00:43.054+0000 [conn427] DDD.CCC*
2015-02-03T15:00:43.226+0000 [conn427] assertion 10334 BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO ns:DDD.CCC* query:{ $query: { BucketId: "default", StreamId: "143655635", StreamRevisionTo: { $gte: 0 }, StreamRevisionFrom: { $lte: 1 } }, $orderby: { StreamRevisionFrom: 1 } }
2015-02-03T15:00:43.226+0000 [conn437] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after recordStats: 23848, after repl: 23848, at end: 23848 }
2015-02-03T15:00:43.398+0000 [conn504] Assertion: 10334:BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO
2015-02-03T15:00:43.460+0000 [rsHealthPoll] warning: Failed to connect to NNN.NNN.NNN.NNN*:27017, reason: errno:10061 No connection could be made because the target machine actively refused it.
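
For readers unfamiliar with assertion 10334: a BSON document begins with a little-endian int32 holding its total byte length, and ends with a 0x00 terminator, so even an empty document is 5 bytes long. A declared size of 0 therefore suggests the record bytes were zeroed or never written. The following is a minimal Python sketch of that kind of length check, not the server's actual code; the function and constant names are hypothetical, and the upper bound is simply the figure quoted in the assertion message.

```python
import struct

# Assumption: smallest legal BSON document is 4 length bytes + 1 terminator byte.
MIN_BSON_SIZE = 5
# Upper bound copied from the assertion message in the log above.
MAX_BSON_SIZE = 16793600


class InvalidBSONSize(ValueError):
    """Raised when a buffer cannot be a well-formed BSON document."""


def read_declared_size(buf: bytes) -> int:
    """Return the declared BSON size of buf, or raise InvalidBSONSize.

    Hypothetical helper for illustration only.
    """
    if len(buf) < 4:
        raise InvalidBSONSize("buffer too short to hold a length prefix")
    # The first four bytes are a little-endian signed 32-bit length.
    (size,) = struct.unpack_from("<i", buf, 0)
    if size < MIN_BSON_SIZE or size > MAX_BSON_SIZE:
        raise InvalidBSONSize(
            f"BSONObj size: {size} (0x{size & 0xFFFFFFFF:x}) is invalid"
        )
    if size > len(buf):
        raise InvalidBSONSize("declared size exceeds the available bytes")
    if buf[size - 1] != 0:
        raise InvalidBSONSize("document is not terminated by a 0x00 byte")
    return size
```

A zero-filled record fails the range check, which is the condition the log reports, while the empty document `b"\x05\x00\x00\x00\x00"` passes.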



 Comments   
Comment by Geert Bosch [ 22/May/15 ]

WiredTiger provides checksum validation.

Comment by Daniel Pasette (Inactive) [ 15/Apr/15 ]

Starting in 2.6.6, initial sync aborts if it encounters a BSON error while reading documents from the sync source (see SERVER-12061). A parameter can downgrade this to a non-fatal error, but that is not recommended. Earlier versions only logged a message and skipped the bad document.

In addition, by default every document read from the network goes through the checks performed by the --objcheck setting.

I believe this issue is a duplicate of SERVER-12061.
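
To illustrate the difference between the two behaviours described in this comment (abort on a bad document versus log and skip), here is a rough Python sketch. It is not how mongod implements initial sync: `clone_documents`, `looks_like_bson`, and the `fail_on_error` flag are hypothetical names, and the validity check is deliberately simplistic.

```python
import struct


class BadDocumentError(Exception):
    """Raised when the copy aborts on an invalid document."""


def looks_like_bson(buf: bytes) -> bool:
    """Very shallow check: length prefix matches the buffer and it ends in 0x00."""
    if len(buf) < 5:
        return False
    (size,) = struct.unpack_from("<i", buf, 0)
    return size == len(buf) and buf[-1] == 0


def clone_documents(raw_docs, fail_on_error=True):
    """Copy documents from a sync source (hypothetical illustration).

    With fail_on_error=True the copy aborts at the first invalid document.
    With fail_on_error=False invalid documents are skipped and counted, so the
    copy finishes but silently holds fewer documents than the source.
    """
    copied = []
    skipped = 0
    for position, raw in enumerate(raw_docs):
        if looks_like_bson(raw):
            copied.append(raw)
            continue
        if fail_on_error:
            raise BadDocumentError(
                f"invalid BSON at position {position}; aborting sync"
            )
        skipped += 1
    return copied, skipped
```

Under the skip policy, a member copying from a corrupt source ends up with a different document count than the source without any hard failure, which matches the kind of divergence reported in the description.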
