[SERVER-9995] corruption on primaries after upgrade to 2.4.4 Created: 23/Jun/13  Updated: 10/Dec/14  Resolved: 26/Jun/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.4.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: charity majors Assignee: Unassigned
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-9856 No check for building identical backg... Closed
Related
related to SERVER-10017 Remove "run repair" recommendation fr... Closed
Operating System: ALL
Participants:

 Description   

I'm currently running a handful of 2.2.4 clusters. On June 11th, I upgraded one primary and one secondary apiece on two clusters to 2.4.4.

A couple days later, I started getting reports of data corruption. Looking through my logs, I saw a ton of these on one of the 2.4.4 primaries:

Wed Jun 12 16:18:51.804 [conn1265710] getFile(): n=-2
Wed Jun 12 16:18:51.804 [conn1265710] Assertion: 10295:getFile(): bad file number value (corrupt db?): run repair
0xdd2331 0xd93c6b 0x8cdf45 0xb841e2 0x80996e 0xb3eca1 0xb49632 0xb4e6d1 0xb4ee8a 0xb51604 0xb51e56 0xb580c9 0xb5cd48 0xb5cf4e 0xa7f6da 0xa827d8 0x9f6059 0x9f7572 0x6e7978 0xdbea9e
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd2331]
 /usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xd93c6b]
 /usr/bin/mongod(_ZN5mongo8Database7getFileEiib+0x395) [0x8cdf45]
 /usr/bin/mongod(_ZNK5mongo7DiskLoc3recEv+0x42) [0xb841e2]
 /usr/bin/mongod(_ZNK5mongo12IndexDetails10keyPatternEv+0x1e) [0x80996e]
 /usr/bin/mongod(_ZN5mongo16QueryUtilIndexed11indexUsefulERKNS_17FieldRangeSetPairEPNS_16NamespaceDetailsEiRKNS_7BSONObjE+0x41) [0xb3eca1]
 /usr/bin/mongod(_ZN5mongo18QueryPlanGenerator16addFallbackPlansEv+0x152) [0xb49632]
 /usr/bin/mongod(_ZN5mongo18QueryPlanGenerator15addInitialPlansEv+0x81) [0xb4e6d1]
 /usr/bin/mongod(_ZN5mongo12QueryPlanSet4makeEPKcSt8auto_ptrINS_17FieldRangeSetPairEES5_RKNS_7BSONObjES8_RKN5boost10shared_ptrIKNS_11ParsedQueryEEES8_NS_18QueryPlanGenerator18RecordedPlanPolicyES8_S8_b+0x10a) [0xb4ee8a]
 /usr/bin/mongod(_ZN5mongo16MultiPlanScanner4initERKNS_7BSONObjES3_S3_+0xf4) [0xb51604]
 /usr/bin/mongod(_ZN5mongo16MultiPlanScanner4makeERKNS_10StringDataERKNS_7BSONObjES6_RKN5boost10shared_ptrIKNS_11ParsedQueryEEES6_NS_18QueryPlanGenerator18RecordedPlanPolicyES6_S6_+0x76) [0xb51e56]
 /usr/bin/mongod(_ZN5mongo15CursorGenerator19setMultiPlanScannerEv+0xe9) [0xb580c9]
 /usr/bin/mongod(_ZN5mongo15CursorGenerator8generateEv+0x98) [0xb5cd48]
 /usr/bin/mongod(_ZN5mongo25NamespaceDetailsTransient9getCursorERKNS_10StringDataERKNS_7BSONObjES6_RKNS_24QueryPlanSelectionPolicyERKN5boost10shared_ptrIKNS_11ParsedQueryEEEbPNS_16QueryPlanSummaryE+0x3e) [0xb5cf4e]
 /usr/bin/mongod(_ZN5mongo23queryWithQueryOptimizerEiRKSsRKNS_7BSONObjERNS_5CurOpES4_S4_RKN5boost10shared_ptrINS_11ParsedQueryEEES4_RKNS_12ChunkVersionERNS7_10scoped_ptrINS_25PageFaultRetryableSectionEEERNSG_INS_19NoPageFaultsAllowedEEERNS_7MessageE+0x12a) [0xa7f6da]
 /usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x1ac8) [0xa827d8]
 /usr/bin/mongod() [0x9f6059]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x382) [0x9f7572]
 /usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6e7978]
 /usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdbea9e]
Wed Jun 12 16:18:51.845 [conn1265710] assertion 10295 getFile(): bad file number value (corrupt db?): run repair ns:appdata84.app_35e3c2d0-297e-4725-8083-4af9a534c2a6:JobEmployee query:{ $query: {}, $orderby: { _created_at: -1 }, $maxScan: 9000000 }

I failed over to another primary (also 2.4.4), and within 4 days it started generating the same assertions.

Fri Jun 21 08:37:03.362 [conn774102]  appdata92 warning assertion failure n == 1 src/mongo/db/index.cpp 221
0xdd2331 0xd9217a 0x9ced75 0x9d71df 0x8e9a18 0x8d44fa 0x8d71e3 0x8d7ef2 0xa7d1c0 0xa81a8c 0x9f6059 0x9f7572 0x6e7978 0xdbea9e 0x7fd43601be9a 0x7fd43532ecbd
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd2331]
 /usr/bin/mongod(_ZN5mongo9wassertedEPKcS1_j+0x11a) [0xd9217a]
 /usr/bin/mongod(_ZN5mongo12IndexDetails8kill_idxEv+0xe75) [0x9ced75]
 /usr/bin/mongod(_ZN5mongo11dropIndexesEPNS_16NamespaceDetailsEPKcS3_RSsRNS_14BSONObjBuilderEb+0x74f) [0x9d71df]
 /usr/bin/mongod(_ZN5mongo14CmdDropIndexes3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x7c8) [0x8e9a18]
 /usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0x8d44fa]
 /usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x1023) [0x8d71e3]
 /usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x5f2) [0x8d7ef2]
 /usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x40) [0xa7d1c0]
 /usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd7c) [0xa81a8c]
 /usr/bin/mongod() [0x9f6059]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x382) [0x9f7572]
 /usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6e7978]
 /usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdbea9e]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fd43601be9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fd43532ecbd]
Fri Jun 21 08:37:03.461 [conn774102] Assertion: 10334:BSONObj size: 0 (0x00000000) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO
0xdd2331 0xd93c6b 0xd941ac 0x6ebf2f 0x8099aa 0x9d7098 0x8e9a18 0x8d44fa 0x8d71e3 0x8d7ef2 0xa7d1c0 0xa81a8c 0x9f6059 0x9f7572 0x6e7978 0xdbea9e 0x7fd43601be9a 0x7fd43532ecbd
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd2331]
 /usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xd93c6b]
 /usr/bin/mongod() [0xd941ac]
 /usr/bin/mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x5bf) [0x6ebf2f]
 /usr/bin/mongod(_ZNK5mongo12IndexDetails10keyPatternEv+0x5a) [0x8099aa]
 /usr/bin/mongod(_ZN5mongo11dropIndexesEPNS_16NamespaceDetailsEPKcS3_RSsRNS_14BSONObjBuilderEb+0x608) [0x9d7098]
 /usr/bin/mongod(_ZN5mongo14CmdDropIndexes3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x7c8) [0x8e9a18]
 /usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0x8d44fa]
 /usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x1023) [0x8d71e3]
 /usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x5f2) [0x8d7ef2]
 /usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x40) [0xa7d1c0]
 /usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd7c) [0xa81a8c]
 /usr/bin/mongod() [0x9f6059]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x382) [0x9f7572]
 /usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6e7978]
 /usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdbea9e]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fd43601be9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fd43532ecbd]
 
 
Fri Jun 21 08:52:53.727 [conn773922] getFile(): n=-2
Fri Jun 21 08:52:53.727 [conn773922] Assertion: 10295:getFile(): bad file number value (corrupt db?): run repair
0xdd2331 0xd93c6b 0x8cdf45 0xb841e2 0x80996e 0xb3eca1 0xb49632 0xb4e6d1 0xb4ee8a 0xb51604 0xb51e56 0xb580c9 0xb5cd48 0xb5cf4e 0xa7f6da 0xa827d8 0x9f6059 0x9f7572 0x6e7978 0xdbea9e
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd2331]
 /usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xd93c6b]
 /usr/bin/mongod(_ZN5mongo8Database7getFileEiib+0x395) [0x8cdf45]
 /usr/bin/mongod(_ZNK5mongo7DiskLoc3recEv+0x42) [0xb841e2]
 /usr/bin/mongod(_ZNK5mongo12IndexDetails10keyPatternEv+0x1e) [0x80996e]
 /usr/bin/mongod(_ZN5mongo16QueryUtilIndexed11indexUsefulERKNS_17FieldRangeSetPairEPNS_16NamespaceDetailsEiRKNS_7BSONObjE+0x41) [0xb3eca1]
 /usr/bin/mongod(_ZN5mongo18QueryPlanGenerator16addFallbackPlansEv+0x152) [0xb49632]
 /usr/bin/mongod(_ZN5mongo18QueryPlanGenerator15addInitialPlansEv+0x81) [0xb4e6d1]
 /usr/bin/mongod(_ZN5mongo12QueryPlanSet4makeEPKcSt8auto_ptrINS_17FieldRangeSetPairEES5_RKNS_7BSONObjES8_RKN5boost10shared_ptrIKNS_11ParsedQueryEEES8_NS_18QueryPlanGenerator18RecordedPlanPolicyES8_S8_b+0x10a) [0xb4ee8a]
 /usr/bin/mongod(_ZN5mongo16MultiPlanScanner4initERKNS_7BSONObjES3_S3_+0xf4) [0xb51604]
 /usr/bin/mongod(_ZN5mongo16MultiPlanScanner4makeERKNS_10StringDataERKNS_7BSONObjES6_RKN5boost10shared_ptrIKNS_11ParsedQueryEEES6_NS_18QueryPlanGenerator18RecordedPlanPolicyES6_S6_+0x76) [0xb51e56]
 /usr/bin/mongod(_ZN5mongo15CursorGenerator19setMultiPlanScannerEv+0xe9) [0xb580c9]
 /usr/bin/mongod(_ZN5mongo15CursorGenerator8generateEv+0x98) [0xb5cd48]
 /usr/bin/mongod(_ZN5mongo25NamespaceDetailsTransient9getCursorERKNS_10StringDataERKNS_7BSONObjES6_RKNS_24QueryPlanSelectionPolicyERKN5boost10shared_ptrIKNS_11ParsedQueryEEEbPNS_16QueryPlanSummaryE+0x3e) [0xb5cf4e]
 /usr/bin/mongod(_ZN5mongo23queryWithQueryOptimizerEiRKSsRKNS_7BSONObjERNS_5CurOpES4_S4_RKN5boost10shared_ptrINS_11ParsedQueryEEES4_RKNS_12ChunkVersionERNS7_10scoped_ptrINS_25PageFaultRetryableSectionEEERNSG_INS_19NoPageFaultsAllowedEEERNS_7MessageE+0x12a) [0xa7f6da]
 /usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x1ac8) [0xa827d8]
 /usr/bin/mongod() [0x9f6059]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x382) [0x9f7572]
 /usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6e7978]
 /usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdbea9e]
Fri Jun 21 08:52:53.734 [conn773922] assertion 10295 getFile(): bad file number value (corrupt db?): run repair ns:appdata92.app_df7688a2-419b-470e-8adb-f1071e960753:Answer query:{ $query: {}, $orderby: { _created_at: -1 }, $maxScan: 9000000 }

I just checked on my other 2.4.4 primary on a totally different replica set, and sure enough, it has a shitload of "corrupt db" errors too. Thousands.

asya suggested this may be due to index corruption, not data corruption, so I'm going to try rebuilding the indexes on these nodes once I can take them offline.



 Comments   
Comment by Daniel Pasette (Inactive) [ 26/Jun/13 ]

marking as duplicate of SERVER-9856

Comment by charity majors [ 25/Jun/13 ]

Ok, thanks. I look forward to re-attempting an upgrade when 2.4.5 is out.

Comment by David Hows [ 25/Jun/13 ]

Hi Charity,

I've gone over the logs you posted and found that these issues are SERVER-9856 as suggested by Asya.

From your logs I can see (for example) that you create two indexes on collection appdata45.app_cf6a4e7c-9303-4928-972d-8c82ca4bf973:TestTable

Mon Jun 17 16:57:56.375 [conn594830] build index appdata45.app_cf6a4e7c-9303-4928-972d-8c82ca4bf973:TestTable { _id: 1 }
Mon Jun 17 16:59:09.781 [conn613585] build index appdata45.app_cf6a4e7c-9303-4928-972d-8c82ca4bf973:TestTable { inedx: 1 } background
Mon Jun 17 16:59:09.795 [conn613586] build index appdata45.app_cf6a4e7c-9303-4928-972d-8c82ca4bf973:TestTable { inedx: 1 } background
Mon Jun 17 16:59:09.948 [conn613585] build index done.  scanned 1250 total records. 0.167 secs
Mon Jun 17 16:59:09.963 [conn613586] build index done.  scanned 1250 total records. 0.168 secs

This works fine until you drop the indexes on that collection. The drop will remove one of the index entries and the index itself, but the second instance of the

{ inedx: 1 }

index will remain. This leaves you with an index that points at an empty space.

Mon Jun 17 17:09:49.053 [conn602756] CMD: dropIndexes appdata45.app_cf6a4e7c-9303-4928-972d-8c82ca4bf973:TestTable
Mon Jun 17 17:09:49.072 [conn602756]  appdata45 warning assertion failure n == 1 src/mongo/db/index.cpp 221
0xdd2331 0xd9217a 0x9ced75 0x9d71df 0x8e9a18 0x8d44fa 0x8d71e3 0x8d7ef2 0xa7d1c0 0xa81a8c 0x9f6059 0x9f7572 0x6e7978 0xdbea9e 0x7f5addf8fe9a 0x7f5add2a2cbd
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd2331]
 /usr/bin/mongod(_ZN5mongo9wassertedEPKcS1_j+0x11a) [0xd9217a]
 /usr/bin/mongod(_ZN5mongo12IndexDetails8kill_idxEv+0xe75) [0x9ced75]
 /usr/bin/mongod(_ZN5mongo11dropIndexesEPNS_16NamespaceDetailsEPKcS3_RSsRNS_14BSONObjBuilderEb+0x74f) [0x9d71df]
 /usr/bin/mongod(_ZN5mongo14CmdDropIndexes3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x7c8) [0x8e9a18]
 /usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0x8d44fa]
 /usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x1023) [0x8d71e3]
 /usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x5f2) [0x8d7ef2]
 /usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x40) [0xa7d1c0]
 /usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd7c) [0xa81a8c]
 /usr/bin/mongod() [0x9f6059]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x382) [0x9f7572]
 /usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6e7978]
 /usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdbea9e]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f5addf8fe9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5add2a2cbd]
Mon Jun 17 17:09:49.183 [conn602756] Assertion: 10334:BSONObj size: 0 (0x00000000) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO
0xdd2331 0xd93c6b 0xd941ac 0x6ebf2f 0x8099aa 0x9d7098 0x8e9a18 0x8d44fa 0x8d71e3 0x8d7ef2 0xa7d1c0 0xa81a8c 0x9f6059 0x9f7572 0x6e7978 0xdbea9e 0x7f5addf8fe9a 0x7f5add2a2cbd
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd2331]
 /usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xd93c6b]
 /usr/bin/mongod() [0xd941ac]
 /usr/bin/mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x5bf) [0x6ebf2f]
 /usr/bin/mongod(_ZNK5mongo12IndexDetails10keyPatternEv+0x5a) [0x8099aa]
 /usr/bin/mongod(_ZN5mongo11dropIndexesEPNS_16NamespaceDetailsEPKcS3_RSsRNS_14BSONObjBuilderEb+0x608) [0x9d7098]
 /usr/bin/mongod(_ZN5mongo14CmdDropIndexes3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x7c8) [0x8e9a18]
 /usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0x8d44fa]
 /usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x1023) [0x8d71e3]
 /usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x5f2) [0x8d7ef2]
 /usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x40) [0xa7d1c0]
 /usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd7c) [0xa81a8c]
 /usr/bin/mongod() [0x9f6059]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x382) [0x9f7572]
 /usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6e7978]
 /usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdbea9e]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f5addf8fe9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5add2a2cbd]
Mon Jun 17 17:09:49.212 [conn602756] command appdata45.$cmd command: { deleteIndexes: "app_cf6a4e7c-9303-4928-972d-8c82ca4bf973:TestTable", index: "*" } ntoreturn:1 keyUpdates:0 locks(micros) w:159452 reslen:193 159ms

Then from that point on all of the queries to collection this will fail when they attempt to use that index.

Mon Jun 17 17:10:43.425 [conn451658] getFile(): n=-2
Mon Jun 17 17:10:43.425 [conn451658] Assertion: 10295:getFile(): bad file number value (corrupt db?): run repair
0xdd2331 0xd93c6b 0x8cdf45 0xb841e2 0x80996e 0xb3eca1 0xb49632 0xb4e6d1 0xb4ee8a 0xb51604 0xb51e56 0xb580c9 0xb5cd48 0xb5cf4e 0xa7f6da 0xa827d8 0x9f6059 0x9f7572 0x6e7978 0xdbea9e
Mon Jun 17 17:10:43.427 [initandlisten] connection accepted from 10.85.18.141:44635 #614954 (3773 connections now open)
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd2331]
 /usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xd93c6b]
 /usr/bin/mongod(_ZN5mongo8Database7getFileEiib+0x395) [0x8cdf45]
 /usr/bin/mongod(_ZNK5mongo7DiskLoc3recEv+0x42) [0xb841e2]
 /usr/bin/mongod(_ZNK5mongo12IndexDetails10keyPatternEv+0x1e) [0x80996e]
 /usr/bin/mongod(_ZN5mongo16QueryUtilIndexed11indexUsefulERKNS_17FieldRangeSetPairEPNS_16NamespaceDetailsEiRKNS_7BSONObjE+0x41) [0xb3eca1]
 /usr/bin/mongod(_ZN5mongo18QueryPlanGenerator16addFallbackPlansEv+0x152) [0xb49632]
 /usr/bin/mongod(_ZN5mongo18QueryPlanGenerator15addInitialPlansEv+0x81) [0xb4e6d1]
 /usr/bin/mongod(_ZN5mongo12QueryPlanSet4makeEPKcSt8auto_ptrINS_17FieldRangeSetPairEES5_RKNS_7BSONObjES8_RKN5boost10shared_ptrIKNS_11ParsedQueryEEES8_NS_18QueryPlanGenerator18RecordedPlanPolicyES8_S8_b+0x10a) [0xb4ee8a]
 /usr/bin/mongod(_ZN5mongo16MultiPlanScanner4initERKNS_7BSONObjES3_S3_+0xf4) [0xb51604]
 /usr/bin/mongod(_ZN5mongo16MultiPlanScanner4makeERKNS_10StringDataERKNS_7BSONObjES6_RKN5boost10shared_ptrIKNS_11ParsedQueryEEES6_NS_18QueryPlanGenerator18RecordedPlanPolicyES6_S6_+0x76) [0xb51e56]
 /usr/bin/mongod(_ZN5mongo15CursorGenerator19setMultiPlanScannerEv+0xe9) [0xb580c9]
 /usr/bin/mongod(_ZN5mongo15CursorGenerator8generateEv+0x98) [0xb5cd48]
 /usr/bin/mongod(_ZN5mongo25NamespaceDetailsTransient9getCursorERKNS_10StringDataERKNS_7BSONObjES6_RKNS_24QueryPlanSelectionPolicyERKN5boost10shared_ptrIKNS_11ParsedQueryEEEbPNS_16QueryPlanSummaryE+0x3e) [0xb5cf4e]
 /usr/bin/mongod(_ZN5mongo23queryWithQueryOptimizerEiRKSsRKNS_7BSONObjERNS_5CurOpES4_S4_RKN5boost10shared_ptrINS_11ParsedQueryEEES4_RKNS_12ChunkVersionERNS7_10scoped_ptrINS_25PageFaultRetryableSectionEEERNSG_INS_19NoPageFaultsAllowedEEERN
S_7MessageE+0x12a) [0xa7f6da]
 /usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x1ac8) [0xa827d8]
 /usr/bin/mongod() [0x9f6059]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x382) [0x9f7572]
 /usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6e7978]
 /usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdbea9e]
Mon Jun 17 17:10:43.453 [conn451658] assertion 10295 getFile(): bad file number value (corrupt db?): run repair ns:appdata45.app_cf6a4e7c-9303-4928-972d-8c82ca4bf973:TestTable query:{ $query: {}, $orderby: { _created_at: -1 }, $maxScan: 9000000 }

Given this the steps to recover are:

  1. Correct the behaviour that introduced duplicate indexes being created by your application
  2. Step down the primary and re-sync it from one of the secondaries.

If you are unable to change the behaviour in your application then the recommended action is to rollback to 2.2 before stepping down the primary and re-syncing.

Regards,
David

Comment by charity majors [ 24/Jun/13 ]

Yes, thanks, I was afk. Uploaded.

Comment by Daniel Pasette (Inactive) [ 24/Jun/13 ]

Hi Charity, not sure if you received notification, but I created a private ticket for you to upload logfiles here: SUPPORT-615.

Comment by charity majors [ 24/Jun/13 ]

Yes, I can compress and upload, but we would like to keep them private.

Comment by Asya Kamsky [ 24/Jun/13 ]

I'm about 80% certain that you are hitting SERVER-9856 - since your indexes are dynamically generated it's very possible that two identical index builds were kicked off at the same time.

Comment by Daniel Pasette (Inactive) [ 24/Jun/13 ]

Do you have the complete log file from one of the nodes starting from the time you upgraded until you started seeing the issues? If so, can you compress and upload? If you prefer to keep this log information private, I can start a SUPPORT ticket for this issue.

Comment by charity majors [ 24/Jun/13 ]

Yes, I was able to run getIndexes() on at least one of the impacted collections.

parse2:PRIMARY> db["app_df7688a2-419b-470e-8adb-f1071e960753:Answer"].getIndexes()
[
{
"v" : 1,
"key" :

{ "_id" : 1 }

,
"ns" : "appdata92.app_df7688a2-419b-470e-8adb-f1071e960753:Answer",
"name" : "id"
},
{
"v" : 1,
"key" :

{ "_p_child" : 1 }

,
"ns" : "appdata92.app_df7688a2-419b-470e-8adb-f1071e960753:Answer",
"name" : "_p_child_1",
"background" : true
},
{
"v" : 1,
"key" :

{ "vote" : 1 }

,
"ns" : "appdata92.app_df7688a2-419b-470e-8adb-f1071e960753:Answer",
"name" : "vote_1",
"background" : true
}
]

Comment by Daniel Pasette (Inactive) [ 24/Jun/13 ]

Can you run getIndexes() on the impacted collections? Trying to see if this is fallout from SERVER-9856.

Comment by charity majors [ 24/Jun/13 ]

I was unable to drop the indexes.

parse2:PRIMARY> db["app_df7688a2-419b-470e-8adb-f1071e960753:Answer"].dropIndex("_p_child_1")
{
"errmsg" : "exception: getFile(): bad file number value (corrupt db?): run repair",
"code" : 10295,
"ok" : 0
}
parse2:PRIMARY> db["app_df7688a2-419b-470e-8adb-f1071e960753:Answer"].dropIndex("vote_1")
{
"errmsg" : "exception: getFile(): bad file number value (corrupt db?): run repair",
"code" : 10295,
"ok" : 0
}

I'm rolling back to 2.2 and going to repair these nodes.

Generated at Thu Feb 08 03:21:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.