[SERVER-15351] Index data corruption on primary Created: 22/Sep/14  Updated: 20/Mar/15  Resolved: 20/Mar/15

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.4.8
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Dharshan Rangegowda Assignee: Bruce Lucas (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File edspringload.ns     Text File mongod.log     File mongod.log.2014-09-21T01-00-31     File mongod.log.2014-09-22T01-01-10    
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Hi,

My primary has hit an error updating an index. No action was performed on our end other than inserting data. Please let me know what other data you need.

Thu Sep 18 20:14:02.446 [conn3290270] Assertion: 10295:getFile(): bad file number value (corrupt db?): run repair
0xc18f56 0xbe01f9 0x888852 0xa72322 0x93e439 0x9e9b47 0x9cab95 0x9cb4f1 0x95792f 0x95f41b 0x71a479 0xc060b9 0x7f75ea25c851 0x7f75e8fb690d
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xc18f56]
 /usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0xa9) [0xbe01f9]
 /usr/bin/mongod(_ZN5mongo8Database7getFileEiib+0x422) [0x888852]
 /usr/bin/mongod(_ZNK5mongo7DiskLoc3recEv+0x42) [0xa72322]
 /usr/bin/mongod(_ZN5mongo15getIndexChangesERSt6vectorINS_12IndexChangesESaIS1_EEPKcRNS_16NamespaceDetailsENS_7BSONObjES9_Rb+0x499) [0x93e439]
 /usr/bin/mongod(_ZN5mongo11DataFileMgr12updateRecordEPKcPNS_16NamespaceDetailsEPNS_25NamespaceDetailsTransientEPNS_6RecordERKNS_7DiskLocES2_iRNS_7OpDebugEb+
0x357) [0x9e9b47]
 /usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEPNS_11RemoveSaverEbRKNS_24QueryPlanSelectionPolicyEb+0x3035) [0x9cab95]
 /usr/bin/mongod(_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEbRKNS_24QueryPlanSelectionPolicyE+0xc1) [0x9cb4f1]
 /usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x53f) [0x95792f]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x89b) [0x95f41b]
 /usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xd9) [0x71a479]
 /usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x469) [0xc060b9]
 /lib64/libpthread.so.0(+0x7851) [0x7f75ea25c851]
 /lib64/libc.so.6(clone+0x6d) [0x7f75e8fb690d]
Thu Sep 18 20:14:02.459 [conn3290270] update edspringload.transferCrosswalk query: { edSpringMasterId: "TqYpFr4jQxpxWwW8n" } update: { orgid: "sD9g4BRjnsXfrM
dEf", identity_name_firstName: "Rebecca", identity_name_lastName: "reinig", identity_identification_zoneId: "2718", identity_identification_localId: "52", ed
SpringMasterId: "TqYpFr4jQxpxWwW8n", created: { on: 1411071243572.0, by: "SYSTEM" }, requiresManualMerge: true, candidates: {} } nscanned:1 keyUpdates:0 exce
ption: getFile(): bad file number value (corrupt db?): run repair code:10295 locks(micros) w:14959 14ms



 Comments   
Comment by Bruce Lucas (Inactive) [ 20/Mar/15 ]

Hi Dharshan,

We weren't able to reproduce this issue, and there wasn't enough information in the db files themselves to reconstruct what happened - for example whether it was a MongoDB issue or a platform issue (hardware, operating system). Since this occurred on an older version of MongoDB, I'll go ahead and close this ticket now; if you encounter this again, particularly on a newer version of MongoDB (we are currently shipping 3.0.1), please re-open the ticket.

Thanks,
Bruce

Comment by Dharshan Rangegowda [ 24/Sep/14 ]

Hi Bruce,

The DB has been running for well over 8 months now. As far as I know there were no administrative actions (e.g. Index building) happening on the server. I have attached the last three log files for the server. This is all the log files we have. Hopefully it will help.

Comment by Bruce Lucas (Inactive) [ 24/Sep/14 ]

Thanks Dharshan.

The .ns file is in an inconsistent state in at least two ways:

  • The .ns file entries for each of the 9 indexes on edspringload.transferCrosswalk have been deleted, yet those indexes are still listed in the system.indexes collection (as indicated by the result above of getIndexes(), which consults the system.indexes collection).
  • The .ns file indicates that an index build is in progress on some index, but the pointers in the .ns file to the information in the db files related to that index show file numbers of -2, which is the marker that mongod uses for an invalid or deleted index entry.

To try to understand how the .ns file may have gotten into that state I'll need some more information about the history of the instance:

  • The state of the .ns file suggest that the indexes were being rebuilt; did you do a reIndex() operation?
  • More generally, can you tell me about the timeline of that db - when was it created, how are you adding data, what other operations have you done?
  • Can you attach the log files for that instance, ideally going back to when the instance was first initialized if possible?

Thanks,
Bruce

Comment by Dharshan Rangegowda [ 23/Sep/14 ]

Hi Bruce,

The .ns file is attached. Please let me know what you find.

Comment by Bruce Lucas (Inactive) [ 23/Sep/14 ]

Hi Dharshan,

Can you share the edspringload.ns file with us so we can take a closer look at what's going wrong? It's small so you can just compress it and attach it to this ticket. It only contains metadata, so shouldn't reveal any more information than has already been posted on this ticket, but if you would be more comfortable we can provide a private location for you to upload it; just let me know.

Thanks,
Bruce

Comment by Dharshan Rangegowda [ 22/Sep/14 ]

Hi Ramon,

Ran your script. The collection which is having the index problems has 9 indexes.

edspringload.edspringload.transferCrosswalk : 9

However we have not added any indexes or have not experienced index addition failures - Is this the same issue?

cleanup.cleanup.assessmentRestoreIgdis20140128 : 6
cleanup.cleanup.assessmentRestoreaReading : 6
cleanup.cleanup.cleanupErrorAssessments : 1
cleanup.cleanup.dupEmail : 1
cleanup.cleanup.reimportSifRefIds : 1
cleanup.cleanup.studentMergeResults : 1
cleanup.cleanup.system.users : 2
cleanup.cleanup.userMerge : 1
edspringload.edspringload.DDPData : 3
edspringload.edspringload.SIFData : 4
edspringload.edspringload._observatory_logs : 1
edspringload.edspringload.absentScoreReasons : 1
edspringload.edspringload.activeSessions : 1
edspringload.edspringload.adminFunctions : 1
edspringload.edspringload.aeas : 1
edspringload.edspringload.assessmentExemptions : 1
edspringload.edspringload.assessmentScores : 7
edspringload.edspringload.assessmentStudentSnapshot : 2
edspringload.edspringload.assessments : 1
edspringload.edspringload.audit : 2
edspringload.edspringload.benchmarkPeriods : 1
edspringload.edspringload.benchmarkSchedules : 4
edspringload.edspringload.benchmarkWindows : 3
edspringload.edspringload.courses : 4
edspringload.edspringload.deletedEntities : 1
edspringload.edspringload.districts : 1
edspringload.edspringload.economicIndicators : 1
edspringload.edspringload.entityPermissions : 3
edspringload.edspringload.error_counts : 1
edspringload.edspringload.ethnicities : 1
edspringload.edspringload.exemptions : 1
edspringload.edspringload.familyLiteracyStories : 3
edspringload.edspringload.funders : 1
edspringload.edspringload.goalAreas : 1
edspringload.edspringload.gradeLevelProgressMonitoring : 3
edspringload.edspringload.grantees : 1
edspringload.edspringload.homeLanguages : 1
edspringload.edspringload.interventionPermissions : 1
edspringload.edspringload.interventionTime : 5
edspringload.edspringload.interventions : 2
edspringload.edspringload.locations : 1
edspringload.edspringload.loggingConfig : 1
edspringload.edspringload.menu : 1
edspringload.edspringload.monitorData : 2
edspringload.edspringload.news : 1
edspringload.edspringload.objectlabs-system : 1
edspringload.edspringload.objectlabs-system.admin.collections : 1
edspringload.edspringload.oldUsers : 13
edspringload.edspringload.oldbenchmarkSchedules : 2
edspringload.edspringload.oneClicks : 1
edspringload.edspringload.oneClicksOrders : 1
edspringload.edspringload.organizationGrades : 1
edspringload.edspringload.organizationPreferences : 1
edspringload.edspringload.organizations : 1
edspringload.edspringload.performanceLevels : 1
edspringload.edspringload.permissions : 1
edspringload.edspringload.progressMonitoringSchedules : 2
edspringload.edspringload.regions : 1
edspringload.edspringload.reportEngines : 1
edspringload.edspringload.reportEngines4 : 1
edspringload.edspringload.reports : 1
edspringload.edspringload.reports2 : 1
edspringload.edspringload.result1 : 1
edspringload.edspringload.rolePermissions : 1
edspringload.edspringload.roles : 1
edspringload.edspringload.schedule : 4
edspringload.edspringload.schoolYears : 1
edspringload.edspringload.sections : 4
edspringload.edspringload.serviceLog : 1
edspringload.edspringload.sifCrosswalk : 1
edspringload.edspringload.sites : 2
edspringload.edspringload.skills : 1
edspringload.edspringload.staffSectionEnrollments : 5
edspringload.edspringload.states : 1
edspringload.edspringload.studentGroups : 2
edspringload.edspringload.studentSectionEnrollments : 5
edspringload.edspringload.studentStatuses : 1
edspringload.edspringload.students : 11
edspringload.edspringload.system.users : 2
edspringload.edspringload.systemMetaData : 1
edspringload.edspringload.tags : 1
edspringload.edspringload.targetGrowthRates : 1
edspringload.edspringload.teacherCertifications : 3
edspringload.edspringload.terms : 3

edspringload.edspringload.users : 20
edspringload.edspringload.vendors : 1
admin.admin.system.users : 2
imports.imports.importRecords : 1
imports.imports.system.users : 2
reports.reports.iowaExport20131118 : 1
reports.reports.iowaExport20131118Errors : 1
reports.reports.iowaExport20131118Excludes : 1
reports.reports.iowaExport20131118Processed : 2
reports.reports.iowaExport20131212 : 1
reports.reports.iowaExport20131212Excludes : 1
reports.reports.iowaExport20131212Processed : 2
reports.reports.iowaExport2014winter : 1
reports.reports.iowaExport2014winterExcludes : 1
reports.reports.iowaExport2014winterProcessed : 2
reports.reports.serveStaffExport20140106 : 2
reports.reports.serveStaffExportSitesProcessed20140106 : 1
reports.reports.serveStudentsExport20131203 : 2
reports.reports.serveStudentsExport20140103 : 2
reports.reports.system.users : 2
reports.reports.updateStudentDataProcessed : 1
local.local.me : 1
local.local.replset.minvalid : 1
local.local.slaves : 1

Comment by Ramon Fernandez Marina [ 22/Sep/14 ]

Does any of your collection have 10 or more indexes? I'm asking because you may be running into SERVER-12481, which affects the version you're running.

You can use the following code snippet to see how many indexes are in each collection for all your databases:

db.getMongo().getDBNames().forEach(function (d) {
    var curr_db = db.getSiblingDB(d);
    curr_db.getCollectionNames().forEach(function(coll) {
        count = 0;
        var c = curr_db.getCollection(coll);
        c.getIndexes().forEach(function(index){
            count = count + 1;
        });
        if (count > 0)
            print (d + "." + c + " : " + count)
    });
})

Generated at Thu Feb 08 03:37:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.