[SERVER-10553] losing indexes on secondaries Created: 16/Aug/13  Updated: 10/Dec/14  Resolved: 04/Apr/14

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.2.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: charity majors Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu


Operating System: ALL

 Description   

I'm in the process of upgrading from 2.2.3 to 2.4.5. I upgraded one cluster last week and another this week, and both times, when I elected the new primary, it turned out to be missing all the indexes for at least a couple of collections.

Here's a collection on the old primary:

mongodata3:SECONDARY> db["app_22d9b327-2c33-4296-bb50-48b6917b356f:_Installation"].stats()
{
        "ns" : "appdata53.app_22d9b327-2c33-4296-bb50-48b6917b356f:_Installation",
        "count" : 2962834,
        "size" : 1011357036,
        "avgObjSize" : 341.3478568154679,
        "storageSize" : 1164914688,
        "numExtents" : 18,
        "nindexes" : 8,
        "lastExtentSize" : 307515392,
        "paddingFactor" : 1.0000000002533629,
        "systemFlags" : 1,
        "userFlags" : 0,
        "totalIndexSize" : 1238917456,
        "indexSizes" : {
                "_id_" : 123800992,
                "installationId_1_deviceToken_1" : 236245520,
                "installationId_1" : 231871360,
                "deviceToken_1" : 67214896,
                "_updated_at_1" : 142589440,
                "_created_at_1" : 96967360,
                "timeZone_1__created_at_1" : 236220992,
                "channels_1__created_at_1" : 104006896
        },
        "ok" : 1
}

and on the new one:

mongodata3:PRIMARY> db["app_22d9b327-2c33-4296-bb50-48b6917b356f:_Installation"].stats()
{
        "ns" : "appdata53.app_22d9b327-2c33-4296-bb50-48b6917b356f:_Installation",
        "count" : 2962834,
        "size" : 861644152,
        "avgObjSize" : 290.81755913426133,
        "storageSize" : 965894144,
        "numExtents" : 18,
        "nindexes" : 1,
        "lastExtentSize" : 256282624,
        "paddingFactor" : 1.0000000002360774,
        "systemFlags" : 1,
        "userFlags" : 0,
        "totalIndexSize" : 152073600,
        "indexSizes" : {
                "_id_" : 152073600
        },
        "ok" : 1
}

Not sure how many collections this has happened to; I only found these by visually scanning the logs for high nscanned numbers. I have tried grepping the logs for any indication of an index getting dropped, but there is nada.

I did run a compaction on all the collections a while back, but I checked for interrupted index builds at the time and didn't see anything. That's the only thing I can think of, though. Am I missing anything?
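
To find every affected collection rather than relying on log scanning, a quick check is to print nindexes per collection on each member and diff the results. A minimal shell sketch, assuming the appdata53 database from the stats above (adjust the database name for other databases):

    // Run against each replica set member, then diff the outputs.
    var appdata = db.getSiblingDB("appdata53");
    appdata.getCollectionNames().forEach(function (name) {
        if (name.indexOf("system.") === 0) return; // skip system collections
        print(name + "\t" + appdata[name].stats().nindexes);
    });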



 Comments   
Comment by Thomas Rueckstiess [ 04/Apr/14 ]

Hi Charity,

I noticed this ticket was still marked as unresolved. Unfortunately we don't have enough information to continue the diagnosis at this stage. I'm closing this out now, but if it happens again please don't hesitate to let us know.

Thanks,
Thomas

Comment by Daniel Pasette (Inactive) [ 28/Aug/13 ]

If there was a problem during the compaction, your indexes may not have been rebuilt successfully. Without the logs it will be impossible to say for sure.

See: http://docs.mongodb.org/manual/reference/command/compact/#operation-termination
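
Note that if the index definitions themselves are gone, reIndex() won't restore them (it only rebuilds indexes still listed in system.indexes); they would have to be recreated with ensureIndex(). A sketch for the collection above, with key specs inferred from the default index names in the stats output (verify them against getIndexes() on the old primary before running anything):

    var coll = db.getSiblingDB("appdata53")
                 .getCollection("app_22d9b327-2c33-4296-bb50-48b6917b356f:_Installation");
    // Key specs below are read off the default index names, e.g.
    // "installationId_1_deviceToken_1" => { installationId: 1, deviceToken: 1 }
    coll.ensureIndex({ installationId: 1, deviceToken: 1 });
    coll.ensureIndex({ installationId: 1 });
    coll.ensureIndex({ deviceToken: 1 });
    coll.ensureIndex({ _updated_at: 1 });
    coll.ensureIndex({ _created_at: 1 });
    coll.ensureIndex({ timeZone: 1, _created_at: 1 });
    coll.ensureIndex({ channels: 1, _created_at: 1 });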

Comment by charity majors [ 28/Aug/13 ]

I don't think so. One of these clusters was previously upgraded to 2.4.4, but the other was not.

Comment by Daniel Pasette (Inactive) [ 17/Aug/13 ]

Could this be continued fallout from SERVER-9856 and SERVER-9995?

Comment by charity majors [ 16/Aug/13 ]

I don't think I have the log files for when the index got dropped. I grepped pretty thoroughly. If it was logged at all, it was more than 20 days ago.

All I did was upgrade the secondary to 2.4.5, warm it up, and elect it primary. And at some point in the past, I compacted all the collections on that node. I no longer have the logs for the compaction process. I did grep for interrupted or killed compactions at the time, since I'd been bitten by that before.

Not sure if there is anything else I can give you here. :\
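
For the interrupted-build check, one way to catch an index build still in flight at promotion time is to filter currentOp for index messages. A minimal sketch (an active build shows a msg along the lines of "index: (2/3) btree bottom up ..."):

    // Print any in-progress operations that look like index builds.
    db.currentOp().inprog.forEach(function (op) {
        if (op.msg && op.msg.indexOf("index") !== -1) {
            printjson(op);
        }
    });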

Comment by Daniel Pasette (Inactive) [ 16/Aug/13 ]

Hi Charity,
I realize the history of events leading to this is a bit murky. Do you have log files for the servers impacted by this issue?

You mentioned that this happened while upgrading your servers and then promoting a new primary. If possible, can you enumerate the steps that were taken to get here? We should be able to see any DDL modifications in the logs if you have the history.
