[SERVER-12220] Crash with out-of-memory while building indexes on secondary Created: 31/Dec/13  Updated: 11/Jul/16  Resolved: 18/Mar/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Replication
Affects Version/s: 2.4.8
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: nifan Assignee: Unassigned
Resolution: Done Votes: 0
Labels: indexing, replicaset, replication
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Description: Debian GNU/Linux 7.3 (wheezy)

ii mongodb-10gen 2.4.8 amd64 An object/document-oriented database


Attachments: crash.txt, crash2.txt
Operating System: ALL

 Description   

In a replica set, we built a new background index on several collections on the primary.

db.foo.ensureIndex({_pm:1}, {name: 'foo_pm_idx', sparse: true, background: true })

For our smaller collections, the index creation on the slave succeeded (not sure if this matters).

However, on our second-biggest collection (the build on the biggest one did succeed), it crashed our secondary.

crash.txt shows the first crash
crash2.txt shows the crash when starting up the secondary again.



 Comments   
Comment by Stennie Steneker (Inactive) [ 18/Mar/14 ]

Hi Nifan,

Thanks for confirming that the resync resolved the problem; I'll close this issue.

Regards,
Stephen

Comment by nifan [ 18/Mar/14 ]

After resync I have not seen the problem again so far.

Comment by Stennie Steneker (Inactive) [ 18/Mar/14 ]

Hi Nifan,

Apologies for the delay in following up. What is the current status of this issue: are you still experiencing crashes, or did resyncing the servers resolve the problem?

Thanks,
Stephen

Comment by nifan [ 05/Jan/14 ]

There are no errors logged in dmesg or /var/log, nor were there any resets, power outages, etc. that I know of.
I also ran fsck on the data partition, which did not reveal any errors either.

To get our replica set functioning again, I rebuilt the secondary from the master, which did not produce any errors.

Later I stepped down the master with stepDown() and rebuilt it as well, just in case (and to free up a lot of disk space).
That also succeeded without problems.

Comment by Eliot Horowitz (Inactive) [ 04/Jan/14 ]

This looks like the data on that secondary is corrupt.
Were there any system errors or other anomalies?
Have you tried resyncing?

Comment by nifan [ 31/Dec/13 ]

Where does the number of bytes to allocate come from?

tcmalloc: large alloc 18446744073336291328 bytes == (nil) @ 

That would mean MongoDB is trying to allocate around 16777216 TB of memory?

Our total MongoDB allocated size is about 1 TB of data across all collections (including local), with the two biggest collections at around 350 to 400 GB each. Each of these two collections contains about 150 to 200 GB of actual data. So the total actual (live data) size of our replica set is around 600 GB out of that 1 TB allocated.

Nowhere near 16777216 TB.
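The size in the tcmalloc line can be checked with a little arithmetic. The sketch below (JavaScript using BigInt, so it needs a runtime with BigInt support such as a recent Node.js rather than the 2.4 mongo shell; the names `requested`, `TWO_64`, `asSigned64` and `tibIn2to64` are illustrative only) shows that 18446744073336291328 is exactly 2^64 - 373260288, so the same 64 bits read as a signed integer would be -373260288 (about -356 MiB). One plausible reading, an assumption that is not confirmed anywhere in this ticket, is that a negative length computed from corrupt on-disk data was handed to the allocator as an unsigned size, which would fit the later diagnosis of a corrupt secondary. It also shows that 2^64 bytes is 16777216 TiB, so the "16777216 TB" figure is in binary terabytes.

```javascript
// Size copied verbatim from the tcmalloc "large alloc" log line.
// BigInt is needed because the value exceeds Number.MAX_SAFE_INTEGER.
const requested = 18446744073336291328n;

const TWO_64 = 1n << 64n;

// Reinterpret an unsigned 64-bit value as a two's-complement signed value.
function asSigned64(u) {
  return BigInt.asIntN(64, u);
}

// 2^64 bytes expressed in TiB (2^40 bytes per TiB).
const tibIn2to64 = TWO_64 / (1n << 40n);

console.log(requested < TWO_64);   // true: fits in an unsigned 64-bit size
console.log(TWO_64 - requested);   // 373260288n: distance below 2^64
console.log(asSigned64(requested)); // -373260288n: same bits read as signed
console.log(tibIn2to64);           // 16777216n
```

If the value had been a genuine request derived from the data sizes above, it would be on the order of gigabytes; a result sitting just below 2^64 is the usual signature of a small negative number wrapped around to unsigned.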

Generated at Thu Feb 08 03:27:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.