[SERVER-16311] WiredTiger: lsm-worker: LSM metadata write: Cannot allocate memory Created: 25/Nov/14  Updated: 19/Jun/15  Resolved: 16/Jun/15

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: 2.8.0-rc0
Fix Version/s: 3.1.4

Type: Bug Priority: Major - P3
Reporter: Henrik Ingo (Inactive) Assignee: Henrik Ingo (Inactive)
Resolution: Done Votes: 0
Labels: FT, dnsf, wttt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongod-wt-lsm-2.log    
Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Start server

mongod --storageEngine wiredtiger --wiredTigerCollectionConfig "type=lsm" --wiredTigerIndexConfig "type=lsm" --logpath mongod-wt-lsm-2.log

Create an index

db.tstest.ensureIndex( { device_id : 1, ts : 1 } )

Run the benchmark

git clone https://github.com/henrikingo/mongo-timeseries-benchmark.git
cd mongo-timeseries-benchmark
python timeSeriesTest.py

After 2k to 5k iterations, mongod will assert with the shown error.

If it matters, I was running this in AWS on a cc2.8xlarge instance with the mongodb database on a single local/ephemeral disk.

Participants:

 Description   

I'm starting mongodb in WiredTiger type=lsm mode (both indexes and collections). After inserting between 200M and 500M rows, mongod crashes with "lsm-worker: LSM metadata write: Cannot allocate memory".

The test was run on a very I/O-constrained hardware setup.



 Comments   
Comment by Henrik Ingo (Inactive) [ 16/Jun/15 ]

Sorry for the spam but the pedantic side of me needed to change resolution to fixed, otherwise I can't have peace with my soul. With the older version I can reproduce this just fine, so really it has been fixed at some point.

Comment by Michael Cahill (Inactive) [ 16/Jun/15 ]

Thanks for re-testing, henrik.ingo@10gen.com.

Comment by Henrik Ingo (Inactive) [ 15/Jun/15 ]

The first test run completed fine with the 3.1 nightly build from 2015-06-10. For added reassurance, I will run this test another 3-4 times before closing the ticket, but for now I think you can assume the bug has been fixed.

For the record, the equivalent command line invocation for modern mongod was:
bin/mongod --storageEngine wiredTiger --wiredTigerCollectionConfigString "type=lsm" --wiredTigerIndexConfigString "type=lsm" --logpath mongod-wt-lsm-latest-20150610-3.log
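
The two invocations in this ticket differ only in how the options are spelled: 2.8.0-rc0 used `--storageEngine wiredtiger` with `--wiredTigerCollectionConfig` / `--wiredTigerIndexConfig`, while the 3.1 nightly used `--storageEngine wiredTiger` with the `...ConfigString` variants. A minimal sketch of a helper that builds either form is below. The function name `mongod_lsm_args` and its parameters are hypothetical, and the flag spellings are taken only from the two command lines quoted in this ticket, not verified against other releases.

```python
import shlex


def mongod_lsm_args(modern, logpath, binary="mongod"):
    """Build a mongod argument list that enables LSM for collections and indexes.

    modern=False uses the spellings from the 2.8.0-rc0 command in this ticket;
    modern=True uses the spellings from the 3.1 nightly command.
    """
    if modern:
        engine = "wiredTiger"
        coll_flag = "--wiredTigerCollectionConfigString"
        index_flag = "--wiredTigerIndexConfigString"
    else:
        engine = "wiredtiger"
        coll_flag = "--wiredTigerCollectionConfig"
        index_flag = "--wiredTigerIndexConfig"
    return [
        binary,
        "--storageEngine", engine,
        coll_flag, "type=lsm",
        index_flag, "type=lsm",
        "--logpath", logpath,
    ]


# Print both forms as shell command lines (the log file names are placeholders).
print(shlex.join(mongod_lsm_args(False, "mongod-old.log")))
print(shlex.join(mongod_lsm_args(True, "mongod-new.log", binary="bin/mongod")))
```

Returning a list rather than a string means the result can be passed directly to `subprocess.Popen` without shell quoting concerns.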

Comment by Michael Cahill (Inactive) [ 10/Jun/15 ]

It shouldn't make much difference, but I'd suggest checking the latest nightly build first.

Comment by Henrik Ingo (Inactive) [ 10/Jun/15 ]

Status update: took me 2 attempts, but while I was partying in New York, the test has reproduced the crash on 2.8.0-rc0. Have been tied up with a customer case this week, but will get back to this tomorrow.

Would you like me to test 3.1.3 or Latest Nightly - does it matter?

Comment by Daniel Pasette (Inactive) [ 22/Dec/14 ]

Not sure if this was communicated: we've decided not to support LSM with the 2.8 release. This is still important to track down, but not for 2.8.

Comment by Henrik Ingo (Inactive) [ 22/Dec/14 ]

For the record, still happens in 2.8.0-rc1.
Sorry, spoke too soon. I seem to have reached some other error on this run, possibly just ulimit or something external (session.create: too many open files).
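
One way to check the ulimit theory is to look at the open-file limit in the environment that launches mongod. The snippet below is a minimal sketch using Python's standard `resource` module on a Unix-like system. It reports the limits of the current process, which child processes started from the same shell would normally inherit; it does not inspect an already-running mongod. The helper name `open_file_limits` is hypothetical.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
# resource.RLIM_INFINITY means "no limit" for either value.
print(f"open files: soft={soft} hard={hard}")
```

If the soft limit is low, an LSM workload may plausibly run into it, though this ticket does not confirm that as the cause of the `session.create` error.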

Comment by Henrik Ingo (Inactive) [ 26/Nov/14 ]

dan@10gen.com: I will of course use RC1 the next time I continue with this test. Please be advised that it may be Dec 8 before I have a chance to do that. Note that the LSM test takes days to run, as it is the slowest of the three. (Of course, I don't need to stand by watching it while it runs, but...)

Comment by Daniel Pasette (Inactive) [ 26/Nov/14 ]

henrik, can you try to reproduce against 2.8.0-rc1?

Comment by Henrik Ingo (Inactive) [ 25/Nov/14 ]

Or at least similar. I'm also seeing a sudden slowdown in the benchmark before the crash, so that looks similar. I don't know about the mongostat/vmstat output, as I've been asleep both times that this has happened. Will share the benchmark report in an hour or two.

Comment by Asya Kamsky [ 25/Nov/14 ]

Dup of SERVER-16123?

Generated at Thu Feb 08 03:40:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.