[SERVER-26656] On datafile allocation Created: 17/Oct/16  Updated: 18/Oct/16  Resolved: 17/Oct/16

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.6.12
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Aristarkh Zagorodnikov Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MongoDB 2.6.12
Ubuntu 14.04 w/4.4 kernel
No virtualization, plain iron


Participants:

 Description   

We're running MongoDB 2.6.12 (yes, I know it's a fairly old version, but there is no real reason to upgrade this particular instance), and I believe the question may apply to modern versions as well.

While most of our databases are stored on SSDs, this particular cluster keeps only the journals and config servers on SSDs; the datafiles themselves live on nearline SAS spinning disks (spanning about 80 of them) to cut costs. The data is written in a "streaming" fashion: mostly inserts, with occasional deletes (fewer than 1% of inserts), so it's almost an append-only database with relatively low fragmentation.
Unfortunately, we discovered that something happening before datafile allocation leads to strange slowdowns. We use ext4 for old shards and xfs for new ones, but there is no real difference between them. Just before a datafile is allocated, the usual sub-500ms insert time spikes to 5-25 seconds per operation. Is this an extent-management issue, where the server tries to find space in existing extents before growing, and our fragmentation analysis is simply wrong?
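A minimal sketch of the kind of sanity check that can be run on the same volume (the path and size below are illustrative only, matching the datafile size in the log excerpt further down): time a raw preallocation of a ~2047MB file and compare it with the FileAllocator timing the server reports.

#!/usr/bin/env python3
# Minimal sketch: time a raw ~2047MB preallocation on the data volume and
# compare it with the FileAllocator timing in the mongod log.
# The path and size are illustrative, not taken from the actual deployment.
import os
import time

PATH = "/mnt/data5/prealloc_test.bin"    # assumed: same volume as the datafiles
SIZE = 2047 * 1024 * 1024                # ~2047MB, matching the datafile size in the log

fd = os.open(PATH, os.O_CREAT | os.O_WRONLY, 0o644)
try:
    start = time.monotonic()
    os.posix_fallocate(fd, 0, SIZE)      # the syscall mongod reportedly uses where available
    os.fsync(fd)
    print("preallocating %d bytes took %.3f secs" % (SIZE, time.monotonic() - start))
finally:
    os.close(fd)
    os.unlink(PATH)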

A typical excerpt from the log follows:

2016-10-17T13:18:25.797+0300 [FileAllocator] done allocating datafile /mnt/data5/a.1570, size: 2047MB, took 0.092 secs
2016-10-17T13:18:25.815+0300 [conn859391] insert a.fs.chunks query: { _id: ObjectId('5804a562f2ac4910f0306ed4'), files_id: 2722458925734559744, n: 0, data: BinData(0, FFD8FFE000104A46494600010101000000000000FFDB00430002020202020102020202030202030306040303030307050504060807090808070808090A0D0B090A0C0A08080B...) } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:15780713 15762ms
2016-10-17T13:18:25.815+0300 [conn859391] command a.$cmd command: insert { insert: "fs.chunks", documents: [ { _id: ObjectId('5804a562f2ac4910f0306ed4'), files_id: 2722458925734559744, n: 0, data: BinData(0, FFD8FFE000104A46494600010101000000000000FFDB00430002020202020102020202030202030306040303030307050504060807090808070808090A0D0B090A0C0A08080B...) } ], writeConcern: { w: 1 }, ordered: true, metadata: { shardName: "driveFS-1", shardVersion: [ Timestamp 313546000|0, ObjectId('51793868331d54dfcf8e0032') ], session: 0 } } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) w:130 reslen:140 15762ms
2016-10-17T13:18:25.816+0300 [conn2108532] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1476669897000|3 } } cursorid:188093731175 ntoreturn:0 keyUpdates:0 numYields:1 locks(micros) r:336 nreturned:1 reslen:82619 16515ms
2016-10-17T13:18:25.816+0300 [conn2108409] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1476669897000|3 } } cursorid:187931282777 ntoreturn:0 keyUpdates:0 numYields:1 locks(micros) r:111 nreturned:1 reslen:82619 16515ms
2016-10-17T13:18:25.845+0300 [conn74996] insert a.fs.chunks query: { _id: ObjectId('5804a5631866cf15dcad4bcc'), files_id: 2145998173431136256, n: 0, data: BinData(0, FFD8FFE000104A46494600010101000000000000FFDB00430002020202020102020202030202030306040303030307050504060807090808070808090A0D0B090A0C0A08080B...) } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:45631 14147ms
2016-10-17T13:18:25.846+0300 [conn74996] command a.$cmd command: insert { insert: "fs.chunks", documents: [ { _id: ObjectId('5804a5631866cf15dcad4bcc'), files_id: 2145998173431136256, n: 0, data: BinData(0, FFD8FFE000104A46494600010101000000000000FFDB00430002020202020102020202030202030306040303030307050504060807090808070808090A0D0B090A0C0A08080B...) } ], writeConcern: { w: 1 }, ordered: true, metadata: { shardName: "driveFS-1", shardVersion: [ Timestamp 313546000|0, ObjectId('51793868331d54dfcf8e0032') ], session: 0 } } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) w:274 reslen:140 14148ms
2016-10-17T13:18:25.882+0300 [conn1018147] insert a.fs.chunks query: { _id: ObjectId('5804a5656248ad146439e561'), files_id: 2578343737658703872, n: 0, data: BinData(0, FFD8FFE000104A46494600010101000000000000FFDB00430002020202020102020202030202030306040303030307050504060807090808070808090A0D0B090A0C0A08080B...) } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:72293 12337ms
2016-10-17T13:18:25.883+0300 [conn1018147] command a.$cmd command: insert { insert: "fs.chunks", documents: [ { _id: ObjectId('5804a5656248ad146439e561'), files_id: 2578343737658703872, n: 0, data: BinData(0, FFD8FFE000104A46494600010101000000000000FFDB00430002020202020102020202030202030306040303030307050504060807090808070808090A0D0B090A0C0A08080B...) } ], writeConcern: { w: 1 }, ordered: true, metadata: { shardName: "driveFS-1", shardVersion: [ Timestamp 313546000|0, ObjectId('51793868331d54dfcf8e0032') ], session: 0 } } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) w:253 reslen:140 12338ms
2016-10-17T13:18:25.883+0300 [conn63069] insert a.fs.chunks query: { _id: ObjectId('5804a567f2ac4910f0306ef5'), files_id: 1425422233051856896, n: 0, data: BinData(0, FFD8FFE000104A46494600010101000000000000FFDB00430002020202020102020202030202030306040303030307050504060807090808070808090A0D0B090A0C0A08080B...) } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:465 10364ms
2016-10-17T13:18:25.883+0300 [conn63069] command a.$cmd command: insert { insert: "fs.chunks", documents: [ { _id: ObjectId('5804a567f2ac4910f0306ef5'), files_id: 1425422233051856896, n: 0, data: BinData(0, FFD8FFE000104A46494600010101000000000000FFDB00430002020202020102020202030202030306040303030307050504060807090808070808090A0D0B090A0C0A08080B...) } ], writeConcern: { w: 1 }, ordered: true, metadata: { shardName: "driveFS-1", shardVersion: [ Timestamp 313546000|0, ObjectId('51793868331d54dfcf8e0032') ], session: 0 } } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) w:64 reslen:140 10364ms
2016-10-17T13:18:26.328+0300 [conn859389] insert a.fs.chunks query: { _id: ObjectId('5804a5706cdf41115886dfb5'), files_id: 2290113361506992128, n: 0, data: BinData(0, FFD8FFE000104A46494600010101000000000000FFDB00430002020202020102020202030202030306040303030307050504060807090808070808090A0D0B090A0C0A08080B...) } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:102245 1636ms
2016-10-17T13:18:26.329+0300 [conn859389] command a.$cmd command: insert { insert: "fs.chunks", documents: [ { _id: ObjectId('5804a5706cdf41115886dfb5'), files_id: 2290113361506992128, n: 0, data: BinData(0, FFD8FFE000104A46494600010101000000000000FFDB00430002020202020102020202030202030306040303030307050504060807090808070808090A0D0B090A0C0A08080B...) } ], writeConcern: { w: 1 }, ordered: true, metadata: { shardName: "driveFS-1", shardVersion: [ Timestamp 313546000|0, ObjectId('51793868331d54dfcf8e0032') ], session: 0 } } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) w:243 reslen:140 1637ms
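To quantify the pattern, a small script along these lines can scan a mongod 2.6-style log and report how many slow operations complete within a window around each FileAllocator message (the log path, slow-op threshold, and time window below are illustrative assumptions):

#!/usr/bin/env python3
# Sketch: correlate slow operations with FileAllocator activity in a mongod 2.6 log.
# The log path, slow-op threshold, and time window are illustrative assumptions.
import re
from datetime import datetime, timedelta

LOG = "/var/log/mongodb/mongod.log"      # assumed location
SLOW_MS = 5000                           # flag operations slower than 5 seconds
WINDOW = timedelta(seconds=30)           # "near an allocation" means within this window

ts_re = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}) (.*)")
dur_re = re.compile(r" (\d+)ms\s*$")

alloc_times = []
slow_ops = []
with open(LOG) as f:
    for line in f:
        m = ts_re.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
        rest = m.group(2)
        if "[FileAllocator] done allocating datafile" in rest:
            alloc_times.append(ts)
        d = dur_re.search(rest)
        if d and int(d.group(1)) >= SLOW_MS:
            slow_ops.append(ts)

near = [t for t in slow_ops if any(abs(t - a) <= WINDOW for a in alloc_times)]
print("slow ops: %d total, %d within %s of a datafile allocation"
      % (len(slow_ops), len(near), WINDOW))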



 Comments   
Comment by Aristarkh Zagorodnikov [ 18/Oct/16 ]

Excuse me, but I explicitly stated that the problem has nothing to do with the allocation of the new file itself. Too bad you do not consider a consistently triggered 50x latency degradation under stable load important enough. The user group is a fine place to ask, but I really don't care that much about extents; I care about MongoDB reliability and performance. I get that it's not 2011 and you don't have the need to look into random problems from random people, but this is important. As for the mongodb-user group, excuse me, but I believe the best help I would get there is a link to a slide from one of the conferences/meetups, and I'm already familiar with that material (I was in the Masters program).

Comment by Ramon Fernandez Marina [ 17/Oct/16 ]

onyxmaster, I think a better forum for your question would be the mongodb-user group. A quick look at the sources shows the use of posix_fallocate(), but if you have further questions about extent management, I'd suggest you post on the mongodb-dev group.

Thanks,
Ramón.

Comment by Aristarkh Zagorodnikov [ 17/Oct/16 ]

I would like to stress that the cause appears not to be the preallocation itself, but some invisible process or action that occurs before it. I'm not a MongoDB storage expert, but my bet is on an allocation strategy that walks existing extents before allocating a new one when the current extent is exhausted. If this is the case, I wonder whether there is a way to change the allocation strategy.
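For what it's worth, a rough way to see how much existing-extent bookkeeping could be involved is collStats/dbStats, which on 2.6/MMAPv1 report extent counts and file usage. A minimal sketch (the connection string is a placeholder; the database and collection names are taken from the log above):

#!/usr/bin/env python3
# Sketch: inspect extent counts and file usage for the GridFS chunks collection,
# to gauge how much existing-extent bookkeeping precedes a new datafile.
# The connection string is a placeholder; the db/collection names come from the log.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")    # assumed: shard primary
db = client["a"]

coll_stats = db.command("collStats", "fs.chunks")
print("numExtents:     ", coll_stats.get("numExtents"))
print("lastExtentSize: ", coll_stats.get("lastExtentSize"))
print("storageSize:    ", coll_stats.get("storageSize"))

db_stats = db.command("dbStats")
print("dataSize / fileSize: %s / %s" % (db_stats.get("dataSize"), db_stats.get("fileSize")))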
