[SERVER-32701] Mongod with MMAPV1 SE does not use all free memory and causes disk IO Created: 15/Jan/18  Updated: 02/Apr/18  Resolved: 06/Mar/18

Status: Closed
Project: Core Server
Component/s: MMAPv1, Performance
Affects Version/s: 3.4.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ali Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive diagnostic.data.zip     File diagnostic_data.tar.gz     File mongod_logs.tar.gz    
Operating System: ALL
Participants:

 Description   

MongoDB server version: 3.4.10
OS: Debian 4.9.65-3+deb9u1~bpo8+1 (2017-12-23) x86_64 GNU/Linux
RAM: 32GB
CPU: 12 cores
Database size: 61GB (no replication, no sharding)
Collection records: 32 million (only one collection in the database)

During server peak hours something strange happens: some queries on the database take from 10s to 50s! When I run htop I can see that mongod has used ONLY 40% of the server's RAM, about 14GB of memory is free, and CPU usage is only around 10% to 20%! When I look at those queries I can see that they use `IXSCAN` as the winning plan and fetch about 4000 records belonging to the user, which is normal.

Why has `mongod` not used all of the free memory, instead causing disk IO? The documentation says that mongod will consume all the memory on the server!
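For reference, the winning plan mentioned above can be confirmed with an `explain()` call; this is a minimal PyMongo sketch, where the URI, database, collection, and filter are hypothetical placeholders rather than values from this ticket:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder URI
coll = client["my_database"]["my_collection"]        # hypothetical names

# Hypothetical filter standing in for one of the slow queries
filter_doc = {"owner_id": "SOME_USER_ID"}

# Cursor.explain() returns the query planner output, including the
# winningPlan (e.g. IXSCAN vs COLLSCAN) and the rejected plans.
plan = coll.find(filter_doc).explain()
print(plan["queryPlanner"]["winningPlan"])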

I should note that I have benchmarked disk IO using `hdparm`, and the results were as below:

root@hs1.example.com:~# hdparm -tT /dev/mapper/BDRS--CH01--001--vg-VAR
/dev/mapper/BDRS--CH01--001--vg-VAR:
 Timing cached reads:   17388 MB in  2.00 seconds = 8706.52 MB/sec
 Timing buffered disk reads: 2722 MB in  3.00 seconds = 907.05 MB/sec
 
root@hs1.example.com:~# hdparm -tT /dev/mapper/BDRS--CH01--001--vg-VAR
/dev/mapper/BDRS--CH01--001--vg-VAR:
 Timing cached reads:   17022 MB in  2.00 seconds = 8523.85 MB/sec
 Timing buffered disk reads: 2054 MB in  3.01 seconds = 683.17 MB/sec
 
root@hs1.example.com:~# hdparm -tT /dev/mapper/BDRS--CH01--001--vg-VAR
/dev/mapper/BDRS--CH01--001--vg-VAR:
 Timing cached reads:   17732 MB in  2.00 seconds = 8879.00 MB/sec
 Timing buffered disk reads: 3298 MB in  3.00 seconds = 1099.23 MB/sec

This misbehaviour seems like a bug, not normal behaviour.



 Comments   
Comment by Kelsey Schubert [ 06/Mar/18 ]

Hi alireza_stack,

Thanks for the update. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag. Questions like this, which involve more discussion, would be best posted on the mongodb-user group.

Kind regards,
Kelsey

Comment by Ali [ 21/Feb/18 ]

The free memory figure I gave earlier was wrong! I had added the cache to the free RAM, whereas plain `free -h` gives a result like below:

root@sr1.my_server:~# free -h
             total       used       free     shared    buffers     cached
Mem:           31G        31G       267M       688M        72M        26G
-/+ buffers/cache:       4.8G        26G
Swap:          31G       695M        31G

From the above result we can see that only 267MB is free and 26GB of memory has been used by applications like `Mongo` and other services. So is the slowness related to a memory limitation? Would using `SSD RAID 10` rather than my current `SATA RAID 6` setup improve performance dramatically? Would `WiredTiger` with `zlib` compression perform much better than `MMAPV1`, since with compression it has to fetch less data from disk?
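On the WiredTiger/zlib question, block compression can be enabled per collection at creation time; a minimal PyMongo sketch (URI, database, and collection names are hypothetical, and the option only takes effect when the server runs WiredTiger):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder URI
db = client["my_database"]                           # hypothetical name

# Per-collection zlib block compression; this option is honoured by the
# WiredTiger storage engine and has no effect under MMAPv1.
db.create_collection(
    "cdr_zlib",   # hypothetical collection name
    storageEngine={"wiredTiger": {"configString": "block_compressor=zlib"}},
)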

The query result is big; would filtering the data with a projection help here? Or would it help with the data size in memory?
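On the projection question, a projection shrinks what is returned to the client, but the server still reads whole documents from disk unless the query is covered by an index; a minimal PyMongo sketch (URI, database, and field names are hypothetical placeholders):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder URI
coll = client["my_database"]["cdr"]                  # hypothetical database name

# Return only the fields the application needs. This shrinks the result
# set sent over the network, but the server still loads whole documents
# from disk unless the query is fully covered by an index.
cursor = coll.find(
    {"owner_id": "SOME_USER_ID", "is_deleted": False},   # hypothetical filter
    projection={"date": 1, "track": 1, "_id": 0},
).sort("date", -1).limit(21)

for doc in cursor:
    print(doc)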

Comment by Ali [ 14/Feb/18 ]

I have upgraded to 3.6 and I still have the same problem: some queries take about 12 seconds! Look at the sample slow query below:

{
    "host": "MY_SERVER:27017",
    "desc": "conn",
    "threadId": "139762203059968",
    "connectionId": 55021,
    "client": "CLIENT_IP:60844",
    "clientMetadata": {
        "driver": {
            "name": "PyMongo",
            "version": "3.4.0"
        },
        "os": {
            "type": "Linux",
            "name": "debian 8.9",
            "architecture": "x86_64",
            "version": "4.9.0-0.bpo.4-amd64"
        },
        "platform": "CPython 2.7.14.final.0"
    },
    "active": "true",
    "currentOpTime": "2018-02-14T06:09:03.324+0000",
    "opid": 2157733,
    "secs_running": 12,
    "microsecs_running": 12157754,
    "op": "query",
    "ns": "cubano.cdr",
    "command": {
        "find": "cdr",
        "filter": {
            "$and": [
                {
                    "$or": [
                        {
                            "track.0.data.spam.user_spam": {
                                "$exists": false
                            }
                        },
                        {
                            "track.0.data.spam.user_spam": false
                        }
                    ]
                }
            ],
            "is_deleted": false,
            "track.0.direction": "incoming",
            "date": {
                "$gt": "1970-01-01 00:00:00",
                "$lt": "2018-02-14 06:08:51"
            },
            "track.0.data.status": {
                "$in": [
                    "success",
                    "partially_received"
                ]
            },
            "track.0.type": "APPLICATION_TYPE",
            "owner_id": "00000000000040008000000000002b41"
        },
        "sort": {
            "date": -1
        },
        "limit": 21,
        "$readPreference": {
            "mode": "secondaryPreferred"
        },
        "$db": "MY_DATABASE"
    },
    "numYields": 1119,
    "locks": {
        "Global": "r",
        "Database": "r",
        "Collection": "R"
    },
    "waitingForLock": "true",
    "lockStats": {
        "Global": {
            "acquireCount": {
                "r": 2240
            }
        },
        "MMAPV1Journal": {
            "acquireCount": {
                "r": 1128
            },
            "acquireWaitCount": {
                "r": 3
            },
            "timeAcquiringMicros": {
                "r": 657513
            }
        },
        "Database": {
            "acquireCount": {
                "r": 1120
            }
        },
        "Collection": {
            "acquireCount": {
                "R": 1120
            },
            "acquireWaitCount": {
                "R": 9
            },
            "timeAcquiringMicros": {
                "R": 1014598
            }
        },
        "Mutex": {
            "acquireCount": {
                "r": 1096
            }
        }
    }
}

I still have the question of why MMAPV1 does not consume all of the free memory on our server.
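One way to compare what mongod has mapped versus what is actually resident in RAM is the `mem` section of `serverStatus`; a minimal PyMongo sketch (the connection string is a placeholder):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder URI

# serverStatus().mem reports sizes in MB. Under MMAPv1, "mapped" is the
# total size of the memory-mapped data files, while "resident" is how
# much is currently held in physical RAM; the gap is data that must be
# paged in from disk on first access.
mem = client.admin.command("serverStatus")["mem"]
print("resident MB:", mem.get("resident"))
print("virtual  MB:", mem.get("virtual"))
print("mapped   MB:", mem.get("mapped"))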

Comment by Mark Agarunov [ 05/Feb/18 ]

Hello alireza_stack,

Thank you for providing this information. Looking over the logs and diagnostic data, I'm not seeing anything that would indicate a bug. The memory usage increases as expected, and the high disk I/O you are seeing is data being read off the disk that has not yet been read into memory. It appears that the node is hitting a disk I/O bottleneck, which is causing the behavior you're seeing. Note that the hdparm output you provided measures sequential cached (already in memory) and buffered reads, which does not match the general workload or access pattern of a database.
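To illustrate the difference, here is a rough random-read sketch in Python (the file path and sizes are placeholders; a purpose-built tool such as fio with a random-read profile would give more rigorous numbers, and the page cache will still inflate results unless the test file is much larger than RAM):

import os
import random
import time

PATH = "/path/to/large/file/on/the/volume"   # placeholder path
BLOCK = 4096          # 4 KB, roughly a page-sized random read
READS = 20000

fd = os.open(PATH, os.O_RDONLY)
blocks = os.fstat(fd).st_size // BLOCK

start = time.time()
for _ in range(READS):
    # Random, page-aligned reads approximate a database's access pattern
    # far better than hdparm's sequential streaming read.
    offset = random.randrange(blocks) * BLOCK
    os.pread(fd, BLOCK, offset)
elapsed = time.time() - start
os.close(fd)

print("%.0f random 4KB reads/sec" % (READS / elapsed))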

I believe that upgrading to the latest version of MongoDB should help this issue, as there have been many fixes related to this in more recent versions.

Thanks,
Mark

Comment by Ali [ 03/Feb/18 ]

Might upgrading to the latest version of MongoDB solve the issue?

Comment by Ali [ 23/Jan/18 ]

Diagnostic data is attached; it is from the same day as the logs (22nd of Jan): diagnostic.data.zip

Comment by Mark Agarunov [ 22/Jan/18 ]

Hello alireza_stack,

Thank you for providing the logs. So that we can correlate the logs with the diagnostic data, could you please also provide the diagnostic.data directory for the same timeframe as the logs you've provided?

Thanks
Mark

Comment by Ali [ 22/Jan/18 ]

I have enabled the MongoDB log and have attached the logs: mongod_logs.tar.gz

Comment by Mark Agarunov [ 19/Jan/18 ]

Hello alireza_stack,

Thank you for providing this data. After looking over this, unfortunately I believe we'll need the logs from mongod to properly diagnose this issue as the logs contain the data needed to correlate the slow queries with the server state.

Thanks,
Mark

Comment by Ali [ 17/Jan/18 ]

I have attached the diagnostic data. Unfortunately the mongodb log was disabled, so I couldn't provide a log file here. If it is really needed and the diagnostic data is not enough, I will enable logging on the server.

Comment by Mark Agarunov [ 16/Jan/18 ]

Hello alireza_stack,

Thank you for the report. To get a better idea of what may be causing this, could you please provide the following:

  • The complete logs from all affected mongod nodes
  • Please archive (tar or zip) the $dbPath/diagnostic.data directory

This should give some insight into what may be causing this.

Thanks
Mark

Comment by Ali [ 15/Jan/18 ]

Full explanation of the issue can be seen here:

https://dba.stackexchange.com/questions/195065/why-mongodb-is-so-slow-on-some-queries-while-using-ixscan-as-a-winning-plan
