[SERVER-84108] Unanticipated WiredTiger History Store File Growth Observed During Failover Created: 12/Dec/23  Updated: 07/Feb/24

Status: Needs Verification
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: venkataramans rama Assignee: Backlog - Triage Team
Resolution: Unresolved Votes: 7
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Usage.png    
Assigned Teams:
Server Triage
Operating System: ALL
Participants:

 Description   

Problem Statement:

 

We are currently migrating from MongoDB 4.4 to 5.0, and during this process we have observed behavior that deviates from our expectations. Despite consulting various forums, existing MongoDB JIRA tickets, and the documentation, we have been unable to pinpoint concrete reasons for these findings.

We have a 5-member replica set: 4 data-bearing members and 1 arbiter, with 2 data-bearing members in each of two data centers.
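For context, the topology corresponds to a replica set configuration along these lines (a sketch; the host names are placeholders, only the replica set name `set01e` is taken from our startup options):

```javascript
// Sketch of the described topology: 4 data-bearing members across two
// data centers plus one arbiter. Host names are placeholders.
rs.initiate({
  _id: "set01e",
  members: [
    { _id: 0, host: "dc1-host1:27717" },                    // data center 1
    { _id: 1, host: "dc1-host2:27717" },                    // data center 1
    { _id: 2, host: "dc2-host1:27717" },                    // data center 2
    { _id: 3, host: "dc2-host2:27717" },                    // data center 2
    { _id: 4, host: "arbiter1:27717", arbiterOnly: true }
  ]
});
```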

  1. defaultWriteConcern configured as w:1, wtimeout:0
  2. minSnapshotHistoryWindowInSeconds set to 1 second
  3. enableFlowControl set to false
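For reference, the three settings above can be applied via mongosh roughly as follows (a sketch of how our configuration is set; run against the primary):

```javascript
// Sketch: applying the three settings above (values mirror our setup).

// 1. Cluster-wide default write concern of w:1 with no timeout.
db.adminCommand({
  setDefaultRWConcern: 1,
  defaultWriteConcern: { w: 1, wtimeout: 0 }
});

// 2. Keep snapshot history for only 1 second (server default is 300).
db.adminCommand({ setParameter: 1, minSnapshotHistoryWindowInSeconds: 1 });

// 3. Disable flow control.
db.adminCommand({ setParameter: 1, enableFlowControl: false });
```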

 

When one of the data centers goes down (taking two members with it), WiredTigerHS.wt, which is stored in the dbPath, starts growing. Even though we set minSnapshotHistoryWindowInSeconds to 1 second, the file grows every second. Once it reaches the oplog size, both the oplog and this file grow simultaneously and fill up the disk fast. It depends on the workload, but in our case the disk fills within 5-10 minutes.

 

Once a majority of members is back up, WiredTigerHS.wt shrinks back to normal (less than 5 MB).

 

We have the following questions.

 

  1. Why doesn't the WiredTiger History Store (WT HS) adhere to the minSnapshotHistoryWindowInSeconds configuration during failover, or when a majority is not available?
  2. What is causing the continuous growth, and is there any specific configuration that might help control this expansion?
  3. Is there a configuration setting we are missing to manage this growth? We tried setting minSnapshotHistoryWindowInSeconds to 0, but it made no difference.
  4. Why does the oplog also expand once WiredTigerHS.wt reaches the same size as the oplog file?
  5. In particular, since our application doesn't use snapshot reads, what is the procedure for completely disabling the WiredTiger History Store?
  6. Even when the cluster-wide defaultWriteConcern is set to w:1 (not majority), why does MongoDB still maintain the majority commit point and allow the oplog to grow?
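The dynamic behind questions 2 and 6 can be illustrated with a small model (a sketch with hypothetical numbers, not MongoDB code): the history store must retain every version written after the majority commit point, so when that point stops advancing, retained history grows linearly with the write rate.

```javascript
// Sketch: model of pinned-history growth. The function and its inputs are
// hypothetical; only avgObjSize (~4176 bytes) comes from the stats above.
// While a majority is down, the majority commit point (stable timestamp)
// is frozen, so retained history = everything written since the freeze.

// Estimated retained history in bytes after the commit point froze.
function retainedHistoryBytes(writeOpsPerSec, avgObjSize, secsSinceFreeze) {
  return writeOpsPerSec * avgObjSize * secsSinceFreeze;
}

// Example: 2,000 writes/s of ~4 KB documents for 10 minutes.
const bytes = retainedHistoryBytes(2000, 4176, 600);
console.log((bytes / 1024 ** 3).toFixed(1) + ' GiB of pinned history');
```

Even a modest write rate of this shape pins gigabytes within minutes, which matches the WiredTigerHS.wt growth shown in the logs below.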

 

Per MongoDB's fault-tolerance rules, 2 of our 5 members can go down with no expected impact on the running primary and secondaries. But starting from 5.0, due to the HS and oplog growth, we see that MongoDB cannot tolerate the loss of a majority. Note that we set defaultWriteConcern to w:1 (the default changed to majority starting in 5.0).

 

Before Failover

=========sessionmgr09==============

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("5348387447"),

        "count" : 1280626,

        "avgObjSize" : 4176,

        "storageSize" : NumberLong("6390153216"),

        "freeStorageSize" : 1371807744,

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 9.7G

rw------ 1 root root 6.0G Dec 11 20:49 collection-18-7685412961092358367.wt

rw------ 1 root root 879M Dec 11 20:48 collection-74-7685412961092358367.wt

rw------ 1 root root 878M Dec 11 20:48 collection-98-7685412961092358367.wt

rw------ 1 root root 876M Dec 11 20:48 collection-90-7685412961092358367.wt

tmpfs            95G   36G   59G  38% /var/data/sessions.1

=========sessionmgr10==============

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("5360045160"),

        "count" : 1283416,

        "avgObjSize" : 4176,

        "storageSize" : NumberLong("5466677248"),

        "freeStorageSize" : 488841216,

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 8.8G

rw------ 1 root root 5.1G Dec 11 20:48 collection-18--7079402678170407987.wt

rw------ 1 root root 876M Dec 11 20:49 collection-98--7079402678170407987.wt

rw------ 1 root root 875M Dec 11 20:49 collection-82--7079402678170407987.wt

rw------ 1 root root 874M Dec 11 20:49 collection-74--7079402678170407987.wt

tmpfs            95G   36G   59G  38% /var/data/sessions.1

 

 

After the 2 members were brought down

 

=========sessionmgr09==============

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("5362478203"),

        "count" : 1284154,

        "avgObjSize" : 4175,

        "storageSize" : NumberLong("6390153216"),

        "freeStorageSize" : 1267474432,

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 11G

rw------ 1 root root 6.0G Dec 11 20:53 collection-18-7685412961092358367.wt

rw------ 1 root root 882M Dec 11 20:53 WiredTigerHS.wt

rw------ 1 root root 879M Dec 11 20:53 collection-74-7685412961092358367.wt

rw------ 1 root root 878M Dec 11 20:53 collection-98-7685412961092358367.wt

tmpfs            95G   37G   58G  40% /var/data/sessions.1

=========sessionmgr10==============

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("5319883881"),

        "count" : 1273920,

        "avgObjSize" : 4175,

        "storageSize" : NumberLong("5466677248"),

        "freeStorageSize" : 307101696,

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 9.7G

rw------ 1 root root 5.1G Dec 11 20:53 collection-18--7079402678170407987.wt

rw------ 1 root root 876M Dec 11 20:53 collection-98--7079402678170407987.wt

rw------ 1 root root 875M Dec 11 20:53 WiredTigerHS.wt

rw------ 1 root root 875M Dec 11 20:53 collection-82--7079402678170407987.wt

tmpfs            95G   37G   58G  39% /var/data/sessions.1

 

Within 15 minutes the history store grows by gigabytes. Note that the oplog size remains the same.

 

=========sessionmgr09==============

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("5809646773"),

        "count" : 1391589,

        "avgObjSize" : 4174,

        "storageSize" : NumberLong("6390140928"),

        "freeStorageSize" : 1050791936,

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 15G

rw------ 1 root root 6.0G Dec 11 21:05 collection-18-7685412961092358367.wt

rw------ 1 root root 5.2G Dec 11 21:05 WiredTigerHS.wt

rw------ 1 root root 879M Dec 11 21:05 collection-74-7685412961092358367.wt

rw------ 1 root root 878M Dec 11 21:05 collection-98-7685412961092358367.wt

tmpfs            95G   42G   53G  45% /var/data/sessions.1

=========sessionmgr10==============

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("5815976272"),

        "count" : 1393115,

        "avgObjSize" : 4174,

        "storageSize" : NumberLong("5466664960"),

        "freeStorageSize" : 70377472,

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 14G

rw------ 1 root root 5.2G Dec 11 21:05 WiredTigerHS.wt

rw------ 1 root root 5.1G Dec 11 21:05 collection-18--7079402678170407987.wt

rw------ 1 root root 876M Dec 11 21:05 collection-98--7079402678170407987.wt

rw------ 1 root root 875M Dec 11 21:05 collection-82--7079402678170407987.wt

tmpfs            95G   41G   54G  44% /var/data/sessions.1

 

 

 

Once WiredTigerHS.wt reaches the oplog size, the oplog also starts growing on sessionmgr10.

 

 

=========sessionmgr09==============

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("6104465326"),

        "count" : 1462085,

        "avgObjSize" : 4175,

        "storageSize" : NumberLong("6390140928"),

        "freeStorageSize" : 876965888,

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 16G

rw------ 1 root root 6.0G Dec 11 21:05 collection-18-7685412961092358367.wt

rw------ 1 root root 5.6G Dec 11 21:05 WiredTigerHS.wt

rw------ 1 root root 879M Dec 11 21:05 collection-74-7685412961092358367.wt

rw------ 1 root root 878M Dec 11 21:05 collection-98-7685412961092358367.wt

tmpfs            95G   42G   53G  45% /var/data/sessions.1

=========sessionmgr10==============

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("6111182492"),

        "count" : 1463674,

        "avgObjSize" : 4175,

        "storageSize" : NumberLong("5601685504"),

        "freeStorageSize" : 17518592,

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 15G

rw------ 1 root root 5.5G Dec 11 21:05 WiredTigerHS.wt

rw------ 1 root root 5.3G Dec 11 21:05 collection-18--7079402678170407987.wt

rw------ 1 root root 876M Dec 11 21:05 collection-98--7079402678170407987.wt

rw------ 1 root root 875M Dec 11 21:05 collection-82--7079402678170407987.wt

tmpfs            95G   41G   54G  44% /var/data/sessions.1

 

 

Now sessionmgr09's oplog also starts growing as WiredTigerHS.wt reaches the oplog size.

 

=========sessionmgr09==============

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("6942499375"),

        "count" : 1662602,

        "avgObjSize" : 4175,

        "storageSize" : NumberLong("6490021888"),

        "freeStorageSize" : 16769024,

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 17G

rw------ 1 root root 6.3G Dec 11 21:07 WiredTigerHS.wt

rw------ 1 root root 6.1G Dec 11 21:07 collection-18-7685412961092358367.wt

rw------ 1 root root 879M Dec 11 21:07 collection-74-7685412961092358367.wt

rw------ 1 root root 878M Dec 11 21:07 collection-98-7685412961092358367.wt

tmpfs            95G   43G   52G  45% /var/data/sessions.1

=========sessionmgr10==============

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("6948652674"),

        "count" : 1664076,

        "avgObjSize" : 4175,

        "storageSize" : NumberLong("6466686976"),

        "freeStorageSize" : 14090240,

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 16G

rw------ 1 root root 6.2G Dec 11 21:07 WiredTigerHS.wt

rw------ 1 root root 6.1G Dec 11 21:07 collection-18--7079402678170407987.wt

rw------ 1 root root 876M Dec 11 21:07 collection-98--7079402678170407987.wt

rw------ 1 root root 875M Dec 11 21:07 collection-82--7079402678170407987.wt

tmpfs            95G   43G   53G  45% /var/data/sessions.1

 

 

Within a few minutes

 

=========sessionmgr09===Mon Dec 11 21:16:42 UTC 2023===========

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("5315937817"),

        "count" : 1274037,

        "avgObjSize" : 4172,

        "storageSize" : NumberLong("9466707968"),

        "freeStorageSize" : NumberLong("4267544576"),

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 21G

rw------ 1 root root 8.9G Dec 11 21:16 collection-18-7685412961092358367.wt

rw------ 1 root root 8.0G Dec 11 21:15 WiredTigerHS.wt

rw------ 1 root root 879M Dec 11 21:16 collection-74-7685412961092358367.wt

rw------ 1 root root 878M Dec 11 21:16 collection-98-7685412961092358367.wt

tmpfs            95G   47G   48G  50% /var/data/sessions.1

=========sessionmgr10===Mon Dec 11 21:16:43 UTC 2023===========

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("5320907873"),

        "count" : 1275216,

        "avgObjSize" : 4172,

        "storageSize" : NumberLong("9433522176"),

        "freeStorageSize" : NumberLong("4194209792"),

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 21G

rw------ 1 root root 8.8G Dec 11 21:16 collection-18--7079402678170407987.wt

rw------ 1 root root 8.4G Dec 11 21:15 WiredTigerHS.wt

rw------ 1 root root 876M Dec 11 21:16 collection-98--7079402678170407987.wt

rw------ 1 root root 875M Dec 11 21:15 collection-82--7079402678170407987.wt

tmpfs            95G   48G   47G  51% /var/data/sessions.1

 

 

 

After bringing back the down members, WiredTigerHS.wt on the secondary (sessionmgr10) shrank to 4 MB. After some time, WiredTigerHS.wt on sessionmgr09 (the primary) also shrank to 4 MB.

 

=========sessionmgr09===Mon Dec 11 21:22:16 UTC 2023===========

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("5366640045"),

        "count" : 1286026,

        "avgObjSize" : 4173,

        "storageSize" : NumberLong("9466695680"),

        "freeStorageSize" : NumberLong("4469882880"),

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 21G

rw------ 1 root root 8.9G Dec 11 21:21 collection-18-7685412961092358367.wt

rw------ 1 root root 8.0G Dec 11 21:21 WiredTigerHS.wt

rw------ 1 root root 879M Dec 11 21:21 collection-74-7685412961092358367.wt

rw------ 1 root root 878M Dec 11 21:22 collection-98-7685412961092358367.wt

tmpfs            95G   47G   48G  50% /var/data/sessions.1

=========sessionmgr10===Mon Dec 11 21:22:17 UTC 2023===========

{

        "ns" : "local.oplog.rs",

        "size" : NumberLong("5367983767"),

        "count" : 1286360,

        "avgObjSize" : 4173,

        "storageSize" : NumberLong("9433522176"),

        "freeStorageSize" : NumberLong("4348190720"),

        "capped" : true,

        "max" : 0,

        "maxSize" : NumberLong("5368709120"),

total 13G

rw------ 1 root root 8.8G Dec 11 21:21 collection-18--7079402678170407987.wt

rw------ 1 root root 876M Dec 11 21:22 collection-98--7079402678170407987.wt

rw------ 1 root root 875M Dec 11 21:21 collection-82--7079402678170407987.wt

rw------ 1 root root 874M Dec 11 21:21 collection-74--7079402678170407987.wt

tmpfs            95G   39G   56G  42% /var/data/sessions.1

 

 

 

Runtime parameters

/usr/bin/mongod --ipv6 --slowms 500 \
  --storageEngine wiredTiger --wiredTigerCacheSizeGB 12 \
  --setParameter oplogFetcherUsesExhaust=false \
  --setParameter wiredTigerConcurrentReadTransactions=16 \
  --setParameter wiredTigerConcurrentWriteTransactions=16 \
  --setParameter maxIndexBuildMemoryUsageMegabytes=500 \
  --bind_ip_all --keyFile /root/.dbkey --port 27717 \
  --dbpath=/var/data/sessions.1/e --replSet set01e \
  --fork --pidfilepath /var/run/sessionmgr-27717.pid \
  --oplogSize 5120 --logpath /var/log/mongodb-27717.log \
  --logappend --quiet



 Comments   
Comment by venkataramans rama [ 07/Feb/24 ]

Hi Team,

 

I believe this issue happens because mongod checks the window using the stableTimestamp only. Recall that when a majority of members is unavailable, the stableTimestamp/majority-committed time does not move forward. I wonder whether the logic in calculateHistoryLagFromStableTimestamp should also consider the current timestamp instead of the stableTimestamp. I would appreciate your response on this long-pending issue.
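The hypothesis above can be sketched as follows (illustrative JavaScript, not the actual server code; calculateHistoryLagFromStableTimestamp is paraphrased here from the reporter's reading, and both functions and their timestamps are hypothetical):

```javascript
// Sketch of the hypothesis (not actual server code). If the history
// window is measured against the stable timestamp, a frozen stableTs
// makes the lag look small forever, so truncation never triggers.

// Current behavior as understood: lag relative to the stable timestamp.
function historyLagFromStable(oldestTs, stableTs) {
  return stableTs - oldestTs; // frozen stableTs => lag stops growing
}

// Suggested alternative: measure against wall-clock "now" so the
// minSnapshotHistoryWindowInSeconds bound can still truncate history.
function historyLagFromNow(oldestTs, nowTs) {
  return nowTs - oldestTs; // keeps growing, so truncation can kick in
}

// Hypothetical timestamps (seconds): stable frozen at 1000, now at 1600.
console.log(historyLagFromStable(999, 1000)); // 1  -> within a 1s window
console.log(historyLagFromNow(999, 1600));    // 601 -> exceeds a 1s window
```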

Comment by venkataramans rama [ 24/Jan/24 ]

Hi Team,

 

Requesting your response on this, please. It has been more than a month and we are still waiting for an initial response.

Comment by venkataramans rama [ 16/Jan/24 ]

Hi Team,

Looking for your initial analysis, as this is blocking our migration to MongoDB 5.0. We would appreciate a reply.

Comment by venkataramans rama [ 20/Dec/23 ]

Team, do you have any update on this, please?

Comment by venkataramans rama [ 12/Dec/23 ]

The repro is very simple:

  1. Set up a PSA replica set with 5 members (3 members also reproduces it).
  2. Run the workload.
  3. Stop 2 secondaries.
  4. Observe WiredTigerHS.wt and the oplog grow.

Generated at Thu Feb 08 06:54:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.