[SERVER-29308] CLONE - Journal files accumulating Created: 20/May/17  Updated: 21/Jun/17  Resolved: 22/May/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.4.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alberto Ornaghi Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-29230 Journal files accumulating on Replica... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

We have a replica set composed of a primary, a secondary, and an arbiter. We deployed it with the automation agent from MongoDB Cloud Manager.
Everything is fine on the primary, but we noticed that the free disk space on the secondary is constantly decreasing. Checking the filesystem, we discovered that the journal directory is the culprit.
We performed a full resync of the secondary, but after 24 hours the journal directory occupies 380 GB (we have 485 GB of data). See the attached graph of free disk space over 24 hours (primary on the left, secondary on the right).

Currently there are 3926 journal files in the journal directory (see attachment).
The primary is not accumulating journal files and behaves as expected.
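
For anyone who wants to track the same numbers over time, here is a minimal sketch of how the file count and total size of the journal directory could be measured. It assumes the journal files follow the WiredTiger naming pattern `WiredTigerLog.<sequence>`; the function name `journal_usage` and the directory path passed to it are illustrative and not taken from this deployment.

```python
import os


def journal_usage(journal_dir):
    """Return (file_count, total_bytes) for journal files in journal_dir.

    Assumption: journal files are named WiredTigerLog.<sequence>.
    Pre-allocated files (WiredTigerPreplog.*) are deliberately not counted.
    """
    count = 0
    total = 0
    with os.scandir(journal_dir) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.startswith("WiredTigerLog."):
                count += 1
                total += entry.stat().st_size
    return count, total
```

Calling `journal_usage("<dbpath>/journal")` on each member at regular intervals, with `<dbpath>` replaced by the node's actual data path, would show whether the file count keeps growing on the secondary while staying flat on the primary.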



 Comments   
Comment by Kelsey Schubert [ 22/May/17 ]

Hi alberto.ornaghi@gmail.com,

Thanks for confirming on SERVER-29230 that you were able to resolve the time changing issue. We're continuing to work on WT-3327. Since this is the same issue as SERVER-29230, I'm resolving it as duplicate. Please comment there if you encounter this issue again.

Thank you,
Thomas

Comment by Alberto Ornaghi [ 20/May/17 ]

Is there anything I can do to trigger the checkpoint?

Comment by Alberto Ornaghi [ 20/May/17 ]

I've cloned issue SERVER-29230 because I cannot find a way to reopen it...

The issue is now present on both the primary and the secondary!
See my comment at the end of SERVER-29230, copied here:

I created an index on the primary and then it replicated to the secondary.
I noticed that the new index is 4 GB, but the total index size grew from 105 GB to 144 GB. Very strange. I checked the journal, and now the journal is stuck on both the primary and the secondary (234 GB on the primary and 193 GB on the secondary). How can I resolve this critical situation? The journals will fill up the free disk space in a few days.
Is it safe to kill one of them and let it replay the journal? Is there some sort of debug output in the log when it replays the journal on startup, so I can check whether it is stuck or not?
Will the bug prevent the journal from being replayed on startup?
Please help, thank you.
