[SERVER-34713] Progressively declining dropDatabase performance Created: 27/Apr/18  Updated: 29/Oct/23  Resolved: 09/Jun/18

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: 3.6.3, 3.6.4
Fix Version/s: 3.6.9, 4.0.1, 4.1.1

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Geert Bosch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File chart.png    
Issue Links:
Backports
Depends
Duplicate
is duplicated by SERVER-40012 Drop collection becomes slow after sy... Closed
Related
related to SERVER-34717 Performance regression in dropDatabase Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0, v3.6
Sprint: Storage NYC 2018-05-21, Storage NYC 2018-06-04, Storage NYC 2018-06-18
Participants:

 Description   

    // mongo shell reproducer: create 1000 empty collections, then time dropDatabase
    for (var j = 0; j < 20; j++) {
        for (var i = 0; i < 1000; i++) {
            db["c" + i].insert({})
        }
        var t = new Date()
        db.dropDatabase()
        print(new Date() - t)  // elapsed milliseconds for this iteration
    }

Time required for dropDatabase progressively increases on each iteration; for example, with a 1-node replica set (see attached chart.png).

Performance also starts off worse than in 3.4.10.

The issue does not reproduce on a standalone node.



 Comments   
Comment by Eric Milkie [ 02/Nov/18 ]

Sorry for the delay; the code freeze for 3.6.9-rc0 is now scheduled for Monday Nov. 5, so it shouldn't be long afterwards for the production release of 3.6.9.

Comment by Michael [ 01/Nov/18 ]

@ramon.fernandez, I know this has already been merged into the 3.6 branch, but October has passed and there was no 3.6.9 release. Any estimate on when 3.6.9 will be released?

Comment by Githook User [ 24/Sep/18 ]

Author:

{'name': 'Geert Bosch', 'email': 'geert@mongodb.com', 'username': 'GeertBosch'}

Message: SERVER-34713 Change WT size storer to just buffer writes, not cache

(cherry picked from commit c7451c0e11c2a782e9c0dabe16cbad744e4c451a)

Conflicts:
src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp
src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
src/mongo/db/storage/wiredtiger/wiredtiger_size_storer.cpp

WiredTigerBeginTxnBlock cherry picked from commit 6d2de545a7cfcf4ab23dcf73426a1d50896d6d0c
with modifications.
Branch: v3.6
https://github.com/mongodb/mongo/commit/e9d7ee654b432d694e4070846e5e3c8c84b378d9

Comment by Ramon Fernandez Marina [ 27/Aug/18 ]

Apologies for the late reply, mmillerick; the work to backport this fix to 3.6 is currently scheduled, and I'd expect it to become available some time in October if there are no unexpected issues, but unfortunately I can't provide a more accurate estimate. Please note that the fix is in 4.0, so you can always consider an upgrade if this issue is problematic for you.

Regards,
Ramón.

Comment by Michael [ 16/Aug/18 ]

Is there an estimate for when this will be patched in 3.6?

Comment by Githook User [ 06/Jul/18 ]

Author:

{'username': 'GeertBosch', 'name': 'Geert Bosch', 'email': 'geert@mongodb.com'}

Message: SERVER-34713 Change WT size storer to just buffer writes, not cache

(cherry picked from commit c7451c0e11c2a782e9c0dabe16cbad744e4c451a)

Conflicts:
src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
Branch: v4.0
https://github.com/mongodb/mongo/commit/4687ff2c133a3d63ed654b8d7875daf014a237bf

Comment by Githook User [ 09/Jun/18 ]

Author:

{'username': 'GeertBosch', 'name': 'Geert Bosch', 'email': 'geert@mongodb.com'}

Message: SERVER-34713 Change WT size storer to just buffer writes, not cache
Branch: master
https://github.com/mongodb/mongo/commit/c7451c0e11c2a782e9c0dabe16cbad744e4c451a
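For context on the commit message above, here is a minimal sketch (plain JavaScript, all class and field names hypothetical; the real change lives in wiredtiger_size_storer.cpp) of the idea of buffering size-storer writes until flush instead of caching an entry per collection indefinitely:

```javascript
// Hypothetical sketch: a size storer that only buffers dirty entries and
// clears them on flush. Under a cache-forever design, thousands of
// create/drop cycles would grow the in-memory table without bound; with a
// buffer, memory is released as soon as entries reach the on-disk table.
class BufferingSizeStorer {
  constructor() {
    this.buffer = new Map();  // uri -> {numRecords, dataSize}, dirty entries only
    this.flushed = new Map(); // stand-in for the on-disk WT size table
  }
  store(uri, numRecords, dataSize) {
    this.buffer.set(uri, { numRecords, dataSize }); // buffer the write
  }
  flush() {
    for (const [uri, sizes] of this.buffer) this.flushed.set(uri, sizes);
    this.buffer.clear(); // key difference: nothing lingers in memory
  }
}

const storer = new BufferingSizeStorer();
for (let i = 0; i < 1000; i++) storer.store(`table:c${i}`, 1, 25);
storer.flush();
console.log(storer.buffer.size, storer.flushed.size); // 0 1000
```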

Comment by Bruce Lucas (Inactive) [ 27/Apr/18 ]

I opened SERVER-34717 regarding the performance regression in 3.7.5 relative to 3.6.4, on the assumption that it may be a different issue from the progressively worsening performance.

Comment by Judah Schvimer [ 27/Apr/18 ]

I can't think of any replication-related changes between 3.6 and 3.7 that would make the first dropDatabase take longer. That sounds like an easy perf workload to add and profile.

As for the fact that they keep taking longer, I don't think two-phase drop can explain it: dropDatabase waits for all collection drops to become replication-committed and then does the physical database drop before returning success. There shouldn't be any data left from the database when it returns, and the replication lag should be 0. I wonder if there are any storage resources that aren't getting cleaned up properly? FTDC data might provide some insight there.
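The semantics described above could be modeled roughly as follows (plain JavaScript, all names hypothetical; the drop-pending namespace format here is only illustrative): a collection drop first becomes a logical rename stamped with its drop opTime, and the physical delete happens only once the commit point passes that stamp.

```javascript
// Hypothetical model of two-phase drop: logical drops accumulate in a
// drop-pending list and are physically reaped only after the commit point
// (majority-committed opTime) advances past each drop's opTime.
const dropPending = []; // {opTime, ns}
let commitPoint = 0;

function logicalDrop(ns, opTime) {
  // Phase 1: rename to an illustrative drop-pending namespace.
  dropPending.push({ opTime, ns: `system.drop.${opTime}.${ns}` });
}

function advanceCommitPoint(opTime) {
  commitPoint = opTime;
  // Phase 2: physically remove every drop whose opTime is now committed.
  for (let i = dropPending.length - 1; i >= 0; i--) {
    if (dropPending[i].opTime <= commitPoint) dropPending.splice(i, 1);
  }
}

logicalDrop("c0", 1);
logicalDrop("c1", 2);
advanceCommitPoint(1); // only c0's drop is committed
console.log(dropPending.length); // 1
advanceCommitPoint(2);
console.log(dropPending.length); // 0
```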

Comment by Andy Schwerin [ 27/Apr/18 ]

"3.7.5 shows a similar increase (with perhaps a smaller coefficient) but about a 2x or so larger starting value than 3.6.4 (and a good bit more variability) - is this also expected or is a separate ticket for that warranted?"

I'm not sure. Perhaps benety.goh or judah.schvimer can answer that question.

Comment by Bruce Lucas (Inactive) [ 27/Apr/18 ]

Does that theory apply to the 1-node replica set used for the chart above?

Comment by Geert Bosch [ 27/Apr/18 ]

This behavior of 3.6 vs 3.4 can be explained by an ever-increasing number of collections, because the second phase of the drop cannot keep up with the rate of drops. As the secondary lags a bit, the number of collections can increase. I'll investigate your reproducer and see whether this hypothesis holds.
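The hypothesis above can be illustrated with a small queueing sketch (plain JavaScript; the rates are invented purely for illustration): if each iteration enqueues more drop-pending collections than the second phase can reap, the backlog each dropDatabase must deal with grows without bound.

```javascript
// Hypothetical simulation: 1000 logical drops per iteration, but the
// physical second phase reaps at an assumed lower fixed rate, so the
// drop-pending backlog grows linearly across iterations.
const DROPS_PER_ITERATION = 1000;
const REAPS_PER_ITERATION = 800; // assumed slower second phase
let backlog = 0;
const history = [];
for (let iter = 0; iter < 20; iter++) {
  backlog += DROPS_PER_ITERATION;
  backlog -= Math.min(backlog, REAPS_PER_ITERATION);
  history.push(backlog);
}
console.log(history[0], history[19]); // 200 4000
```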

-Geert

Comment by Bruce Lucas (Inactive) [ 27/Apr/18 ]

schwerin, 3.7.5 shows a similar increase (with perhaps a smaller coefficient) but about a 2x or so larger starting value than 3.6.4 (and a good bit more variability) - is this also expected or is a separate ticket for that warranted?

Comment by Andy Schwerin [ 27/Apr/18 ]

The slower initial performance is expected, because of changes made to make replication rollback robust to drop and dropDatabase (two-phase drop).

Generated at Thu Feb 08 04:37:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.