[SERVER-16146] dropDatabase is slow and returns success immediately though it then blocks progress while running Created: 14/Nov/14  Updated: 11/Jul/16  Resolved: 22/Nov/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 2.8.0-rc1

Type: Bug Priority: Major - P3
Reporter: Asya Kamsky Assignee: Asya Kamsky
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-8622 serverStatus blocks waiting for vario... Closed
Operating System: ALL
Participants:

 Description   

I dropped DB and created LSM collection like this:

 

It took over 400 seconds and it was blocking everything, including serverStatus.

db.createCollection("tinylsm",{storageEngine:{ wiredtiger:{configString: "type=lsm,lsm=(chunk_size=10MB)"}}})

2014-11-14T02:28:00.825-0500 I CONTROL  [initandlisten] MongoDB starting : pid=2253 port=27077 dbpath=/mnt/data0/asya/wtStandalone/ 64-bit host=erh-twinb
2014-11-14T02:28:00.825-0500 I CONTROL  [initandlisten] db version v2.8.0-rc1-pre-
2014-11-14T02:28:00.825-0500 I CONTROL  [initandlisten] git version: 81bb3e94b06d85f9aac2b014040b81ca7fc5c5bb
2014-11-14T02:28:00.825-0500 I CONTROL  [initandlisten] build info: Linux erh-twinb 3.5.0-54-generic #81~precise1-Ubuntu SMP Tue Jul 15 04:02:22 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2014-11-14T02:28:00.825-0500 I CONTROL  [initandlisten] allocator: tcmalloc
2014-11-14T02:28:00.825-0500 I CONTROL  [initandlisten] options: { net: { port: 27077 }, processManagement: { fork: true }, storage: { dbPath: "/mnt/data0/asya/wtStandalone/", engine: "wiredtig
er", wiredtiger: { engineConfig: "cache_size=10G,checkpoint=(wait=10)" } }, systemLog: { destination: "file", path: "/mnt/data1/asya/wt91.log" } }
2014-11-14T02:28:00.825-0500 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=62G,session_max=20000,extensions=[local=(entry=index_collator_extension)],statistics=(all),log=
(enabled=true,archive=true,path=journal),checkpoint=(wait=60,log_size=2GB),cache_size=10G,checkpoint=(wait=10)
2014-11-14T02:28:01.068-0500 I NETWORK  [initandlisten] waiting for connections on port 27077
2014-11-14T02:28:47.515-0500 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:39360 #1 (1 connection now open)
2014-11-14T02:28:47.516-0500 I COMMANDS [conn1] dropDatabase ycsb starting
2014-11-14T02:28:47.687-0500 I COMMANDS [conn1] dropDatabase ycsb finished
2014-11-14T02:28:47.687-0500 I QUERY    [conn1] command ycsb.$cmd command: dropDatabase { dropDatabase: 1.0 } keyUpdates:0  reslen:55 170ms
2014-11-14T02:28:47.723-0500 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:39361 #2 (2 connections now open)
2014-11-14T02:31:18.993-0500 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:39362 #3 (3 connections now open)
2014-11-14T02:31:27.895-0500 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:39363 #4 (4 connections now open)
2014-11-14T02:35:38.656-0500 I NETWORK  [conn1] end connection 127.0.0.1:39360 (3 connections now open)
2014-11-14T02:35:38.657-0500 I COMMANDS [conn3] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0
, after extra_info: 0, after globalLock: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after storageEngine: 0, after wiredtiger: 258136, at end: 258136 }
2014-11-14T02:35:38.657-0500 I COMMANDS [conn2] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0
, after extra_info: 0, after globalLock: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after storageEngine: 0, after wiredtiger: 408262, at end: 408262 }
2014-11-14T02:35:38.657-0500 I QUERY    [conn3] command admin.$cmd command: serverStatus { serverStatus: 1 } ntoreturn:1 keyUpdates:0  reslen:13949 259663ms
2014-11-14T02:35:38.657-0500 I QUERY    [conn2] command admin.$cmd command: serverStatus { serverStatus: 1.0 } keyUpdates:0  reslen:13949 410933ms
2014-11-14T02:35:38.657-0500 I NETWORK  [conn3] end connection 127.0.0.1:39362 (2 connections now open)
2014-11-14T02:35:38.663-0500 I NETWORK  [conn2] end connection 127.0.0.1:39361 (1 connection now open)
2014-11-14T02:35:38.698-0500 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:39364 #5 (2 connections now open)
2014-11-14T02:35:38.704-0500 I NETWORK  [conn5] end connection 127.0.0.1:39364 (1 connection now open)
2014-11-14T02:35:38.737-0500 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:39365 #6 (2 connections now open)
2014-11-14T02:35:38.873-0500 I QUERY    [conn6] command ycsb.$cmd command: create { create: "tinylsm", storageEngine: { wiredtiger: { configString: "type=lsm,lsm=(chunk_size=10MB)" } } } keyUpd
ates:0  reslen:37 134ms
2014-11-14T02:35:38.874-0500 I NETWORK  [conn6] end connection 127.0.0.1:39365 (1 connection now open)

The logs seem to indicate gap between the calls, but the files in DB directory show otherwise:

-rw-rw-r-- 1 asya asya       4096 Nov 14 02:26 collection-0--7409934771532628199-000688.bf
-rw-rw-r-- 1 asya asya       4096 Nov 14 02:26 collection-0--7409934771532628199-000590.bf
-rw-rw-r-- 1 asya asya       4096 Nov 14 02:26 collection-0--7409934771532628199-000499.bf
-rw-rw-r-- 1 asya asya      36864 Nov 14 02:28 index-1-4425197065334213246.wt
-rw-rw-r-- 1 asya asya      36864 Nov 14 *02:28* collection-0-4425197065334213246.wt
-rw-rw-r-- 1 asya asya      36864 Nov 14 *02:35* _mdb_catalog.wt
-rw-rw-r-- 1 asya asya   82001920 Nov 14 *02:35* collection-0-2717636346714078045-000081.lsm
-rw-rw-r-- 1 asya asya     106496 Nov 14 02:35 collection-0-2717636346714078045-000081.bf
-rw-rw-r-- 1 asya asya   85880832 Nov 14 02:35 collection-0-2717636346714078045-000088.lsm



 Comments   
Comment by Asya Kamsky [ 22/Nov/14 ]

Retested with master - the dropDatabase is now synchronous and takes much less time than before.

Comment by Eliot Horowitz (Inactive) [ 20/Nov/14 ]

Can you retest with HEAD

Comment by Eliot Horowitz (Inactive) [ 17/Nov/14 ]

Added SERVER-16183 to çlean on startup.

Comment by Eliot Horowitz (Inactive) [ 17/Nov/14 ]

This is likely the same issue as the checkpoint issue. What version were you using on this?

Comment by Asya Kamsky [ 15/Nov/14 ]

I think I figured out what's going on.

Between dropDatabase which is, in fact, large and create collection is when the long pause is. dropDatabase returns immediately but the files in the data directory are still there, in fact they are there for a long time. Watching the process and directory after ok is returned for dropDatabase, it's clear that this is when it's cleaning things up...

I think a way to reproduce would be to run the same workload/test on dbpath without deleting the files but instead keeping the directory and just dropping the one database you're testing on.

I'm watching files being deleted from it at a rate of about a few every minute - either it's going to take even longer than before (because I keep running on the same dbpath) or it speeds up towards the end, since it's been running for 15 minutes already.

Comment by Eliot Horowitz (Inactive) [ 14/Nov/14 ]

This is probably the same as the checkpoint issue.

Generated at Thu Feb 08 03:40:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.