Type: Task
Resolution: Done
Fix Version/s: WT1.3
Affects Version/s: None
Component/s: None
Labels: None
Sprint: None
Story Points: None

Michael and I ran into an issue today when running test/format on the LSM code.

It turns out the problem is a deadlock between a checkpoint and a concurrent bulk cursor close. The issue isn't related to LSM.

I've made some changes to the fops test application that demonstrate the problem. I pushed the changes to a new branch, fops-bulk (https://github.com/wiredtiger/wiredtiger/tree/fops-bulk).

If I run fops with:
./t -n 1000 -r 1 -t 2

It regularly hangs. When I capture the state in a debugger, I can see:

Thread 4 (process 6614):
#0  0x00007fff8df83122 in __psynch_mutexwait ()
#1  0x00007fff8f23cddd in pthread_mutex_lock ()
#2  0x000000010005595c in __wt_spin_lock (session=0x100804c30, t=0x1008044f0) at mutex.i:81
#3  0x0000000100055852 in __curbulk_close (cursor=0x101800500) at cur_bulk.c:53
#4  0x00000001000013ac in obj_bulk () at file.c:31
#5  0x0000000100001ca3 in fop (arg=0x1) at fops.c:134
#6  0x00007fff8f237782 in _pthread_start ()
#7  0x00007fff8f2241c1 in thread_start ()

Thread 3 (process 6614):
#0  0x00007fff8df8315e in __psynch_rw_rdlock ()
#1  0x00007fff8f23d915 in pthread_rwlock_rdlock ()
#2  0x0000000100067a43 in __wt_readlock (session=0x100804e48, rwlock=0x100600500) at os_mtx.c:176
#3  0x0000000100051a41 in __conn_btree_open_lock (session=0x100804e48, flags=0) at conn_btree.c:36
#4  0x0000000100051c8d in __conn_btree_get (session=0x100804e48, name=0x1018002f0 "file:__wt", ckpt=0x0, flags=0) at conn_btree.c:106
#5  0x000000010005249d in __wt_conn_btree_get (session=0x100804e48, name=0x1018002f0 "file:__wt", ckpt=0x0, cfg=0x0, flags=0) at conn_btree.c:254
#6  0x000000010007e507 in __wt_session_get_btree (session=0x100804e48, uri=0x1018002f0 "file:__wt", checkpoint=0x0, cfg=0x0, flags=0) at session_btree.c:244
#7  0x00000001000624c6 in __wt_meta_btree_apply (session=0x100804e48, func=0x100086c30 <__wt_checkpoint>, cfg=0x100480e48, flags=0) at meta_apply.c:37
#8  0x0000000100086673 in __wt_txn_checkpoint (session=0x100804e48, cfg=0x100480e48) at txn_ckpt.c:100
#9  0x000000010007d76b in __session_checkpoint (wt_session=0x100804e48, config=0x100087c12 "name=fops") at session_api.c:509
#10 0x000000010000169e in obj_checkpoint () at file.c:84
#11 0x0000000100001c5d in fop (arg=0x0) at fops.c:122
#12 0x00007fff8f237782 in _pthread_start ()
#13 0x00007fff8f2241c1 in thread_start ()

The bulk close is attempting to get the schema lock while holding the handle lock. The checkpoint is attempting to get the handle lock while holding the schema lock.

I'm wondering if checkpoint should skip files that are being used for bulk load. Do you think that is a reasonable approach? I guess the effect would be that a checkpoint skips creating an empty file when the file was opened before the checkpoint started and the bulk cursor was opened after.
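
As a rough illustration of that idea, here is a sketch of a checkpoint pass that skips anything with a bulk cursor open. The struct file_handle type, its bulk_open and checkpointed fields, and checkpoint_skip_bulk are all hypothetical names made up for this example; they are not WiredTiger's data structures, and how the real code would know a file is being bulk loaded is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state, for illustration only. */
struct file_handle {
	const char *uri;	/* e.g. "file:a" (made-up name) */
	bool bulk_open;		/* a bulk cursor is open on this file */
	bool checkpointed;	/* set when the checkpoint pass visits it */
};

/*
 * Visit every handle and checkpoint it unless it is being bulk loaded.
 * Returns the number of handles that were checkpointed.
 */
size_t
checkpoint_skip_bulk(struct file_handle *handles, size_t n)
{
	size_t i, done;

	assert(handles != NULL || n == 0);

	done = 0;
	for (i = 0; i < n; i++) {
		if (handles[i].bulk_open)
			continue;	/* never touches the bulk file's lock */
		handles[i].checkpointed = true;
		done++;
	}
	return (done);
}
```

Because the skipped handle is never locked by the checkpoint, the checkpoint can't end up waiting on a file whose bulk cursor is in the middle of closing. The cost is that the bulk-loaded file is simply absent from that checkpoint.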


Assignee: Keith Bostic (Inactive)
Reporter: Alexander Gorrod
Votes: 0
Watchers: 1

Created: Sep 07 2012 06:18:13 AM UTC
Updated: Apr 16 2015 06:41:56 PM UTC
Resolved: Apr 09 2015 01:06:40 AM UTC
