@michaelcahill:
I noticed a while back that when I dump a file, some lines have encoded trailing nul bytes and some don't. I went back and looked at it today – here's an example:
pixiebob:format {69} !wt
wt -h RUNDIR dump file:WiredTiger.wt
WiredTiger Dump (WiredTiger Version 1.6.5)
Format=print
Header
file:WiredTiger.wt
allocation_size=4KB,...,type=file,value_format=S
Data
colgroup:wt\00 <<< trailing nul
columns=,source="file:wt.wt",type=file\00 <<< trailing nul
file:wt.wt\00
Anyway, some lines have trailing nul bytes and others don't because the first lines in the dump output are the metadata file rows. Those rows don't go through the dump cursor's encoding functions, so the trailing nul byte in the string is never encoded.
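To make the difference concrete, here is a minimal sketch of a print-style encoder. It is not WiredTiger's code: `encode_print` is a hypothetical helper, and the escaping rule (backslash plus two hex digits for non-printable bytes, doubled backslashes) is an assumption based on the `\00` seen in the output above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a dump cursor's print encoding: printable
 * bytes are copied through, a backslash is doubled, and every other byte
 * becomes a backslash followed by two hex digits.
 * Returns 0 on success, -1 if dst is too small.
 */
static int
encode_print(const void *src, size_t size, char *dst, size_t dst_size)
{
	static const char hex[] = "0123456789abcdef";
	const unsigned char *p = src;
	size_t i, n = 0;

	assert(src != NULL && dst != NULL);

	if (dst_size == 0)
		return (-1);
	for (i = 0; i < size; ++i) {
		/* Worst case per byte: 3 output bytes, plus the final nul. */
		if (n + 4 > dst_size)
			return (-1);
		if (p[i] == '\\') {
			dst[n++] = '\\';
			dst[n++] = '\\';
		} else if (isprint(p[i]))
			dst[n++] = (char)p[i];
		else {
			dst[n++] = '\\';
			dst[n++] = hex[p[i] >> 4];
			dst[n++] = hex[p[i] & 0x0f];
		}
	}
	dst[n] = '\0';
	return (0);
}
```

Encoding an "S" item with its full size (string plus terminator) produces the `\00` suffix, while encoding only `strlen()` bytes, which is effectively what printing the row as a C string does, produces no suffix.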
I don't think this is a bug, but it's not necessary, either.
We could do something like this to ignore the trailing nul byte on key/value items with "S" formats:
diff --git a/src/cursor/cur_dump.c b/src/cursor/cur_dump.c
index 7883996..61f20eb 100644
--- a/src/cursor/cur_dump.c
+++ b/src/cursor/cur_dump.c
@@ -68,6 +68,10 @@ __curdump_get_key(WT_CURSOR *cursor, ...)
 	} else {
 		WT_ERR(child->get_key(child, &item));
 
+		/* Don't dump the trailing nul byte for string formats. */
+		if (item.size > 0 && strcmp(cursor->key_format, "S") == 0)
+			--item.size;
+
 		WT_ERR(__raw_to_dump(session, &item, &cursor->key,
 		    F_ISSET(cursor, WT_CURSTD_DUMP_HEX) ? 1 : 0));
 	}
@@ -178,6 +182,10 @@ __curdump_get_value(WT_CURSOR *cursor, ...)
 	WT_ERR(child->get_value(child, &item));
 
+	/* Don't dump the trailing nul byte for string formats. */
+	if (strcmp(cursor->value_format, "S") == 0 && item.size > 0)
+		--item.size;
+
 	WT_ERR(__raw_to_dump(session, &item, &cursor->value,
 	    F_ISSET(cursor, WT_CURSTD_DUMP_HEX) ? 1 : 0));
I'm not excited about doing a strcmp() call per key or value, but hopefully the compiler will inline it since it's a two-byte comparison.
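If the strcmp() call turns out to matter, the format check could be written as two explicit byte comparisons. This is only a sketch: `struct item` is a simplified stand-in for WT_ITEM, and `trim_string_nul` is a hypothetical helper, not an existing WiredTiger function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for WT_ITEM: a pointer and a length. */
struct item {
	const void *data;
	size_t size;
};

/*
 * Drop the trailing nul byte from an item whose format is exactly "S".
 * The format check is two byte comparisons, so no strcmp() call is made.
 * Returns 1 if a byte was trimmed, 0 otherwise.
 */
static int
trim_string_nul(const char *format, struct item *item)
{
	assert(format != NULL && item != NULL);

	/* Match exactly "S": first byte 'S', second byte the terminator. */
	if (format[0] != 'S' || format[1] != '\0')
		return (0);

	/* Extra guard, not in the patch above: only trim a real nul byte. */
	if (item->size == 0 ||
	    ((const char *)item->data)[item->size - 1] != '\0')
		return (0);

	--item->size;
	return (1);
}
```

The same helper would serve both the key and the value path, called with the key format for keys and the value format for values.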
I don't feel strongly about this change. Let me know whether you'd prefer to:
- just forget about it,
- implement this change or some other change you prefer,
- change the metadata dump to encode a trailing nul byte,
- or do something else.