WiredTiger / WT-665

dump format encodes trailing nul bytes for string formats.

    • Type: Task
    • Resolution: Done
    • Fix Version/s: WT1.6.5
    • Affects Version/s: None
    • Component/s: None

      @michaelcahill:

      I noticed a while back that when I dump a file, some lines have encoded trailing nul bytes and some don't. I went back and looked at it today. Here's an example:

      pixiebob:format {69} !wt
      wt -h RUNDIR dump file:WiredTiger.wt
      WiredTiger Dump (WiredTiger Version 1.6.5)
      Format=print
      Header
      file:WiredTiger.wt
      allocation_size=4KB,...,type=file,value_format=S
      Data
      colgroup:wt\00                                   <<< trailing nul
      columns=,source="file:wt.wt",type=file\00        <<< trailing nul
      file:wt.wt\00
      

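      The reason some lines have trailing nul bytes and others don't is that the first lines in the dump output are the metadata file rows. Those rows don't go through the dump cursor's encoding functions, so the trailing nul byte in the string is never encoded. The rows after "Data" do go through the encoding functions, so the nul byte that terminates an "S" format string is encoded as \00.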
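
      To make the effect concrete, here is a minimal sketch of a print-style encoder. It assumes printable bytes are copied as-is, a backslash is doubled, and any other byte is written as a backslash plus two hex digits, which matches the \00 seen in the output above. The name encode_print is hypothetical; this is not WiredTiger's __raw_to_dump.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * encode_print: hypothetical helper for illustration only (an assumption,
 * not WiredTiger's implementation).
 *
 * Copies len bytes from src into dst. Printable bytes are copied as-is,
 * a backslash becomes two backslashes, and every other byte becomes a
 * backslash followed by two hex digits. The result is nul-terminated.
 * Returns the encoded length, or -1 if dst is too small.
 */
static int
encode_print(const unsigned char *src, size_t len, char *dst, size_t dstlen)
{
	static const char hex[] = "0123456789abcdef";
	size_t i, o;

	o = 0;
	for (i = 0; i < len; ++i) {
		/* Worst case: three output bytes plus the terminating nul. */
		if (o + 4 > dstlen)
			return (-1);
		if (src[i] == '\\') {
			dst[o++] = '\\';
			dst[o++] = '\\';
		} else if (isprint(src[i]))
			dst[o++] = (char)src[i];
		else {
			dst[o++] = '\\';
			dst[o++] = hex[src[i] >> 4];
			dst[o++] = hex[src[i] & 0x0f];
		}
	}
	if (o >= dstlen)
		return (-1);
	dst[o] = '\0';
	return ((int)o);
}
```

      Passing the 11 bytes of "file:wt.wt" including its terminator yields the text file:wt.wt\00, while passing only the 10 string bytes yields file:wt.wt. That difference in the size handed to the encoder is the whole discrepancy.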

      I don't think this is a bug, but it's not necessary, either.

      We could do something like this to ignore the trailing nul byte on key/value items with "S" formats:

      diff --git a/src/cursor/cur_dump.c b/src/cursor/cur_dump.c
      index 7883996..61f20eb 100644
      --- a/src/cursor/cur_dump.c
      +++ b/src/cursor/cur_dump.c
      @@ -68,6 +68,10 @@ __curdump_get_key(WT_CURSOR *cursor, ...)
              } else {
                      WT_ERR(child->get_key(child, &item));
       
      +               /* Don't dump the trailing nul byte for string formats. */
      +               if (item.size > 0 && strcmp(cursor->key_format, "S") == 0)
      +                       --item.size;
      +
                      WT_ERR(__raw_to_dump(session, &item,
                          &cursor->key, F_ISSET(cursor, WT_CURSTD_DUMP_HEX) ? 1 : 0));
              }
      @@ -178,6 +182,10 @@ __curdump_get_value(WT_CURSOR *cursor, ...)
       
              WT_ERR(child->get_value(child, &item));
       
      +       /* Don't dump the trailing nul byte for string formats. */
      +       if (strcmp(cursor->value_format, "S") == 0 && item.size > 0)
      +               --item.size;
      +
              WT_ERR(__raw_to_dump(session, &item,
                  &cursor->value, F_ISSET(cursor, WT_CURSTD_DUMP_HEX) ? 1 : 0));
      

      I'm not excited about doing a strcmp() call per key or value, but hopefully the compiler will inline it since it's a two-byte comparison.
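
      If the per-call strcmp() is a concern, one alternative is to compare the two bytes directly. The sketch below is illustrative only: struct item is a hypothetical stand-in for the data/size pair of a WT_ITEM, and format_is_string and trim_string_nul are made-up names, not existing WiredTiger functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the data/size pair of a WT_ITEM. */
struct item {
	const void *data;
	size_t size;
};

/* Return non-zero if fmt is exactly "S", without calling strcmp(). */
static int
format_is_string(const char *fmt)
{
	return (fmt[0] == 'S' && fmt[1] == '\0');
}

/*
 * Drop the trailing nul byte from an item whose format is "S", so the
 * terminator is not passed on to the dump encoding step. Like the patch
 * above, this trusts the format and does not inspect the last byte.
 */
static void
trim_string_nul(const char *fmt, struct item *item)
{
	if (item->size > 0 && format_is_string(fmt))
		--item->size;
}
```

      The caller would pass the key format when trimming a key and the value format when trimming a value. Formats other than the single character "S", and empty items, are left untouched.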

      I don't feel strongly about this change. We could:

      • just forget about it,
      • implement this change or some other change you prefer,
      • change the metadata dump to encode a trailing nul byte,
      • or do something else.

      Let me know your preference.

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            keith.bostic@mongodb.com Keith Bostic (Inactive)