  1. WiredTiger
  2. WT-7184

Prevent non-ASCII input in doc files

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: WT10.0.0, 4.9.0, 4.4.5
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      The presence of non-ASCII characters in *.dox files causes problems.  For example, I see:

      $ cd dist
      $ sh s_docs -a
      Traceback (most recent call last):
        File "tools/doxfilter.py", line 96, in <module>
        File "/home/ubuntu/mongo/py3/lib/python3.6/encodings/ascii.py", line 26, in decode
          return codecs.ascii_decode(input, self.errors)[0]
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1124: ordinal not in range(128)
      /home/ubuntu/wt/git/wt-7088-shared-storage-extension/src/docs/tool-index.dox:25: warning: unable to resolve reference to `tool-xray' for \ref command 
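
The byte in the traceback, 0xe2, is the lead byte of a three-byte UTF-8 sequence, which is how curly quotes and en dashes are encoded. The following is a minimal sketch of the failure, not the code from tools/doxfilter.py (which the ticket does not show): the sample bytes and the helper name `try_decode` are assumptions made for illustration.

```python
# UTF-8 bytes for U+201D (right double quotation mark) around "ID1".
# The first byte of each quote is 0xe2, the byte named in the traceback.
data = b'this_id=\xe2\x80\x9dID1\xe2\x80\x9d'


def try_decode(raw, encoding):
    """Return the decoded text, or the UnicodeDecodeError raised while decoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        return err


ascii_result = try_decode(data, 'ascii')  # fails at the first 0xe2 byte
utf8_result = try_decode(data, 'utf-8')   # succeeds and yields curly quotes

print(repr(ascii_result))
print(repr(utf8_result))
```

Which codec Python picks by default when reading a text file depends on the locale and the interpreter version, so the same input can pass on one machine and fail on another.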

      This may depend on the Python version, but really there shouldn't be any non-ASCII characters in *.dox files.  This command exposes the bad characters:

      $ LC_ALL=C grep -n '[^ -~ ]' src/docs/*.dox | cat -v
      src/docs/backup.dox:124:additional configuration \c incremental=(enabled=true,this_id=?M-^@M-^]ID1?M-^@M-^]).
      src/docs/tool-xray.dox:38:In general the usage is:?M-^@??M-^@?
      src/docs/tool-xray.dox:102:$ llvm-config ?M-^@M-^Sversion 

      (Note that the last space in the bracketed expression is a tab.  A few files have hard tabs; we might debate whether to permit tabs, but I don't think it hurts.)  We should run the grep command as the first step of building the documentation, and fail early if it matches.
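
One way the early check could look, as a sketch only: it mirrors the grep expression above (printable ASCII plus tab allowed) and reports offending lines in `file:line:text` form. The function names `find_non_ascii` and `main` are hypothetical, and how it would be wired into s_docs is not specified in this ticket.

```python
import re
import sys

# Any byte other than printable ASCII (space through tilde), tab, or newline.
# Tabs are allowed to match the grep expression; remove \t to forbid them.
BAD_BYTE = re.compile(rb'[^ -~\t\n]')


def find_non_ascii(paths):
    """Return (path, line_number, line_bytes) for each line with a disallowed byte."""
    hits = []
    for path in paths:
        # Binary mode: nothing is decoded, so the scan cannot itself raise
        # UnicodeDecodeError, whatever the locale.
        with open(path, 'rb') as f:
            for lineno, line in enumerate(f, start=1):
                if BAD_BYTE.search(line):
                    hits.append((path, lineno, line.rstrip(b'\n')))
    return hits


def main(argv):
    """Print every offending line and return 1 if any were found, else 0."""
    hits = find_non_ascii(argv[1:])
    for path, lineno, line in hits:
        # backslashreplace keeps the report itself ASCII-only.
        text = line.decode('ascii', 'backslashreplace')
        print('%s:%d:%s' % (path, lineno, text))
    return 1 if hits else 0
```

A wrapper would call `sys.exit(main(sys.argv))` with the *.dox paths as arguments, so that a non-zero exit status stops the documentation build before doxygen runs.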

            donald.anderson@mongodb.com Donald Anderson