Description
The presence of non-ASCII characters in *.dox files causes problems. For example, I see:
$ cd dist
$ sh s_docs -a
Traceback (most recent call last):
  File "tools/doxfilter.py", line 96, in <module>
    sys.stdout.write(process(infile.read()))
  File "/home/ubuntu/mongo/py3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1124: ordinal not in range(128)
/home/ubuntu/wt/git/wt-7088-shared-storage-extension/src/docs/tool-index.dox:25: warning: unable to resolve reference to `tool-xray' for \ref command
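For reference, the decode error can be reproduced outside of s_docs. This is only a minimal sketch: the sample string is made up for illustration, and `first_non_ascii` is a hypothetical helper, not something in tools/doxfilter.py. It assumes the offending byte is the start of a UTF-8 multi-byte character such as a right double quotation mark (U+201D), whose UTF-8 encoding begins with 0xe2.

```python
def first_non_ascii(data):
    """Return (offset, byte) of the first byte that ASCII cannot decode, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte.
        return err.start, data[err.start]
    return None

# Hypothetical sample: U+201D encodes to the bytes e2 80 9d in UTF-8.
sample = "this_id=\u201dID1\u201d".encode("utf-8")
print(first_non_ascii(sample))
```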
This may depend on the Python version, but there really shouldn't be any non-ASCII characters in *.dox files. This command exposes the offending characters:
$ LC_ALL=C grep -n '[^ -~ ]' src/docs/*.dox | cat -v
src/docs/backup.dox:124:additional configuration \c incremental=(enabled=true,this_id=?M-^@M-^]ID1?M-^@M-^]).
src/docs/tool-xray.dox:38:In general the usage is:?M-^@??M-^@?
src/docs/tool-xray.dox:102:$ llvm-config ?M-^@M-^Sversion
(Note that the last space in the bracketed expression is a tab.) A few files have hard tabs; we might debate whether to permit them, but I don't think they hurt. We should run the grep command as the first step of building the documentation, and fail early if it matches anything.
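If a Python check is preferred over grep, something along these lines might work. It is a sketch under assumptions, not a proposal for the final implementation: the function names are hypothetical, the default glob `src/docs/*.dox` is taken from the grep command above, and the allowed set (printable ASCII plus tab) mirrors the bracketed expression.

```python
import glob

# Bytes permitted in *.dox files: printable ASCII (space through '~') plus tab.
# This mirrors the grep bracket expression; drop 0x09 to forbid hard tabs.
ALLOWED = set(range(0x20, 0x7F)) | {0x09}

def find_bad_lines(data):
    """Return (line_number, line_bytes) for each line with a disallowed byte."""
    bad = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        if any(b not in ALLOWED for b in line):
            bad.append((lineno, line))
    return bad

def check_files(paths):
    """Print offending lines in grep -n style and return how many were found."""
    count = 0
    for path in paths:
        # Read as bytes so the check itself cannot hit a decode error.
        with open(path, "rb") as f:
            for lineno, line in find_bad_lines(f.read()):
                print("%s:%d:%r" % (path, lineno, line))
                count += 1
    return count

def main(argv):
    """Return 1 if any file has a disallowed byte, else 0 (assumed exit codes)."""
    paths = argv or glob.glob("src/docs/*.dox")
    return 1 if check_files(paths) else 0
```

A wrapper script could call `sys.exit(main(sys.argv[1:]))` before the rest of the documentation build runs, so that a non-zero exit stops the build early.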