The presence of non-ASCII characters in *.dox files causes problems. For example, I see:
$ cd dist
$ sh s_docs -a
Traceback (most recent call last):
  File "tools/doxfilter.py", line 96, in <module>
    sys.stdout.write(process(infile.read()))
  File "/home/ubuntu/mongo/py3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1124: ordinal not in range(128)
/home/ubuntu/wt/git/wt-7088-shared-storage-extension/src/docs/tool-index.dox:25: warning: unable to resolve reference to `tool-xray' for \ref command
This may depend on the Python version, but really there shouldn't be any non-ASCII characters in *.dox files. This command exposes the bad characters:
$ LC_ALL=C grep -n '[^ -~ ]' src/docs/*.dox | cat -v
src/docs/backup.dox:124:additional configuration \c incremental=(enabled=true,this_id=?M-^@M-^]ID1?M-^@M-^]).
src/docs/tool-xray.dox:38:In general the usage is:?M-^@??M-^@?
src/docs/tool-xray.dox:102:$ llvm-config ?M-^@M-^Sversion
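For reference, the cat -v notation can be mapped back to bytes: M-^@ is 0x80, M-^] is 0x9d and M-^S is 0x93. Assuming the "?" in front of each sequence stands for the 0xe2 byte named in the traceback (an assumption, since the output above does not show it directly), the offending characters look like a typographic closing quote and an en dash. A small sketch that decodes them and reproduces the ASCII failure:

```python
# Bytes reconstructed from the cat -v output above.
# Assumption: each sequence is preceded by the 0xe2 byte from the traceback.
closing_quote = b"\xe2\x80\x9d"   # ?M-^@M-^]
en_dash = b"\xe2\x80\x93"         # ?M-^@M-^S

for raw in (closing_quote, en_dash):
    ch = raw.decode("utf-8")
    print(f"{raw!r} decodes as U+{ord(ch):04X}")

# The ASCII codec rejects the same bytes, which is the error doxfilter.py hit.
try:
    closing_quote.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)
```

If that reading is right, the fix in the .dox files is to replace these with a plain double quote and a plain hyphen.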
(Note that the last space in the bracketed expression is a tab. A few files have hard tabs; we might debate whether to permit tabs, but I don't think they hurt.) We should run the grep command as the first step of building the documentation, and fail early if it matches anything.
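One way the early check could look, as a rough sketch in Python rather than grep. The names find_non_ascii and check_docs are hypothetical and nothing like this exists in the tree today; the default src/docs path is taken from the grep command above. It allows the same characters as the grep expression: tab, plus space through tilde.

```python
import re
import sys
from pathlib import Path

# Matches any byte that is not a tab or printable ASCII (space through tilde),
# mirroring the bracketed expression in the grep command.
BAD_BYTE = re.compile(rb"[^\t -~]")

def find_non_ascii(path):
    """Return (line_number, line_bytes) for each line with a disallowed byte."""
    hits = []
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, 1):
            line = line.rstrip(b"\n")
            if BAD_BYTE.search(line):
                hits.append((lineno, line))
    return hits

def check_docs(doc_dir="src/docs"):
    """Report offending lines in *.dox files; return 1 if any were found, else 0."""
    failed = False
    for path in sorted(Path(doc_dir).glob("*.dox")):
        for lineno, line in find_non_ascii(path):
            print(f"{path}:{lineno}: non-ASCII character: {line!r}", file=sys.stderr)
            failed = True
    return 1 if failed else 0
```

The documentation build could call check_docs() before anything else and exit with its return value, so a stray character stops the build with a file and line number instead of a traceback from doxfilter.py.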