[SERVER-46406] Evaluate JSON logging performance in test suites Created: 25/Feb/20 Updated: 29/Oct/23 Resolved: 13/Mar/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Logging |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0-rc0, 4.7.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Mark Benvenuto | Assignee: | Henrik Edin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.4 |
| Sprint: | Security 2020-03-09, Security 2020-03-23 |
| Participants: | |
| Description |
|
Use Evergreen data to compare logging performance and evaluate any opportunities for performance improvements. One suggestion has been to disable UTF-8 validation in logging. Other possibilities include examining memory and I/O overhead. |
| Comments |
| Comment by Githook User [ 16/Mar/20 ] |
|
Author: Henrik Edin <henrik.edin@mongodb.com> (henrikedin)
Message: Use fmt::compile and fmt::format_int when possible.

(cherry picked from commit 5094b4f2f2ee9de07dc4c3d3c320e967595496f1) |
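The committed change leans on fmt::format_int, which skips format-string parsing entirely and writes digits into a small stack buffer. As a standard-library-only illustration of that idea (fast_itoa is a hypothetical helper written for this sketch, not MongoDB or {fmt} code):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Sketch of the fmt::format_int technique: write digits backwards into a
// fixed stack buffer, then copy out the used tail. No format string, no
// heap allocation until the final std::string is built.
std::string fast_itoa(int64_t value) {
    char buf[21];  // enough for a 64-bit value plus sign
    char* end = buf + sizeof(buf);
    char* p = end;
    uint64_t v = value < 0 ? 0 - static_cast<uint64_t>(value)
                           : static_cast<uint64_t>(value);
    do {
        *--p = static_cast<char>('0' + v % 10);  // emit least significant digit
        v /= 10;
    } while (v != 0);
    if (value < 0)
        *--p = '-';
    return std::string(p, end);
}
```

The real fmt::format_int additionally exposes the buffer directly so callers can append without constructing a std::string at all.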
| Comment by Henrik Edin [ 13/Mar/20 ] |
|
Looked into the difference in run time by comparing the sharding suite in these two runs:

- Execution subtasks went from 26 to 28.
- Before, there were 972 jobs with a total runtime of 3h 37m 15s.

The number of jobs increased and the cumulative time increased by around 20%, but the wall clock time increased by a whopping 100%. When looking at the logs from the various subtasks you notice:

- Before, the gen task finished at 21:33 and the exec tasks started around 21:53 and ran for up to 10 min.

So from what I can tell the problem is a scheduling / resource issue in Evergreen. There are not enough hosts to schedule the execution, leading to long wall clock times. When they do eventually run, the execution time is not much different, and even if there were an increased total execution time, there are also more tests running now than before. I will close this ticket with the CPU perf improvements that have been committed. |
| Comment by Githook User [ 13/Mar/20 ] |
|
Author: Henrik Edin <henrik.edin@mongodb.com> (henrikedin)
Message: Use fmt::compile and fmt::format_int when possible. |
| Comment by Mark Benvenuto [ 12/Mar/20 ] |
|
Assigning to henrik.edin since he is making the code changes to address some of the issues identified. |
| Comment by Bruce Lucas (Inactive) [ 11/Mar/20 ] |
|
Thanks Henrik, that's good to know. |
| Comment by Henrik Edin [ 11/Mar/20 ] |
|
bruce.lucas, mark.benvenuto: I ran the same benchmark on 4.2 and it shows a similar picture. LogstreamBuilder::~LogstreamBuilder() + OpDebug::report(), which are used to log in completeAndLogOperation(), take almost as much time as mongo::performUpdates(). That might be the bulk of the logging, but formatting in other log statements using operator<< is not included, because it happens outside of the log system in the old logger. |
| Comment by Henrik Edin [ 10/Mar/20 ] |
|
mark.benvenuto: Sent you a CR for low-hanging-fruit improvements. V1 escaped in BSONObj::jsonString, but it did not escape all the strings as we do now, and when a BSONObj was logged with the regular toString() there was no escaping. Using SSE to further improve escapeForJSON is an interesting idea, but I think that's too large to do as part of this. We should ticket it out separately. |
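As a rough illustration of the strategy escapeForJSON is described as using elsewhere in this thread (bytes analyzed one at a time, but clean runs copied as whole blocks), here is a minimal sketch. escape_json_sketch is a hypothetical helper for this comment, not the server's implementation, and it deliberately omits the UTF-8 validation the real function performs:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Block-copy escaping sketch: scan byte by byte, but only append to the
// output when a byte actually needs escaping; everything between two such
// bytes is appended in one block.
std::string escape_json_sketch(const std::string& in) {
    std::string out;
    size_t runStart = 0;
    for (size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x20 || c == '"' || c == '\\') {     // byte needs escaping
            out.append(in, runStart, i - runStart);  // flush clean run as a block
            char buf[8];
            std::snprintf(buf, sizeof(buf), "\\u%04x", c);
            out += buf;
            runStart = i + 1;
        }
    }
    out.append(in, runStart, in.size() - runStart);  // trailing clean block
    return out;
}
```

An SSE variant would test 16 bytes at a time for the "needs escaping" predicate and fall back to this scalar path only when a lane matches.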
| Comment by Mark Benvenuto [ 09/Mar/20 ] |
|
bruce.lucas, I did not compare against the original version. I originally thought of comparing a "verbose" jstest, but there is too much variance in any given test for it to be a good comparison. I think it would be a good idea to compare YCSB on, say, 4.2.x, which is still using the original V1 logging framework. It would be interesting to look at the relative cost comparison.

henrik.edin: Where does the log V1 code escape? I found some code on Windows for newlines and some UTF-8 handling for the Windows console, but I did not find any code for escaping when writing to a file.

I think the Evergreen perf problems are twofold:

1. In general, benchmarks are designed to not log, which is why they will not show a problem.
2. My test was artificially trying to push the logging code harder than a benchmark should.

Appendix: Untested example code:
| Comment by Henrik Edin [ 07/Mar/20 ] |
|
Thanks for the excellent analysis Mark! A few comments and clarifications:

1. Using faster formatting for integers than fmt::format is probably a good idea. Perhaps the answer is itoa, but there's also a fast path for integers in libfmt, called format_int, that's supposed to be very fast. See this benchmark: http://www.zverovich.net/2013/09/07/integer-to-string-conversion-in-cplusplus.html This seems to be low-hanging fruit we should certainly pursue.
2. escapeForJSON does not write bytes one at a time, but it does analyze them one at a time. At a minimum this is something we need to do to escape ASCII control characters (the old logger also did this), but in addition it also validates the UTF-8 encoding by doing bit masking. Blocks of data that can be written unchanged are written as blocks.

The overall time spent in the logger looks worrisome. Lots of it is spent in boost::log, and there's not much we can do there apart from modifying the library or using a different logging backend. It would be interesting to run the same benchmark on 4.2 and see how much time we spend in the old logger. It was also doing expensive formatting using iostreams and might also show up high in this benchmark.

The perf fallout so far from this project is one test being slower but a couple being faster. Do you think the CPU perf is what's causing issues on Evergreen? I would think that it would affect many more benchmarks if that were the case. |
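To make the bit-masking point concrete: a UTF-8 validator classifies each lead byte by its high bits and then checks that the continuation bytes match 10xxxxxx. A simplified sketch (hypothetical helper, not the server's escapeForJSON; it does not reject overlong encodings or surrogates):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified UTF-8 structural check via bit masking. The lead byte's high
// bits determine the sequence length; every continuation byte must have the
// form 10xxxxxx.
bool looks_like_valid_utf8(const std::string& s) {
    size_t i = 0;
    while (i < s.size()) {
        uint8_t b = static_cast<uint8_t>(s[i]);
        size_t extra;
        if ((b & 0x80) == 0x00)      extra = 0;  // 0xxxxxxx: ASCII
        else if ((b & 0xE0) == 0xC0) extra = 1;  // 110xxxxx: 2-byte sequence
        else if ((b & 0xF0) == 0xE0) extra = 2;  // 1110xxxx: 3-byte sequence
        else if ((b & 0xF8) == 0xF0) extra = 3;  // 11110xxx: 4-byte sequence
        else return false;                       // stray continuation or invalid lead
        if (i + 1 + extra > s.size()) return false;  // truncated sequence
        for (size_t k = 1; k <= extra; ++k)
            if ((static_cast<uint8_t>(s[i + k]) & 0xC0) != 0x80)
                return false;                    // continuation not 10xxxxxx
        i += 1 + extra;
    }
    return true;
}
```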
| Comment by Bruce Lucas (Inactive) [ 07/Mar/20 ] |
|
Great work Mark. Do you have an idea how the cost of logging compares with v1? Will customers who are experiencing a performance problem that causes a lot of slow query logging, or who intentionally set slowms to 0 or set the log level to 1, be more impacted by logging than before? |
| Comment by Mark Benvenuto [ 06/Mar/20 ] |
|
I did two analyses:
I/O: We are flushing whole lines; we are not writing a single log line over multiple calls. The write occurs with ostream::flush().
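The line-granularity pattern described above can be sketched as follows (hypothetical emit_line helper, not the server's sink code): assemble the full line in memory, then hand it to the sink in one write followed by one flush, so a line is never split across write calls.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Build the complete log line first, then do exactly one write and one
// flush per line.
std::string emit_line(std::ostream& sink, const std::string& msg) {
    std::ostringstream line;
    line << msg << '\n';                 // complete line, built in memory
    std::string s = line.str();
    sink.write(s.data(), static_cast<std::streamsize>(s.size()));  // one write
    sink.flush();                        // one flush per line
    return s;
}
```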
Logging code:
Test case: YCSB, 50,000 records, workload A
MongoD args: --logpath=/dev/null --slowms=0 --storageEngine=inMemory --dbpath=/tmp/foo
Profiler: VTune 2020
Results: If we look at the overall most expensive functions in this artificial workload, chosen to magnify the cost of logging, we see that logging is almost as expensive as doing updates, and we spend about half the command invocation time in logging.
Breaking it down: open_record is unexpectedly expensive: it does a hash table lookup just to determine whether we need to log, and it takes two different read locks.
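A common mitigation for an expensive "should we log at all?" check is worth noting here (this is a generic sketch, not MongoDB's actual code): keep the minimum enabled severity in a single relaxed atomic, so the hot path is one load and one compare, with no hash lookup and no locks.

```cpp
#include <atomic>
#include <cassert>

// Global minimum severity; e.g. 0 = debug, 1 = info, 2 = warning.
// Reconfiguration stores a new value; the hot path only loads it.
std::atomic<int> g_minSeverity{2};

// Cheap early-out check before any record is opened or formatted.
inline bool should_log(int severity) {
    return severity >= g_minSeverity.load(std::memory_order_relaxed);
}
```

Per-component filtering can still happen later, on the (rarer) path where this cheap check passes.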
Next: As we move into the meat of the logging, our test case is mostly slow query logs, which need to log BSON. As a result, we see how expensive formatting a BSONObj is.
Finally: In the rest of the formatting code, we see that libfmt dominates the cost. Most of this cost comes from the library having to parse the format string. Also, outputDateAsISOStringLocal is expensive because of a mix of tz_convert and sprintf.
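One standard trick for the timestamp cost is to cache the seconds-resolution prefix and append only the milliseconds per message, since the expensive conversion and formatting change at most once per second. A hedged sketch (iso_timestamp is hypothetical, not outputDateAsISOStringLocal; it uses UTC via std::gmtime to keep the sketch portable, whereas the real function does local-time conversion):

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <string>

// Cache the formatted seconds part; only re-run the calendar conversion and
// strftime when the second changes.
std::string iso_timestamp(std::time_t secs, int millis) {
    static std::time_t cachedSecs = -1;
    static char cachedPrefix[32];
    if (secs != cachedSecs) {            // expensive path, at most once/second
        std::tm* tm = std::gmtime(&secs);
        std::strftime(cachedPrefix, sizeof(cachedPrefix),
                      "%Y-%m-%dT%H:%M:%S", tm);
        cachedSecs = secs;
    }
    char out[48];
    std::snprintf(out, sizeof(out), "%s.%03dZ", cachedPrefix, millis);
    return out;
}
```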
And no, the dominant cost for libfmt does not come from appending the values to the output stream; it comes from parsing the format specifiers. Recommendations:
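The fmt::compile change committed later in this thread targets exactly this parsing cost. A standard-library-only sketch of the underlying idea (hypothetical Segment/parse_format/format_with names, not {fmt} internals): parse the format string into segments once, then every subsequent message is just interleaved appends with no per-message parsing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A pre-parsed format is a sequence of literal chunks and placeholders.
struct Segment { bool isPlaceholder; std::string literal; };

// Done once, e.g. at compile time in fmt::compile's case.
std::vector<Segment> parse_format(const std::string& fmt) {
    std::vector<Segment> segs;
    std::string lit;
    for (size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '{' && i + 1 < fmt.size() && fmt[i + 1] == '}') {
            segs.push_back({false, lit});   // literal chunk before the hole
            segs.push_back({true, ""});     // the hole itself
            lit.clear();
            ++i;                            // skip '}'
        } else {
            lit += fmt[i];
        }
    }
    segs.push_back({false, lit});           // trailing literal
    return segs;
}

// Done per message: no parsing, just appends.
std::string format_with(const std::vector<Segment>& segs,
                        const std::vector<std::string>& args) {
    std::string out;
    size_t arg = 0;
    for (const Segment& s : segs)
        out += s.isPlaceholder ? args.at(arg++) : s.literal;
    return out;
}
```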
|