- Type: Bug
- Resolution: Gone away
- Priority: Major - P3
- None
- Affects Version/s: 4.4.0-rc0
- Component/s: Index Maintenance
- None
- Storage Execution
- ALL
Update: This regression has been fixed as of 4.4.0-rc1 by SERVER-47407. This ticket tracks the investigation of the initial regression.
By "much slower" I mean that it takes about 2X longer to create indexes for Linkbench in 4.4 compared to 4.2. Linkbench is in DSI, although I need to make a few changes to it so that the secondary index is created after the load as done here.
By 4.4 I mean both 4.4.0-rc0 and a build from the v4.4 branch prior to the merge of durable history.
From vmstat and iostat I see that 4.4 uses ~3X more CPU than 4.2. Both 4.2 and 4.4 do a similar amount of IO to storage – both read & write.
By "IO-bound" I mean that the test server has 16G of RAM and the indexed collection is ~80G per db.$collection.stats. mongod was set up as a single-node replica set.
I will provide ftdc and CPU flamegraphs tomorrow. For now I start with a summary of performance data collected during the create index.
Results are provided for MongoDB versions 4.2.5, 4.4.0-rc0 and 4.4pre, where 4.4pre is a build from the v4.4 branch immediately prior to the merge of durable history.
The first table is mostly data from iostat and vmstat explained in "Index, Per Operation" here. This is performance data divided by the insert rate to estimate the performance overhead per inserted row.
In this table, "secs" is the number of seconds for the create index, and that value is ~3X larger (2.6X to 2.9X) for 4.4pre and 4.4.0-rc0 vs 4.2.5. The CPU consumption per indexed row (cpupi) is also ~3X larger (2.6X to 2.8X) for the 4.4 binaries vs 4.2.5. But the amounts read from and written to storage per row (rkbpi, wkbpi) are similar between 4.4 and 4.2, so this looks like a new CPU problem.
ips     secs  rpi    rkbpi  wkbpi  cspi  cpupi  csecpq  dsecpq  csec  dsec  dbgb   cnf
311740  2440  0.001  0.133  0.067  0.0   84     0.0     3.2     0     2421  209.8  mo425.c5
107679  7064  0.002  0.132  0.066  0.0   239    0.0     9.3     0     7109  219.8  mo44pre.c5
121296  6271  0.001  0.144  0.066  0.0   218    0.0     8.4     0     6402  264.4  mo440rc0.c5
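The ratios discussed above can be checked directly from the per-operation table. The following is a minimal sketch in Python; the dictionary layout and the helper name `ratios_vs_baseline` are made up for illustration, and only the numbers come from the table.

```python
# Values copied from the per-operation table; only the columns discussed
# in the text (secs, cpupi, rkbpi, wkbpi) are included.
results = {
    "mo425.c5": {"secs": 2440, "cpupi": 84, "rkbpi": 0.133, "wkbpi": 0.067},
    "mo44pre.c5": {"secs": 7064, "cpupi": 239, "rkbpi": 0.132, "wkbpi": 0.066},
    "mo440rc0.c5": {"secs": 6271, "cpupi": 218, "rkbpi": 0.144, "wkbpi": 0.066},
}


def ratios_vs_baseline(results, baseline="mo425.c5"):
    """Return each metric divided by the same metric for the baseline config."""
    base = results[baseline]
    return {
        cnf: {metric: round(value / base[metric], 2) for metric, value in row.items()}
        for cnf, row in results.items()
    }


if __name__ == "__main__":
    for cnf, row in ratios_vs_baseline(results).items():
        print(cnf, row)
```

The secs and cpupi ratios for both 4.4 builds land between roughly 2.5 and 3, while the rkbpi and wkbpi ratios stay close to 1, which is the pattern that points at CPU rather than storage IO.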
The next table is mostly data from iostat and vmstat explained in "Index, Per Second" here. Most of these are the average values for the counters from vmstat and iostat. From this I see that the IO rates from and to storage (rmbps, wmbps) are much larger for 4.2.5. One reason for that is that 4.4 has more CPU overhead in between doing IO.
ips     secs  rps  rmbps  wmbps  csps  cpups  cutil  dutil  vsz   rss   cnf
311740  2440  448  40     21     1518  26.1   0.000  0.992  13.9  12.0  mo425.c5
107679  7064  181  14     7      1670  25.7   0.000  1.006  15.7  13.6  mo44pre.c5
121296  6271  141  17     8      1911  26.5   0.000  1.021  17.2  14.7  mo440rc0.c5
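The two tables appear to be related by a simple division of the per-second averages by ips. The sketch below shows that relationship in Python. The scaling factors (1024 KB per MB, and 1,000,000 for cpupi) are assumptions inferred from the numbers in the tables, not something stated in the ticket, and the function name `per_operation` is made up for illustration.

```python
# Per-second averages copied from the per-second table.
per_second = {
    "mo425.c5": {"ips": 311740, "rps": 448, "rmbps": 40, "wmbps": 21, "cpups": 26.1},
    "mo44pre.c5": {"ips": 107679, "rps": 181, "rmbps": 14, "wmbps": 7, "cpups": 25.7},
    "mo440rc0.c5": {"ips": 121296, "rps": 141, "rmbps": 17, "wmbps": 8, "cpups": 26.5},
}


def per_operation(row):
    """Estimate per-row overheads by dividing per-second rates by ips.

    Assumed scaling (inferred from the tables, not documented in the ticket):
      rkbpi, wkbpi = MB/s * 1024 / ips
      cpupi        = cpups / ips * 1,000,000
    """
    ips = row["ips"]
    return {
        "rpi": row["rps"] / ips,
        "rkbpi": row["rmbps"] * 1024 / ips,
        "wkbpi": row["wmbps"] * 1024 / ips,
        "cpupi": row["cpups"] / ips * 1_000_000,
    }


if __name__ == "__main__":
    for cnf, row in per_second.items():
        derived = per_operation(row)
        print(cnf, {name: round(value, 3) for name, value in derived.items()})
```

Under these assumptions the derived cpupi values round to 84, 239 and 218, matching the first table. The rkbpi and wkbpi values agree only approximately because rmbps and wmbps are rounded to whole MB/s in the second table. This also shows why similar cpups across versions still means more CPU per row for 4.4: the same CPU rate is spread over roughly a third as many rows per second.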