This was first published on https://blog.dbi-services.com/spectre-and-meltdown-oracle-database-aws-slob (2018-01-09)
Republishing here for new followers. The content relates to the versions available at the publication date.
Last year, I measured the CPU performance of an Oracle Database on several types of AWS instances. Out of curiosity, I’ve run the same test (SLOB cached reads) now that Amazon has applied all Spectre and Meltdown mitigation patches. I must admit that I wanted to test this on the Oracle Cloud first. I updated an IaaS instance to the latest kernel, but the Oracle Unbreakable Enterprise Kernel does not include the Meltdown fix yet, and booting on the Red Hat Compatible Kernel quickly ends in a kernel panic because it cannot find the root LVM.
This is not a benchmark you can rely on to estimate the CPU usage overhead on your application. This test is not doing system calls (so the KPTI fix should have minimal impact). If your application is bound on system calls (network roundtrips, physical reads), the consequences can be worse. But in that case, you have a design problem that was only masked by hardware that was optimized, but insecure: a processor running the code before testing whether it was allowed to.
M4 is hyper-threaded so with 2 Oracle processor licenses we can use 4 vCPU. Here I was on Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 2 cores with 2 threads each.
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              13.1      0.00      5.46
              DB CPU(s):               1.0              13.1      0.00      5.46
  Logical read (blocks):         874,326.7      11,420,189.2

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0              27.3      0.00      9.24
              DB CPU(s):               2.0              27.2      0.00      9.22
  Logical read (blocks):       1,540,116.9      21,047,307.6

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              54.6      0.00     14.46
              DB CPU(s):               4.0              54.3      0.00     14.39
  Logical read (blocks):       1,779,361.3      24,326,538.0
Same CPU, now with the latest Red Hat kernel.
[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8
Here are the LIOPS (logical reads per second) results for the same runs.
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              13.7      0.00      4.69
              DB CPU(s):               1.0              13.7      0.00      4.69
  Logical read (blocks):         808,954.0      11,048,988.1

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0              27.3      0.00      8.00
              DB CPU(s):               2.0              27.1      0.00      7.96
  Logical read (blocks):       1,343,662.0      18,351,369.1

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              42.9      0.00     13.49
              DB CPU(s):               4.0              42.5      0.00     13.37
  Logical read (blocks):       1,684,204.6      18,106,823.6
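Comparing runs means pulling the same figure out of each report. Here is a minimal Python sketch of that step, assuming the Load Profile section has been saved as plain text. The function name `logical_reads_per_second` is made up for this illustration, and the sample text is the first patched-kernel run above.

```python
import re

# Matches the "Logical read (blocks):" row of a Load Profile section and
# captures the first number on it, which is the "Per Second" column.
_LOGICAL_READS = re.compile(r"Logical read \(blocks\):\s+([\d,]+(?:\.\d+)?)")


def logical_reads_per_second(load_profile: str) -> float:
    """Return the Per Second value of the 'Logical read (blocks)' row."""
    match = _LOGICAL_READS.search(load_profile)
    if match is None:
        raise ValueError("no 'Logical read (blocks)' row found")
    # Report numbers use thousands separators: strip them before converting.
    return float(match.group(1).replace(",", ""))


# Sample: Load Profile rows of the first run on the patched kernel.
sample = """
             DB Time(s):               1.0              13.7      0.00      4.69
              DB CPU(s):               1.0              13.7      0.00      4.69
  Logical read (blocks):         808,954.0      11,048,988.1
"""

print(logical_reads_per_second(sample))
```

The regular expression only looks at the row label and the first number, so it should work whether the section is kept on several lines or flattened onto one.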
The Red Hat kernel has options to disable Indirect Branch Restricted Speculation (IBRS), Indirect Branch Prediction Barriers (IBPB) and Kernel Page Table Isolation (KPTI).
[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 nopti noibrs noibpb
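Before each series of runs, it is worth checking which of these options the kernel really booted with. The following Python sketch does that check on a command-line string. It only knows the three option names shown above, and the helper names (`disabled_mitigations`, `read_cmdline`) are hypothetical.

```python
from typing import List

# Boot options that switch off a mitigation on this Red Hat kernel,
# as they appear in the /proc/cmdline output above.
MITIGATION_SWITCHES = ("nopti", "noibrs", "noibpb")


def disabled_mitigations(cmdline: str) -> List[str]:
    """List the mitigation-disabling options present on a kernel command line."""
    # Options are whitespace-separated; compare whole tokens so that
    # "nopti" is not confused with a longer option containing it.
    tokens = set(cmdline.split())
    return [opt for opt in MITIGATION_SWITCHES if opt in tokens]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()


# Sample: the command line shown above, shortened to its last options.
sample = "ro console=ttyS0,115200n8 crashkernel=auto LANG=en_US.UTF-8 nopti noibrs noibpb"
print(disabled_mitigations(sample))
```

On the test instance itself, `disabled_mitigations(read_cmdline())` would give the same answer for the running kernel.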
Here are the same runs after rebooting with the nopti, noibrs and noibpb kernel options:
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      4.86
              DB CPU(s):               1.0              29.8      0.00      4.80
  Logical read (blocks):         861,138.5      25,937,061.0

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0              27.3      0.00      8.00
              DB CPU(s):               2.0              27.0      0.00      7.92
  Logical read (blocks):       1,493,336.8      20,395,790.6

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              42.9      0.00     13.49
              DB CPU(s):               4.0              42.4      0.00     13.34
  Logical read (blocks):       1,760,218.4      18,911,346.0
       Read IO requests:              33.5             360.2
Here only the page table isolation is disabled.
[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 nopti
Here are the same runs with only the nopti kernel option:
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      3.91
              DB CPU(s):               1.0              29.8      0.00      3.87
  Logical read (blocks):         873,451.2      26,303,984.2

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0              23.1      0.00      7.60
              DB CPU(s):               2.0              22.9      0.00      7.54
  Logical read (blocks):       1,502,151.4      17,360,883.8

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              42.9      0.00     12.64
              DB CPU(s):               4.0              42.4      0.00     12.50
  Logical read (blocks):       1,764,293.0      18,954,682.3
The previous tests were using small pages. I did a quick test with KPTI enabled and SGA using large pages:
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      4.85
              DB CPU(s):               1.0              30.1      0.00      4.85
  Logical read (blocks):         854,682.1      27,672,906.8
Here is the same but with KPTI disabled:
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      4.85
              DB CPU(s):               1.0              30.1      0.00      4.85
  Logical read (blocks):         920,129.9      27,672,906.8
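A large-page comparison only means something if the SGA really landed in huge pages. As a rough check, the sketch below reads the `HugePages_*` and `Hugepagesize` fields of `/proc/meminfo` and reports how many pages are allocated or reserved. The function name `hugepage_usage` is hypothetical, and the sample values are invented for illustration: they are not from the instance tested here.

```python
def hugepage_usage(meminfo: str) -> dict:
    """Summarize huge page usage from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo.splitlines():
        key, sep, rest = line.partition(":")
        key = key.strip()
        if sep and (key.startswith("HugePages_") or key == "Hugepagesize"):
            # The first token after the colon is the number (a "kB" unit may follow).
            fields[key] = int(rest.split()[0])
    # Pages that are reserved but not yet touched still show up as free,
    # so add the reserved count back to get what is committed.
    committed = (fields["HugePages_Total"] - fields["HugePages_Free"]
                 + fields.get("HugePages_Rsvd", 0))
    return {
        "total_pages": fields["HugePages_Total"],
        "committed_pages": committed,
        "committed_bytes": committed * fields["Hugepagesize"] * 1024,
    }


# Made-up sample values, only to show the shape of the input.
sample = """\
MemTotal:       16266524 kB
HugePages_Total:    2100
HugePages_Free:     1500
HugePages_Rsvd:     1450
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

print(hugepage_usage(sample))
```

If the committed size is far below the SGA size after instance startup, the SGA most likely fell back to small pages and the comparison should be rerun.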
This is just a test on a synthetic workload. Nothing similar to a production database situation. However, those cached SLOB runs are doing what an optimized database application should do most of the time: read blocks from the buffer cache. At least this test is much better than the graphs without explanations, or the SELECT 1, that I have seen these days on social media.
Some interesting food for thought in those numbers, by the way.
Now vs. last year: between roughly 5% and 13% degradation, which is in line with what people have generally reported these days. That looks high, but usually when we do database performance troubleshooting we are there to address queries using 10x to 100x more CPU than necessary, doing unnecessary work because of bad design or a suboptimal execution plan.
If you disable KPTI: degradation is below 1% in two of the three runs and about 2.5% in the other (the one at 2.0 DB CPU per second), so that’s an easy way to get nearly the same performance if you are sure that you control all the software running. At least temporarily, before some database tuning is done.
If you disable KPTI, IBRS and IBPB: not better than when disabling only KPTI. I have no explanation for that… It makes me wonder whether this branch prediction is always a good idea.
In all cases, if you are not allocating the SGA with large pages, then you should. The KPTI degradation is lower with large pages, which makes sense as the page table is smaller. And if you are not yet using large pages, the benefit will probably offset the KPTI degradation.
This is not a benchmark, and your application may see a higher degradation if it does a lot of system calls. If you upgrade from an old kernel, you may even see an improvement thanks to other features compensating for the mitigation ones.
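The percentages discussed above can be recomputed from the "Logical read (blocks)" per-second figures of the small-page runs. The Python sketch below does that arithmetic. The dictionary keys are the DB CPU per second of each run (1, 2, 4), which I assume matches the number of SLOB sessions, and the names are made up for this illustration.

```python
# Logical reads per second, copied from the Load Profile sections above
# (small pages). Keys: DB CPU(s) per second of the run, assumed here to
# match the number of SLOB sessions.
LAST_YEAR = {1: 874_326.7, 2: 1_540_116.9, 4: 1_779_361.3}
PATCHED = {1: 808_954.0, 2: 1_343_662.0, 4: 1_684_204.6}
ALL_DISABLED = {1: 861_138.5, 2: 1_493_336.8, 4: 1_760_218.4}  # nopti noibrs noibpb
NOPTI_ONLY = {1: 873_451.2, 2: 1_502_151.4, 4: 1_764_293.0}    # nopti


def degradation_pct(baseline: float, measured: float) -> float:
    """Percentage of throughput lost compared with the baseline."""
    return 100.0 * (baseline - measured) / baseline


for label, run in [
    ("patched kernel", PATCHED),
    ("nopti noibrs noibpb", ALL_DISABLED),
    ("nopti only", NOPTI_ONLY),
]:
    for db_cpu in sorted(run):
        pct = degradation_pct(LAST_YEAR[db_cpu], run[db_cpu])
        print(f"{label:>20}  {db_cpu} DB CPU/s: {pct:5.1f}% vs. last year")
```

Each figure comes from a single run per configuration, so small differences between configurations may simply be run-to-run variation.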
This is *very* valuable testing, Franck. It shows the degradation to the most critical processing that happens in an Oracle Database. A significant impact to the cached accesses would make any impact to I/O a moot issue. Think about it. If your cached work is suffering, your non-cached work is merely academic.
By the way, your first series was 874K, 1540K and 1779K LIOPS. That looks to me like you were plotting out the warmup run. SLOB testing should always show you reproducible results. Thoughts?
Hi Franck
Thanks for these interesting results, especially the focus on large pages. I think even if you’re touching only memory in your tests you still need virtual to physical address translation, so the page table fix is still involved. As you said, the worst case scenario is likely to be more CPU bound than IO bound. We’ve done some testing on a patched OEL7 on our side with a strict CPU bound loop and it shows a 7.5% degradation after the patch is applied, close to what most other vendors have noticed so far (MySQL, PostgreSQL). It is far from being representative of a true OLTP or DSS workload; it just shows how bad it can potentially become. Also, we did not evaluate the mitigation with pcid on / off, so there may be room for improvement with pcid on.
David
Thanks David for your feedback. Interesting to see that different tests, even if very different from real-life applications, are all in the same ballpark.