Spectre and Meltdown, Oracle Database, AWS, SLOB

Last year, I measured the CPU performance of an Oracle Database on several types of AWS instances. Out of curiosity, I've run the same test (SLOB cached reads) now that Amazon has applied all the Spectre and Meltdown mitigation patches. I must admit that I wanted to test this on the Oracle Cloud first. I updated an IaaS instance to the latest kernel, but the Oracle Unbreakable Enterprise Kernel does not include the Meltdown fix yet, and booting the Red Hat Compatible Kernel quickly ends in a kernel panic because it cannot find the root LVM volume.

This is not a benchmark you can rely on to estimate the CPU overhead for your application. This test makes almost no system calls, so the KPTI fix should have minimal impact here. If your application is bound by system calls (network roundtrips, physical reads), the consequences can be worse. But in that case, you have a design problem that was simply masked by optimized but insecure hardware: a processor that runs code speculatively before checking whether it is allowed to.
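To see where KPTI costs the most, one can compare a loop that stays in user space with a loop that enters the kernel on every iteration. The Python sketch below assumes Linux, where os.getppid() calls the getppid(2) system call each time; the helper names are hypothetical. Interpreter overhead dominates both numbers, so only the difference between boots (for example with and without nopti) is meaningful, not the absolute values.

```python
import os
import time


def ns_per_iteration(fn, iterations=200_000):
    """Average wall-clock nanoseconds per call of fn over `iterations` calls."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    return (time.perf_counter_ns() - start) / iterations


def user_space_only():
    # Pure user-space work: no kernel entry, so KPTI adds nothing here.
    return sum((1, 2, 3))


def with_system_call():
    # os.getppid() goes through the getppid(2) system call on Linux, so each
    # call crosses the user/kernel boundary where KPTI switches page tables.
    return os.getppid()


if __name__ == "__main__":
    user_ns = ns_per_iteration(user_space_only)
    sys_ns = ns_per_iteration(with_system_call)
    print(f"user-space call: {user_ns:.0f} ns, system call: {sys_ns:.0f} ns")
```

Running it once per boot configuration and comparing the system-call figure gives a rough idea of the per-syscall overhead, which is what a syscall-heavy application would pay.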

Figures from last year: m4.xlarge, 4 vCPU, 16 GB RAM

M4 is hyper-threaded, so with 2 Oracle processor licenses we can use 4 vCPUs. The CPU was an Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, with 2 cores of 2 threads each.
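The license arithmetic can be written down as a tiny Python sketch. It assumes the counting rule for Amazon EC2 from Oracle's cloud licensing policy document of the time (two vCPUs count as one processor license when hyper-threading is enabled, one vCPU counts as one license otherwise); the function name is hypothetical, and the current policy should be checked before relying on it.

```python
import math


def oracle_processor_licenses(vcpus, hyperthreading=True):
    """Processor licenses for an EC2 instance under the assumed vCPU counting rule."""
    # Assumed rule: 2 vCPUs = 1 license with hyper-threading, else 1 vCPU = 1 license.
    return math.ceil(vcpus / 2) if hyperthreading else vcpus


if __name__ == "__main__":
    # m4.xlarge: 4 hyper-threaded vCPUs
    print(oracle_processor_licenses(4))
```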

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              13.1      0.00      5.46
              DB CPU(s):               1.0              13.1      0.00      5.46
  Logical read (blocks):         874,326.7      11,420,189.2

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0              27.3      0.00      9.24
              DB CPU(s):               2.0              27.2      0.00      9.22
  Logical read (blocks):       1,540,116.9      21,047,307.6

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              54.6      0.00     14.46
              DB CPU(s):               4.0              54.3      0.00     14.39
  Logical read (blocks):       1,779,361.3      24,326,538.0

Jan. 2018 with Spectre and Meltdown mitigation:

Same CPU, now with the latest Red Hat kernel.

[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8

Here are the LIOPS results for the same runs.

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              13.7      0.00      4.69
              DB CPU(s):               1.0              13.7      0.00      4.69
  Logical read (blocks):         808,954.0      11,048,988.1

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0              27.3      0.00      8.00
              DB CPU(s):               2.0              27.1      0.00      7.96
  Logical read (blocks):       1,343,662.0      18,351,369.1

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              42.9      0.00     13.49
              DB CPU(s):               4.0              42.5      0.00     13.37
  Logical read (blocks):       1,684,204.6      18,106,823.6

Jan. 2018, with Spectre and Meltdown patches, but IBRS, IBPB and KPTI disabled

The Red Hat kernel has options to disable Indirect Branch Restricted Speculation (IBRS), Indirect Branch Prediction Barriers (IBPB) and Kernel Page Table Isolation (KPTI):

[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 nopti noibrs noibpb
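Before each set of runs, it is worth checking which mitigations the boot options actually turned off. This Python sketch parses a kernel command line like the one above for the nopti, noibrs and noibpb flags used in this post, and also reads /sys/devices/system/cpu/vulnerabilities, the kernel's own per-vulnerability report. That directory comes from later upstream kernels and may be missing on the 3.10.0-693.11.6 kernel used here, in which case the function returns an empty result. Function names are hypothetical.

```python
from pathlib import Path

# Boot flags used in this post, and the mitigation each one turns off.
DISABLE_FLAGS = {
    "nopti": "KPTI (Meltdown)",
    "noibrs": "IBRS (Spectre v2)",
    "noibpb": "IBPB (Spectre v2)",
}


def disabled_mitigations(cmdline):
    """Return the mitigation names switched off by flags on a kernel command line."""
    tokens = set(cmdline.split())
    return [name for flag, name in DISABLE_FLAGS.items() if flag in tokens]


def vulnerability_status(sysfs_dir="/sys/devices/system/cpu/vulnerabilities"):
    """Read the kernel's own mitigation report, if this kernel exposes it."""
    path = Path(sysfs_dir)
    if not path.is_dir():
        return {}  # older kernels without this interface
    return {f.name: f.read_text().strip() for f in sorted(path.iterdir())}


if __name__ == "__main__" and Path("/proc/cmdline").exists():
    cmdline = Path("/proc/cmdline").read_text()
    print("disabled by boot options:", disabled_mitigations(cmdline) or "none")
    for name, status in vulnerability_status().items():
        print(f"{name}: {status}")
```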

Here are the same runs after rebooting with nopti noibrs noibpb kernel options:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      4.86
              DB CPU(s):               1.0              29.8      0.00      4.80
  Logical read (blocks):         861,138.5      25,937,061.0

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0              27.3      0.00      8.00
              DB CPU(s):               2.0              27.0      0.00      7.92
  Logical read (blocks):       1,493,336.8      20,395,790.6

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              42.9      0.00     13.49
              DB CPU(s):               4.0              42.4      0.00     13.34
  Logical read (blocks):       1,760,218.4      18,911,346.0
       Read IO requests:              33.5             360.2

Then with only KPTI disabled, and all Spectre mitigations enabled

Here only page table isolation is disabled.

[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 nopti

Here are the same runs with only the nopti kernel option:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      3.91
              DB CPU(s):               1.0              29.8      0.00      3.87
  Logical read (blocks):         873,451.2      26,303,984.2

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0              23.1      0.00      7.60
              DB CPU(s):               2.0              22.9      0.00      7.54
  Logical read (blocks):       1,502,151.4      17,360,883.8

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              42.9      0.00     12.64
              DB CPU(s):               4.0              42.4      0.00     12.50
  Logical read (blocks):       1,764,293.0      18,954,682.3

Large pages

The previous tests used small pages. I did a quick test with KPTI enabled and the SGA using large pages:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      4.85
              DB CPU(s):               1.0              30.1      0.00      4.85
  Logical read (blocks):         854,682.1      27,672,906.8

Here is the same but with KPTI disabled:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      4.85
              DB CPU(s):               1.0              30.1      0.00      4.85
  Logical read (blocks):         920,129.9      27,672,906.8

So what?

This is just a test on a synthetic workload, nothing like a production database. However, these cached SLOB runs do what an optimized database application should do most of the time: read blocks from the buffer cache. At least this test is much better than the unexplained graphs, or the SELECT 1 tests, that I have seen on social media these past few days.

Some interesting food for thought in those numbers, by the way.

Now vs. last year: between about 5% and 13% degradation, which is in line with what people have generally reported these days. That looks high, but when we do database performance troubleshooting we are usually there to address queries that use 10x to 100x more CPU than necessary because of bad design or a suboptimal execution plan.
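The percentages come from the logical reads per second in the Load Profile sections. Here is a minimal Python sketch of that arithmetic, using the figures above keyed by the DB Time per second of each run (1, 2 and 4 active sessions), for the fully patched kernel and for the nopti boot; the names are illustrative only.

```python
# Logical reads per second from the Load Profile sections above,
# keyed by active sessions (DB Time per second of each run).
BASELINE_2017 = {1: 874_326.7, 2: 1_540_116.9, 4: 1_779_361.3}
PATCHED_2018 = {1: 808_954.0, 2: 1_343_662.0, 4: 1_684_204.6}
NOPTI_2018 = {1: 873_451.2, 2: 1_502_151.4, 4: 1_764_293.0}


def degradation_pct(baseline, measured):
    """Percentage of logical reads per second lost relative to the baseline."""
    return {s: 100.0 * (baseline[s] - measured[s]) / baseline[s] for s in baseline}


if __name__ == "__main__":
    for label, run in (("all mitigations", PATCHED_2018), ("nopti", NOPTI_2018)):
        pct = degradation_pct(BASELINE_2017, run)
        print(label, {s: f"{p:.1f}%" for s, p in pct.items()})
```

With these figures it gives about 7.5%, 12.8% and 5.3% for the fully mitigated kernel, and 0.1%, 2.5% and 0.8% with nopti.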

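With KPTI disabled: the degradation is below 1% for the runs at 1 and 4 active sessions, and about 2.5% for the run at 2 sessions, so that's an easy way to get the same performance back if you are sure that you control all the software running on the server. At least temporarily, until some database tuning is done.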

With KPTI, IBRS and IBPB all disabled: no better than disabling only KPTI. I have no explanation for that… It makes me wonder whether speculative branch prediction is always a good idea.
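The comparison between these two boots can be quantified the same way. This Python sketch, with hypothetical names, computes the signed change of the nopti noibrs noibpb runs relative to the nopti-only runs, from the logical reads per second above.

```python
# Logical reads per second, keyed by active sessions, from the two runs above.
NOPTI_ONLY = {1: 873_451.2, 2: 1_502_151.4, 4: 1_764_293.0}
NOPTI_NOIBRS_NOIBPB = {1: 861_138.5, 2: 1_493_336.8, 4: 1_760_218.4}


def relative_change_pct(reference, other):
    """Signed % change of `other` vs `reference`; negative means fewer reads/s."""
    return {s: 100.0 * (other[s] - reference[s]) / reference[s] for s in reference}


if __name__ == "__main__":
    change = relative_change_pct(NOPTI_ONLY, NOPTI_NOIBRS_NOIBPB)
    for sessions, pct in change.items():
        print(f"{sessions} session(s): {pct:+.2f}%")
```

All three values are negative but small (about -1.4%, -0.6% and -0.2%). With a single run per configuration, differences of this size may be within run-to-run variation, so repeated runs would be needed before drawing a firm conclusion.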

In all cases, if you are not allocating the SGA with large pages, then you should. The KPTI degradation is lower with large pages, which makes sense as the page table is smaller. And if you are not yet using large pages, the benefit will probably compensate for the KPTI degradation.
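The "smaller page table" argument is easy to quantify. This Python sketch counts the leaf page table entries (8 bytes each on x86-64) needed to map an SGA with 4 KiB pages and with 2 MiB large pages. The 4 GiB SGA size is a hypothetical value, since the SGA size of the test is not given here, and the sketch ignores the upper levels of the page table hierarchy.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
PTE_BYTES = 8  # one x86-64 page table entry


def page_table_footprint(sga_bytes, page_bytes):
    """Leaf entries and their size in bytes to map sga_bytes with page_bytes pages.

    Leaf entries are PTEs for 4 KiB pages and page-directory entries for 2 MiB pages.
    """
    entries = sga_bytes // page_bytes
    return entries, entries * PTE_BYTES


# Hypothetical 4 GiB SGA, for illustration only.
SGA = 4 * GIB
for label, page in (("4 KiB pages", 4 * KIB), ("2 MiB large pages", 2 * MIB)):
    entries, size = page_table_footprint(SGA, page)
    print(f"{label}: {entries:,} entries, {size // KIB:,} KiB per process mapping the SGA")
```

For 4 GiB, that is 8 MiB of entries with 4 KiB pages versus 16 KiB with 2 MiB pages, for each process that maps the SGA, and presumably fewer TLB misses after the page table switches that KPTI adds on each kernel entry. In the single-session runs above, large pages with KPTI enabled gave 854,682 logical reads per second, compared with 808,954 on small pages with KPTI and 873,451 on small pages without KPTI.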

This is not a benchmark, and your application may see a higher degradation if it does a lot of system calls. If you upgrade from an old kernel, you may even see an improvement, thanks to other features compensating for the cost of the mitigations.

3 Comments

Kevin says:

January 10, 2018 at 20:07

This is *very* valuable testing, Franck. It shows the degradation to the most critical processing that happens in an Oracle Database. A significant impact to the cached accesses would make any impact to I/O a moot issue. Think about it. If your cached work is suffering, your non-cached work is merely academic.

By the way, your first series was 874K, 1540K and 1779K LIOPS. That looks to me like you were plotting out the warmup run. SLOB testing should always show you reproducible results. Thoughts?

David Baffaleuf says:

January 12, 2018 at 11:12

Hi Franck

Thanks for these interesting results, especially the focus on large pages. I think that even if you're touching only memory in your tests, you still need virtual-to-physical address translation, so the page table fix is still involved. As you said, the worst-case scenario is likely to be more CPU bound than I/O bound. We've done some testing on a patched OEL7 on our side with a strictly CPU-bound loop, and it shows a 7.5% degradation after the patch is applied, close to what most other vendors have noticed so far (MySQL, PostgreSQL). It is far from representative of a true OLTP or DSS workload; it just shows how bad it can potentially become. Also, we did not evaluate the mitigation with PCID on/off, so there may be room for improvement with PCID on.

David

- Franck Pachot says:
  
  January 12, 2018 at 11:39
  
  Thanks David for your feedback. Interesting to see that different tests, even if very different from real-life applications, are all in the same ballpark.
  
