This was first published on https://blog.dbi-services.com/sockets-cores-virtual-cpu-logical-cpu-hyper-threading-what-is-a-cpu-nowadays-1 (2014-01-13)
Republishing here for new followers. The content relates to the versions available at the publication date.
Because people know that I really like Oracle AWR reports, they send them to me from time to time. Here is one I have received for Christmas with the following question: ‘Since our AIX server is virtualized, the response times are 4x longer. I feel that we are CPU bound, but cpu usage is mostly idle. Is there something hidden?’ It is a question that is frequently raised with virtualized environments. Here, we will see that the system reports the CPU utilization as being only 30% busy. However, despite appearances, the report is coming from a very busy system suffering from CPU starvation. So, yes, something is hidden and we will see how we can reveal it.
A CPU nowadays is not the physical processor we used to have. The example I have here comes from an IBM LPAR running AIX and I will explain Virtual CPU, Logical CPU and CPU multi-threading. Even if it’s not Oracle specific, I’ll use the figures from the AWR report which show the statistics gathered from the OS.
The AWR report covers 5 hours of activity where users are experiencing long response times:
| | Snap Id | Snap Time | Sessions | Cursors/Session |
---|---|---|---|---|
Begin Snap: | 27460 | 20-Dec-13 15:00:07 | 142 | 2.8 |
End Snap: | 27465 | 20-Dec-13 20:00:49 | 101 | 2.2 |
Elapsed: | 300.72 (mins) | | | |
DB Time: | 855.93 (mins) | | | |
Here is the average CPU load gathered from the OS:
Host CPU (CPUs: 20 Cores: 5 Sockets: )
Load Average Begin | Load Average End | %User | %System | %WIO | %Idle |
---|---|---|---|---|---|
9.82 | 9.26 | 18.7 | 12.0 | 4.4 | 69.3 |
And because the report covers 5 hours we have an hourly detail in ‘Operating System Statistics – Detail’:
Snap Time | Load | %busy | %user | %sys | %idle | %iowait |
---|---|---|---|---|---|---|
20-Dec 15:00:07 | 9.82 | | | | | |
20-Dec 16:00:15 | 12.50 | 26.67 | 14.09 | 12.58 | 73.33 | 6.21 |
20-Dec 17:00:26 | 15.53 | 41.47 | 24.44 | 17.03 | 58.53 | 5.84 |
20-Dec 18:00:34 | 6.44 | 36.91 | 25.20 | 11.70 | 63.09 | 4.55 |
20-Dec 19:00:40 | 6.52 | 24.39 | 15.00 | 9.40 | 75.61 | 3.35 |
20-Dec 20:00:49 | 9.26 | 24.34 | 14.99 | 9.35 | 75.66 | 1.88 |
The point is that when you look at these figures you think that the system is more than 50% idle. But that’s wrong.
Look at the second line. Being only 26.67% busy with a load of 12.5 running processes would mean that the system is able to run 12.50/0.2667=47 processes concurrently. But that’s wrong again: we don’t have 47 CPUs on the whole system.
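The sanity check above is plain arithmetic. Here is a minimal Python sketch of it, using the values from the 16:00:15 snapshot row; the variable names are mine and do not come from AWR.

```python
# Values taken from the 16:00:15 snapshot row of the hourly table.
load_average = 12.50   # processes running or waiting to run
busy_ratio = 0.2667    # %busy reported by the OS (26.67%)

# If the system really were only 26.67% busy while serving this load,
# its full capacity would have to be load / busy_ratio concurrent processes.
implied_capacity = load_average / busy_ratio

print(f"Implied concurrent capacity: {implied_capacity:.1f} processes")
```

An implied capacity far above the number of CPUs the machine can really offer is the hint that the %busy ratio is computed against the wrong denominator.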
Ratios are evil, because they hide the real numbers they come from. Let’s see how the %cpu utilization is calculated.
Statistic | Value | End Value |
---|---|---|
BUSY_TIME | 11,162,236 | |
IDLE_TIME | 25,169,672 | |
IOWAIT_TIME | 1,584,726 | |
SYS_TIME | 4,360,698 | |
USER_TIME | 6,801,538 | |
LOAD | 10 | 9 |
NUM_CPUS | 20 | |
NUM_CPU_CORES | 5 | |
NUM_LCPUS | 20 | |
NUM_VCPUS | 5 | |
During the 5 hours when the statistics were collected, running processes used 11,162,236 centiseconds of CPU, and 25,169,672 centiseconds of CPU resources were not used. The 1,584,726 centiseconds of IOWAIT_TIME appear to be already counted within IDLE_TIME, which is why %user, %system and %idle add up to 100% in the figures above. This supposes that we have in total 11,162,236+25,169,672 centiseconds, which is about 100 hours of CPU time available. 100 hours of CPU during 5 hours of elapsed time supposes that we have 100/5=20 CPUs in the system.
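The same derivation can be sketched in Python. This is only an illustration with names of my own choosing, and it assumes that IOWAIT_TIME is already included in IDLE_TIME (the percentages above, where %user + %system + %idle sum to 100, suggest so), so only BUSY_TIME and IDLE_TIME are added.

```python
CENTISECONDS_PER_HOUR = 100 * 3600

# Figures from the V$OSSTAT section of the AWR report.
busy_time = 11_162_236   # centiseconds used by running processes
idle_time = 25_169_672   # centiseconds not used (assumed to include I/O wait)
elapsed_hours = 300.72 / 60   # "Elapsed" from the report header, in hours

# Total CPU time the OS believes it had available during the interval.
total_cpu_hours = (busy_time + idle_time) / CENTISECONDS_PER_HOUR

# Number of CPUs the OS must be counting to account for that much CPU time.
implied_cpus = total_cpu_hours / elapsed_hours

print(f"CPU time accounted for: {total_cpu_hours:.1f} hours")
print(f"Implied number of CPUs: {implied_cpus:.1f}")
```

The result lands on the 20 logical CPUs reported as NUM_CPUS, which confirms where the denominator of the %cpu ratio comes from.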
But that’s wrong. We cannot run 20 processes concurrently. And the %cpu that is calculated from that is a lie because we cannot reach 100%.
This is virtualization: it lies about the available CPU power. So, do we have to calculate the %cpu from the number of cores, which is 5 here? Let’s do it. We used 11,162,236 centiseconds of CPU, that is 31 hours. If we have 5 CPUs during 5 hours then the CPU utilization is 31/(5×5)=124%
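To see how much the choice of denominator matters, here is a small Python sketch that computes the utilization against both the 20 logical CPUs and the 5 cores. The helper name `utilization` is hypothetical, and I use the exact elapsed time from the report (300.72 minutes) rather than a rounded 5 hours, so the figures differ slightly from the rounded ones in the text.

```python
CENTISECONDS_PER_HOUR = 100 * 3600

busy_hours = 11_162_236 / CENTISECONDS_PER_HOUR   # about 31 hours of CPU used
elapsed_hours = 300.72 / 60                       # report duration in hours


def utilization(busy_hours: float, cpus: int, elapsed_hours: float) -> float:
    """Fraction of the available CPU time that was used, for a given CPU count."""
    return busy_hours / (cpus * elapsed_hours)


vs_logical = utilization(busy_hours, 20, elapsed_hours)  # what the OS reports
vs_cores = utilization(busy_hours, 5, elapsed_hours)     # against cores / virtual CPUs

print(f"Against 20 logical CPUs: {vs_logical:.1%}")
print(f"Against 5 cores:         {vs_cores:.1%}")
```

Neither figure is the truth on its own: one can never reach 100%, and the other goes above it.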
This is hyper-threading: we have more CPU than the number of cores.
Now it’s time to define what a CPU is if you want to understand the meaning of the numbers.
The CPU is the hardware that executes the instructions of a running process. One CPU can cope with one process which is always working on CPU, or with two processes that spend half of their time outside of CPU (waiting for I/O for example), etc.
With multi-core processors, we have a socket that has several CPU cores in it. Cores in a socket can share some resources, but it is the core that processes the instructions. So the number of CPUs we’re interested in is the number of cores. The OS will calculate the %cpu from the number of cores, and that’s the right thing to do. And Oracle uses the number of cores to calculate the license that you have to buy, and that’s fair: it’s the number of instances of their software that can run concurrently.
Now comes virtualization. In our example, we are on an IBM LPAR with 5 virtual CPUs (the NUM_VCPUS from V$OSSTAT). What does that mean? Can we have 5 processes running on CPU efficiently? Yes, if the other LPARs on the same hardware are not too busy. Then our 5 virtual CPUs will actually run on 5 physical cores. But if all the LPARs are demanding CPU resources, and there are not enough physical CPUs, then the physical CPU resources will be shared. We will run on 5 virtual CPUs, but those VCPUs will be slower than physical ones, because they are shared.
Besides that, when the CPU has to access RAM, there is a latency during which the CPU waits before executing the next instruction. So the processor manufacturers introduced the ability to run another thread during that time (and without any context switch). This is called hyper-threading by Intel, or Simultaneous Multithreading (SMT) by IBM. In our example we are on POWER7 processors that can do 4-way SMT. So theoretically each of our 5 VCPUs can run 4 threads: these are the 20 logical CPUs that are reported. But we will never be able to run 4x more concurrent processes, so calculating %cpu utilization from that is misleading.
So how do we get the real picture? My favorite numbers come from load average and runqueue.
Look at the line where load average was 15.53 and CPU utilization shows 41.47%. Because the 41.47% was calculated from the 20 CPU hypothesis, we know that we had on average 0.4147*20=8.294 process threads running on CPU. And then among the 15.53 processes willing to run, 15.53-8.294=7.236 were waiting in the runqueue. When you wait nearly as much as you work you know that you are lacking resources: this is CPU starvation.
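The same estimate can be applied to every hourly snapshot. The Python sketch below is a rough illustration, not an exact measure: the function name `split_load` is hypothetical, and the load average is a value taken at snapshot time while %busy is averaged over the hour, so the two do not always line up. When the estimated running threads exceed the load, the waiting count is clamped to zero.

```python
LOGICAL_CPUS = 20  # NUM_LCPUS: the denominator the OS used for %busy

# (snapshot time, load average, %busy) from 'Operating System Statistics - Detail'
snapshots = [
    ("16:00:15", 12.50, 26.67),
    ("17:00:26", 15.53, 41.47),
    ("18:00:34", 6.44, 36.91),
    ("19:00:40", 6.52, 24.39),
    ("20:00:49", 9.26, 24.34),
]


def split_load(load: float, busy_pct: float, cpus: int = LOGICAL_CPUS):
    """Estimate threads running on CPU and processes waiting in the runqueue."""
    running = busy_pct / 100 * cpus
    waiting = max(load - running, 0.0)  # clamp: load and %busy are not sampled alike
    return running, waiting


estimates = {time: split_load(load, busy) for time, load, busy in snapshots}

for time, (running, waiting) in estimates.items():
    print(f"{time}  running={running:6.3f}  waiting={waiting:6.3f}")
```

The 16:00 and 17:00 snapshots both show roughly as many processes waiting as running, which is the signature of CPU starvation described above.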
So hyper-threading is a nice feature: here it let us run about 8 threads on 5 virtual CPUs. But don’t be fooled by virtualization or hyper-threading. You should always try to know the physical resources that are behind them, and that usually requires the help of your system administrator.
Very interesting. I would like to know your thoughts about Oracle Database running on hyper-threaded CPUs. Do you think it is correct to count them as more available “horse power”? Thanks in advance
Johnny, it’s very difficult to answer that because hyper-threading efficiency depends a lot on what is running on the threads of the core. Any ‘benchmark’ stating that the performance improves by xx% is only related to one specific workload. Note that xx% is not above 20-30% in any case.
First, in order to see hyper-threading in action, you need to have high CPU utilization: the number of running processes reaching the number of cores. Then, you may see an improvement because another thread can run while you’re accessing memory. And that happens a lot when reading buffers.
But then there is the problem with the L1 CPU cache that is shared (L2 is often shared by cores anyway). When you have only one thread, the process benefits from that cache, where the Oracle block should fit. But with several threads, when the L1 cache is shared, one thread will have its data evicted from the cache by the other thread.
So personally I don’t consider hyper-threading as more “horse power” and I try to keep the number of processes running on CPU lower than the number of available cores. Hyper-threading will come into play during a CPU usage peak that goes higher than the number of cores: it will slightly reduce the bad effect of CPU starvation by avoiding too frequent context switches.
But you can test it on your machine if you have a stress test workload. Set instance caging (activate a resource manager plan and set cpu_count) to the number of cores and see if the performance improves when increasing cpu_count. Regards, Franck.
Great, I understood your point, very reasonable. What drives me crazy is how CPU utilization numbers are reported by tools like top or sar. For example, using an Intel CPU with 2 cores (and HT), without considering queueing, if we have the following utilization per core:
vCPU0 utilization 100% (real core), vCPU1 utilization 30%, vCPU3 utilization 100% (real core), vCPU4 utilization 30%
Top could show us an average utilization of about 65%. Could we say that we are near the real total capacity of the CPU? Roughly speaking, of course.
I don’t know if I am being clear enough. It’s more or less about what Adrian wrote here: http://www.hpts.ws/papers/2007/Cockcroft_CMG06-utilization.pdf
Thanks