Linux CPU Performance Monitoring

1.0 Performance Monitoring Overview

Performance optimization involves identifying and eliminating system bottlenecks. Many administrators believe that performance optimization can be achieved by following "cookbook" solutions, often through kernel configuration adjustments. However, these generic solutions don't fit every environment. Performance optimization is about achieving balance among various OS subsystems, including:
  • CPU
  • Memory
  • IO
  • Network
These subsystems are interdependent, with high load in one affecting others. For example:
  1. Excessive page-in requests can cause memory queue congestion
  2. High network throughput can increase CPU overhead
  3. Heavy CPU usage may lead to more memory requests
  4. Excessive disk writes from memory can cause additional CPU and IO issues
Optimizing a system requires identifying the real bottleneck. The subsystem that appears to be in trouble may only be suffering from overload in another one.

1.1 Determining Application Types

To effectively optimize performance, it's essential to understand the system's characteristics. Applications generally fall into two categories:

IO-Bound Applications

These applications have high memory and storage system utilization, indicating heavy data processing. IO-bound applications typically don't make excessive requests to CPU or network (except with network storage hardware). Their CPU usage is mainly for generating IO requests and entering kernel sleep states. Database systems like MySQL and Oracle are examples of IO-bound applications.

CPU-Bound Applications

These applications have high CPU utilization, involving batch processing of CPU requests and mathematical computations. Web servers, mail servers, and other service-oriented applications are typically CPU-bound.

1.2 Establishing Baseline Statistics

What counts as acceptable system utilization depends on the administrator's expectations and the system's purpose. The key is understanding what performance goals you want to achieve and what aspects need optimization. Establishing a baseline provides a reference point for comparing system performance under normal conditions versus high load. For example, here's a baseline system performance snapshot, followed by one taken under high load:
# vmstat 1
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
1 0 138592 17932 126272 214244 0 0 1 18 109 19 2 1 1 96
0 0 138592 17932 126272 214244 0 0 0 0 105 46 0 1 0 99
0 0 138592 17932 126272 214244 0 0 0 0 198 62 40 14 0 45
0 0 138592 17932 126272 214244 0 0 0 0 117 49 0 0 0 100
0 0 138592 17924 126272 214244 0 0 0 176 220 938 3 4 13 80
0 0 138592 17924 126272 214244 0 0 0 0 358 1522 8 17 0 75
1 0 138592 17924 126272 214244 0 0 0 0 368 1447 4 24 0 72
0 0 138592 17924 126272 214244 0 0 0 0 352 1277 9 12 0 79
# vmstat 1
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
2 0 145940 17752 118600 215592 0 1 1 18 109 19 2 1 1 96
2 0 145940 15856 118604 215652 0 0 0 468 789 108 86 14 0 0
3 0 146208 13884 118600 214640 0 360 0 360 498 71 91 9 0 0
2 0 146388 13764 118600 213788 0 340 0 340 672 41 87 13 0 0
2 0 147092 13788 118600 212452 0 740 0 1324 620 61 92 8 0 0
2 0 147360 13848 118600 211580 0 720 0 720 690 41 96 4 0 0
2 0 147912 13744 118192 210592 0 720 0 720 605 44 95 5 0 0
2 0 148452 13900 118192 209260 0 372 0 372 639 45 81 19 0 0
2 0 149132 13692 117824 208412 0 372 0 372 457 47 90 10 0 0
In the first output, the last column (id) shows idle time: the CPU is mostly between 72% and 100% idle, with a single dip to 45%. In the second output, every sample after the first line (which reports averages since boot) shows 0% idle, meaning the CPU is fully utilized. This comparison helps determine if CPU usage needs optimization.
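The comparison above can also be done programmatically. The following is a minimal sketch in Python, not part of the original toolset: the function names `idle_values` and `summarize` are hypothetical, and the sample text is a shortened copy of the two vmstat outputs shown above. It finds the `id` column by name, so it should tolerate vmstat versions that order the CPU columns differently.

```python
def idle_values(vmstat_text):
    """Return the 'id' (idle %) column from vmstat output as a list of ints."""
    rows = [line.split() for line in vmstat_text.splitlines()]
    # Locate the column-name row by looking for the CPU column labels.
    header = next(row for row in rows if "us" in row and "id" in row)
    idle_index = header.index("id")
    samples = [row for row in rows if row and row[0].isdigit()]
    # vmstat's first report is an average since boot, so skip it.
    return [int(row[idle_index]) for row in samples[1:]]


def summarize(values):
    """Reduce a list of percentages to min, max, and mean."""
    return {"min": min(values), "max": max(values), "mean": sum(values) / len(values)}


# Shortened copies of the baseline and high-load outputs (illustrative only).
BASELINE = """\
r b swpd free buff cache si so bi bo in cs us sy wa id
1 0 138592 17932 126272 214244 0 0 1 18 109 19 2 1 1 96
0 0 138592 17932 126272 214244 0 0 0 0 105 46 0 1 0 99
0 0 138592 17932 126272 214244 0 0 0 0 198 62 40 14 0 45
0 0 138592 17932 126272 214244 0 0 0 0 117 49 0 0 0 100
"""

LOADED = """\
r b swpd free buff cache si so bi bo in cs us sy wa id
2 0 145940 17752 118600 215592 0 1 1 18 109 19 2 1 1 96
2 0 145940 15856 118604 215652 0 0 0 468 789 108 86 14 0 0
3 0 146208 13884 118600 214640 0 360 0 360 498 71 91 9 0 0
"""

if __name__ == "__main__":
    print("baseline idle:", summarize(idle_values(BASELINE)))
    print("loaded idle:  ", summarize(idle_values(LOADED)))
```

Saving a summary like this during normal operation gives you a number to compare against when the system later feels slow.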

2.0 Installing Monitoring Tools

Most Unix-like systems include standard monitoring commands that have been part of Unix from the beginning. Linux provides additional monitoring tools through base packages and extra repositories available in most Linux distributions. While there are many open-source and third-party monitoring tools, this document focuses on those available in standard Linux distributions. The following table lists common monitoring tools:
Tool      Description                                   Base Package  Repository
vmstat    General purpose performance tool              Yes           Yes
mpstat    Provides statistics per CPU                   No            Yes
sar       General purpose performance monitoring tool   No            Yes
iostat    Provides disk statistics                      No            Yes
netstat   Provides network statistics                   Yes           Yes
dstat     Monitoring statistics aggregator              No            In most distributions
iptraf    Traffic monitoring dashboard                  No            Yes
netperf   Network bandwidth tool                        No            In some distributions
ethtool   Reports on Ethernet interface configuration   Yes           Yes
iperf     Network bandwidth tool                        No            Yes
tcptrace  Packet analysis tool                          No            Yes

3.0 CPU Overview

CPU utilization depends on what resources are attempting to access it. The kernel scheduler manages two types of resources: threads (single or multiple) and interrupts. The scheduler defines different priorities for different resources, ordered from highest to lowest:
  1. Interrupts - Devices notify the kernel when they have finished processing, for example when a network card delivers a packet or a disk controller completes an IO request.
  2. Kernel (System) Processes - All kernel processing is handled at this priority level.
  3. User Processes - This is "userland". All software programs run in user space and have the lowest priority in the kernel scheduling mechanism.
Understanding how the kernel manages these resources is crucial. Key concepts include context switching, run queues, and utilization.

3.1 Context Switching

Most modern processors can run only one process (single-threaded) or thread at a time. Multi-threaded processors can run multiple threads simultaneously. Either way, the Linux kernel treats each processor core as an independent processor. For example, a Linux system on a dual-core processor reports two independent processors. A standard Linux kernel can run anywhere from 50 to 50,000 threads. With a single CPU, the kernel has to schedule and balance all of these threads. Each thread is allocated a time slice on the processor. A thread runs until it either uses up its time slice or is preempted by something with higher priority (such as a hardware interrupt). The outgoing thread is placed back on a queue while the higher-priority thread is placed on the processor. This switching of threads is called a context switch. Each context switch costs the kernel work: it must save the outgoing thread's state from the CPU registers and place the thread on a queue. The more context switches a system performs, the more work the kernel does managing processor scheduling.
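To see how often context switches happen, you can sample the kernel's cumulative counter. The sketch below is one possible approach in Python, assuming a Linux system where /proc/stat contains a `ctxt` line with the total number of context switches since boot. The function names are hypothetical.

```python
import time


def read_ctxt(stat_text):
    """Extract the cumulative context-switch count from /proc/stat text."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")


def ctxt_rate(before, after, seconds):
    """Context switches per second between two counter readings."""
    return (after - before) / seconds


def sample_rate(interval=1.0, path="/proc/stat"):
    """Read the counter twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = read_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_ctxt(f.read())
    return ctxt_rate(first, second, interval)
```

On a Linux host, `sample_rate()` should return a figure comparable to the cs column that vmstat reports for the same interval.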

3.2 Run Queues

Each CPU maintains a run queue of threads. Ideally, the scheduler is constantly running and executing threads. Process threads are either in a sleep state (blocked or waiting for IO) or in a runnable state. If the CPU subsystem is under high load, the kernel scheduler cannot keep up with the demands of the system, and runnable processes start to fill the run queue. As the run queue grows, threads spend more time waiting to be executed. The popular term "load" describes the current state of the run queue. System load is the number of threads waiting in the CPU queue plus the number currently being executed. For example, a dual-core system executing 2 threads with 4 in the run queue would have a load of 6. (On Linux, the load average also counts tasks in uninterruptible sleep, typically waiting on disk IO.) The load averages shown in the top command represent the load over the last 1, 5, and 15 minutes.
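Because load is a count of threads, it is easier to interpret once divided by the number of processors. Here is a small Python sketch of that arithmetic. The helper names are hypothetical; `os.getloadavg()` and `os.cpu_count()` are standard library calls, and the former is only available on Unix-like systems.

```python
import os


def load_per_cpu(load, cpus):
    """Normalize a load figure by the number of processors."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus


def current_load_per_cpu():
    """Return the 1, 5, and 15 minute load averages per processor."""
    cpus = os.cpu_count() or 1
    return tuple(load_per_cpu(value, cpus) for value in os.getloadavg())


if __name__ == "__main__":
    # The example from the text: 2 running + 4 queued threads on 2 cores.
    print(load_per_cpu(2 + 4, 2))
    print(current_load_per_cpu())
```

For the dual-core example in the text, a load of 6 works out to 3 threads per core.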

3.3 CPU Utilization

CPU utilization is the percentage of time the CPU spends doing work. It is one of the most important metrics for evaluating system performance. Most performance monitoring tools categorize CPU utilization as follows:
  • User Time - Percentage of CPU time spent executing processes in user space.
  • System Time - Percentage of CPU time spent on kernel threads and interrupts.
  • Wait IO - Percentage of idle CPU time while all processes are blocked waiting for IO requests to complete.
  • Idle - Percentage of time the CPU is completely idle.
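These four categories can be derived from the counters on the first line of /proc/stat. The Python sketch below shows one plausible mapping, assuming the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal). The grouping into us/sy/wa/id is an assumption made for illustration and may differ in detail from what a given tool does. The function names are hypothetical.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer fields; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def utilization(before, after):
    """Percentages per category between two parsed samples."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")

    def pct(*names):
        return 100.0 * sum(delta[n] for n in names) / total

    # Steal time is counted in the total but not assigned to a category,
    # so the four values can sum to less than 100 on virtual machines.
    return {
        "us": pct("user", "nice"),
        "sy": pct("system", "irq", "softirq"),
        "wa": pct("iowait"),
        "id": pct("idle"),
    }
```

The counters are cumulative since boot, which is why two samples are needed: only the difference between them describes the interval you are interested in.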

4.0 CPU Performance Monitoring

Understanding the relationship between run queues, utilization, and context switching is crucial for CPU performance optimization. As mentioned earlier, performance is measured against baseline data. Expected performance metrics in some systems include:
  • Run Queues - Each processor should have no more than 1-3 threads in its run queue. For example, a dual-core processor should not exceed 6 threads in the run queue.
  • CPU Utilization - If a CPU is fully utilized, the balanced proportion between utilization categories should be:
      65% - 70% User Time
      30% - 35% System Time
      0% - 5% Idle Time
  • Context Switches - The number of context switches directly relates to CPU usage. If CPU utilization remains in the balanced state above, a high number of context switches is normal.
Many Linux tools can provide these metrics, with vmstat and top being the most common.
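The rules of thumb above can be written down as a simple check. This Python sketch is illustrative only: the function name `check_cpu` is hypothetical, and the thresholds are taken directly from the list above rather than from any tool. Treat them as starting points to adjust against your own baseline.

```python
def check_cpu(run_queue, cpus, us, sy, idle, max_queue_per_cpu=3):
    """Return a list of warnings for one sample; an empty list means balanced."""
    warnings = []
    limit = max_queue_per_cpu * cpus
    if run_queue > limit:
        warnings.append(f"run queue {run_queue} exceeds {limit}")
    # The user/system balance only applies when the CPU is fully utilized.
    if idle <= 5:
        if not 65 <= us <= 70:
            warnings.append(f"user time {us}% outside 65-70%")
        if not 30 <= sy <= 35:
            warnings.append(f"system time {sy}% outside 30-35%")
    return warnings


if __name__ == "__main__":
    # A dual-core system with 6 runnable threads and a 68/30/2 split.
    print(check_cpu(run_queue=6, cpus=2, us=68, sy=30, idle=2))
```

An empty result means the sample falls within the expected ranges. Any strings returned name the metric that is out of range.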

4.1 Using the vmstat Tool

The vmstat tool provides a low-overhead view of system performance. Since vmstat itself has minimal overhead, you can use it to monitor system health even on very high-load servers. The tool operates in two modes: average and sample. The sample mode measures values at specified intervals, which helps you understand performance under sustained load. Here's an example of vmstat running at 1-second intervals:
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 104300 16800 95328 72200 0 0 5 26 7 14 4 1 95 0
0 0 104300 16800 95328 72200 0 0 0 24 1021 64 1 1 98 0
0 0 104300 16800 95328 72200 0 0 0 0 1009 59 1 1 98 0
Field  Description
r      Number of threads in the run queue. These threads are runnable but the CPU is not available to execute them.
b      Number of processes blocked waiting for IO requests to complete.
in     Number of interrupts per second.
cs     Number of context switches per second.
us     Percentage of user CPU utilization.
sy     Percentage of kernel and interrupt utilization.
wa     Percentage of idle processor time because all runnable threads are blocked waiting for IO.
id     Percentage of time that the CPU is completely idle.

4.2 Case Study: Sustained CPU Utilization

In this example, the system is fully utilized:
# vmstat 1
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
3 0 206564 15092 80336 176080 0 0 0 0 718 26 81 19 0 0
2 0 206564 14772 80336 176120 0 0 0 0 758 23 96 4 0 0
1 0 206564 14208 80336 176136 0 0 0 0 820 20 96 4 0 0
1 0 206956 13884 79180 175964 0 412 0 2680 1008 80 93 7 0 0
2 0 207348 14448 78800 175576 0 412 0 412 763 70 84 16 0 0
2 0 207348 15756 78800 175424 0 0 0 0 874 25 89 11 0 0
1 0 207348 16368 78800 175596 0 0 0 0 940 24 86 14 0 0
1 0 207348 16600 78800 175604 0 0 0 0 929 27 95 3 0 2
3 0 207348 16976 78548 175876 0 0 0 2508 969 35 93 7 0 0
4 0 207348 16216 78548 175704 0 0 0 0 874 36 93 6 0 1
4 0 207348 16424 78548 175776 0 0 0 0 850 26 77 23 0 0
2 0 207348 17496 78556 175840 0 0 0 0 736 23 83 17 0 0
0 0 207348 17680 78556 175868 0 0 0 0 861 21 91 8 0 1
Based on these observations:
  1. There are many interrupts (in) and few context switches (cs), which suggests a single process is making requests to hardware devices.
  2. The user time (us) is 85% or higher in most samples. Combined with the low context switch count, this suggests the process gets onto the CPU and stays there.
  3. The run queue is mostly within acceptable performance limits. It exceeds the threshold of 3 in two samples (r = 4).

4.3 Case Study: Overloaded Scheduling

In this example, context switching in the kernel scheduler is saturated:
# vmstat 1
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
2 1 207740 98476 81344 180972 0 0 2496 0 900 2883 4 12 57 27
0 1 207740 96448 83304 180984 0 0 1968 328 810 2559 8 9 83 0
0 1 207740 94404 85348 180984 0 0 2044 0 829 2879 9 6 78 7
0 1 207740 92576 87176 180984 0 0 1828 0 689 2088 3 9 78 10
2 0 207740 91300 88452 180984 0 0 1276 0 565 2182 7 6 83 4
3 1 207740 90124 89628 180984 0 0 1176 0 551 2219 2 7 91 0
4 2 207740 89240 90512 180984 0 0 880 520 443 907 22 10 67 0
5 3 207740 88056 91680 180984 0 0 1168 0 628 1248 12 11 77 0
4 2 207740 86852 92880 180984 0 0 1200 0 654 1505 6 7 87 0
6 1 207740 85736 93996 180984 0 0 1116 0 526 1512 5 10 85 0
0 1 207740 84844 94888 180984 0 0 892 0 438 1556 6 4 90 0
Based on these observations:
  1. The number of context switches exceeds interrupts, indicating significant kernel time is spent on context switching threads.
  2. High context switching causes imbalanced CPU utilization categories, with very high wait IO percentage (wa) and very low user time (us).
  3. Since the CPU is blocked on IO requests, the run queue contains numerous runnable threads waiting to execute.
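The reasoning used in the two case studies can be condensed into a rough classifier. The Python sketch below is a heuristic for illustration, not a diagnostic tool: the function name `classify` and the cutoffs (85% user time, wa greater than us) are assumptions chosen to match the observations above.

```python
def classify(interrupts, ctx_switches, us, wa):
    """Label one vmstat sample using the patterns from the case studies."""
    # Pattern from the overloaded-scheduling case: more context switches
    # than interrupts, with IO wait dominating user time.
    if ctx_switches > interrupts and wa > us:
        return "io-wait with heavy context switching"
    # Pattern from the sustained-utilization case: few context switches
    # relative to interrupts, with user time at 85% or above.
    if interrupts > ctx_switches and us >= 85:
        return "sustained cpu-bound work"
    return "inconclusive"


if __name__ == "__main__":
    # One sample from each case study: (in, cs, us, wa).
    print(classify(758, 23, 96, 0))
    print(classify(810, 2559, 8, 83))
```

A single sample proves little. In practice you would apply such a check across many samples and look at which label dominates.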

4.4 Using the mpstat Tool

If your system has multiple processor cores, you can use the mpstat command to monitor each one independently. The Linux kernel treats each core as a separate CPU, so a dual-processor system with dual-core chips reports 4 CPUs available. The CPU utilization statistics from mpstat are similar to vmstat, but mpstat provides statistics per individual processor:
# mpstat -P ALL 1
Linux 2.4.21-20.ELsmp (localhost.localdomain) 05/23/2006

05:17:31 PM CPU %user %nice %system %idle intr/s
05:17:32 PM all 0.00 0.00 3.19 96.53 13.27
05:17:32 PM 0 0.00 0.00 0.00 100.00 0.00
05:17:32 PM 1 1.12 0.00 12.73 86.15 13.27
05:17:32 PM 2 0.00 0.00 0.00 100.00 0.00
05:17:32 PM 3 0.00 0.00 0.00 100.00 0.00

4.5 Case Study: Underutilized Processing Capacity

In this example, 4 CPU cores are available. Three processes (php, mysqld, and httpd) each consume nearly 100% of a core, as the top output shows. The mpstat output that follows shows cores 0, 1, and 3 fully busy, at roughly 80% user and 20% system time with no idle time, while core 2 sits almost entirely idle:
# top -d 1
top - 23:08:53 up 8:34, 3 users, load average: 0.91, 0.37, 0.13
Tasks: 190 total, 4 running, 186 sleeping, 0 stopped, 0 zombie
Cpu(s): 75.2% us, 0.2% sy, 0.0% ni, 24.5% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 2074736k total, 448684k used, 1626052k free, 73756k buffers
Swap: 4192956k total, 0k used, 4192956k free, 259044k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15957 nobody 25 0 2776 280 224 R 100 20.5 0:25.48 php
15959 mysql 25 0 2256 280 224 R 100 38.2 0:17.78 mysqld
15960 apache 25 0 2416 280 224 R 100 15.7 0:11.20 httpd
15901 root 16 0 2780 1092 800 R 1 0.1 0:01.59 top
1 root 16 0 1780 660 572 S 0 0.0 0:00.64 init
# mpstat -P ALL 1
Linux 2.4.21-20.ELsmp (localhost.localdomain) 05/23/2006

05:17:31 PM CPU %user %nice %system %idle intr/s
05:17:32 PM all 81.52 0.00 18.48 21.17 130.58
05:17:32 PM 0 83.67 0.00 17.35 0.00 115.31
05:17:32 PM 1 80.61 0.00 19.39 0.00 13.27
05:17:32 PM 2 0.00 0.00 16.33 84.66 2.01
05:17:32 PM 3 79.59 0.00 21.43 0.00 0.00

05:17:32 PM CPU %user %nice %system %idle intr/s
05:17:33 PM all 85.86 0.00 14.14 25.00 116.49
05:17:33 PM 0 88.66 0.00 12.37 0.00 116.49
05:17:33 PM 1 80.41 0.00 19.59 0.00 0.00
05:17:33 PM 2 0.00 0.00 0.00 100.00 0.00
05:17:33 PM 3 83.51 0.00 16.49 0.00 0.00
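Spotting an idle core by eye works for 4 CPUs but not for 64. The following Python sketch parses mpstat-style text and lists the mostly idle processors. The function names and the 90% threshold are hypothetical. Columns are looked up by name from the header row, so the layout is assumed to match the header in the captured output.

```python
def idle_by_cpu(mpstat_text):
    """Map each CPU number to its %idle value from mpstat -P ALL output."""
    rows = [line.split() for line in mpstat_text.splitlines() if line.strip()]
    header = next(row for row in rows if "CPU" in row and "%idle" in row)
    cpu_col, idle_col = header.index("CPU"), header.index("%idle")
    result = {}
    for row in rows:
        # Skip the header, the "all" summary row, and short banner lines.
        if len(row) > idle_col and row[cpu_col].isdigit():
            result[int(row[cpu_col])] = float(row[idle_col])
    return result


def idle_cpus(mpstat_text, threshold=90.0):
    """Return the CPUs whose idle percentage is at or above the threshold."""
    return sorted(cpu for cpu, idle in idle_by_cpu(mpstat_text).items() if idle >= threshold)


# The second sample from the output above.
SAMPLE = """\
05:17:32 PM CPU %user %nice %system %idle intr/s
05:17:33 PM all 85.86 0.00 14.14 25.00 116.49
05:17:33 PM 0 88.66 0.00 12.37 0.00 116.49
05:17:33 PM 1 80.41 0.00 19.59 0.00 0.00
05:17:33 PM 2 0.00 0.00 0.00 100.00 0.00
05:17:33 PM 3 83.51 0.00 16.49 0.00 0.00
"""

if __name__ == "__main__":
    print(idle_cpus(SAMPLE))
```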
You can also use the ps command with the PSR column to check which process is using which CPU:
# while :; do ps -eo pid,ni,pri,pcpu,psr,comm | grep 'mysqld'; sleep 1; done
PID NI PRI %CPU PSR COMMAND
15775 0 15 86.0 3 mysqld
PID NI PRI %CPU PSR COMMAND
15775 0 14 94.0 3 mysqld
PID NI PRI %CPU PSR COMMAND
15775 0 14 96.6 3 mysqld
PID NI PRI %CPU PSR COMMAND
15775 0 14 98.0 3 mysqld
PID NI PRI %CPU PSR COMMAND
15775 0 14 98.8 3 mysqld
PID NI PRI %CPU PSR COMMAND
15775 0 14 99.3 3 mysqld
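The same information that ps shows in the PSR column can be read from /proc. The Python sketch below assumes a Linux system where field 39 of /proc/<pid>/stat holds the number of the CPU the task last ran on. The function names are hypothetical.

```python
def last_cpu(stat_line):
    """Return the CPU a task last ran on, given its /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split on the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 (processor) is rest[36].
    return int(rest[36])


def last_cpu_of(pid):
    """Read /proc/<pid>/stat for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return last_cpu(f.read())
```

Calling `last_cpu_of(pid)` repeatedly shows whether the scheduler keeps a process on one core, as in the mysqld example above, or moves it between cores.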

4.6 Conclusion

CPU performance monitoring involves several key aspects:
  1. Check the system's run queue and ensure it doesn't exceed 3 runnable threads per processor.
  2. When the CPU is fully utilized, aim for a user/system ratio of roughly 70/30.
  3. When the CPU spends more time in system mode, it is most likely overloaded and busy rescheduling priorities.
  4. When IO processing increases, CPU-bound applications will be affected.

Tags: Linux CPU monitoring performance optimization System Administration vmstat

Posted on Mon, 18 May 2026 04:33:30 +0000 by evil turnip