Key system resources in Red Hat Linux:
1. CPU
2. Memory
3. Swap
4. Filesystem (Disk or LUN)
5. Network
1. CPU:
CPU utilization can be monitored using several built-in Linux tools; top, vmstat and “sar -u” are a few of them.
To get the current CPU utilization details:
VMSTAT
[root@Global-RH ~]# vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 402432 45308 449980 0 0 92 54 74 112 1 2 94 3 0
0 0 0 402424 45308 450008 0 0 0 0 54 82 1 0 99 0 0
0 0 0 402424 45308 450008 0 0 0 0 33 52 0 0 100 0 0
0 0 0 402432 45316 450004 0 0 0 72 59 109 0 1 99 0 0
0 0 0 402432 45316 450008 0 0 0 0 37 59 0 0 100 0 0
SAR
[root@Global-RH ~]# sar -u 5 5
Linux 2.6.32-279.el6.x86_64 (Global-RH) 11/25/2013 _x86_64_ (1 CPU)
10:34:11 AM CPU %user %nice %system %iowait %steal %idle
10:34:16 AM all 0.00 0.00 0.80 0.00 0.00 99.20
10:34:21 AM all 0.40 0.00 2.00 0.00 0.00 97.60
10:34:26 AM all 0.00 0.00 1.41 0.00 0.00 98.59
10:34:31 AM all 0.00 0.00 0.40 0.00 0.00 99.60
10:34:36 AM all 0.00 0.00 0.80 1.00 0.00 98.20
Average: all 0.08 0.00 1.08 0.20 0.00 98.63
[root@Global-RH ~]#
TOP
top - 10:36:09 up 1:16, 4 users, load average: 0.00, 0.01, 0.04
Tasks: 156 total, 1 running, 155 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.3%us, 1.5%sy, 0.0%ni, 93.9%id, 3.3%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 1250268k total, 841816k used, 408452k free, 45384k buffers
Swap: 2523128k total, 0k used, 2523128k free, 450152k cached
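vmstat, sar and top derive these percentages from the cumulative tick counters on the aggregate “cpu” line of /proc/stat, sampled at two points in time. The sketch below shows that arithmetic with a hypothetical cpu_breakdown helper that takes two saved snapshots. It assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal), counts nice as user time, and leaves irq, softirq and steal out of the printed breakdown (they still count toward the total).

```shell
# cpu_breakdown SNAPSHOT1 SNAPSHOT2  (hypothetical helper)
# Compares the aggregate "cpu" line of two /proc/stat snapshots and prints the
# share of elapsed CPU time spent in user (user + nice), system, iowait and idle.
# Assumes the field order user nice system idle iowait irq softirq steal ...;
# guest time is already counted inside user, so only fields 2-9 are summed.
cpu_breakdown() {
    awk '
        $1 == "cpu" {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            if (FNR == NR) { t0 = total; u0 = $2 + $3; s0 = $4; d0 = $5; w0 = $6 }
            else           { t1 = total; u1 = $2 + $3; s1 = $4; d1 = $5; w1 = $6 }
        }
        END {
            dt = t1 - t0
            if (dt <= 0) { print "no CPU time elapsed between snapshots"; exit 1 }
            printf "user %.1f%% system %.1f%% iowait %.1f%% idle %.1f%%\n",
                100 * (u1 - u0) / dt, 100 * (s1 - s0) / dt,
                100 * (w1 - w0) / dt, 100 * (d1 - d0) / dt
        }' "$1" "$2"
}
```

For example: cat /proc/stat > /tmp/stat.1; sleep 5; cat /proc/stat > /tmp/stat.2; cpu_breakdown /tmp/stat.1 /tmp/stat.2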
High CPU-consuming processes:
[root@Global-RH ~]# top
top - 10:42:11 up 1:22, 4 users, load average: 0.76, 0.22, 0.08
Tasks: 156 total, 2 running, 154 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1250268k total, 889884k used, 360384k free, 80212k buffers
Swap: 2523128k total, 0k used, 2523128k free, 450164k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6010 root 20 0 109m 1244 888 R 98.1 0.1 0:32.21 find
1094 root 20 0 0 0 0 S 1.3 0.0 0:02.40 flush-253:0
6031 root 20 0 15028 1296 964 R 0.7 0.1 0:00.03 top
2680 root 20 0 40336 616 364 S 0.3 0.0 0:01.20 udisks-daemon
1 root 20 0 19348 1564 1252 S 0.0 0.1 0:02.07 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
Using the ps command (sorted by %CPU, highest first):
[root@Global-RH ~]# ps -eo pcpu,args --sort=-pcpu | head -8
%CPU COMMAND
90.5 find / -name temp_myname*
0.2 /usr/sbin/vmtoolsd
0.2 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr --blockFd 3
0.1 /usr/sbin/restorecond -u
0.0 [watchdog/0]
0.0 [vmmemctl]
0.0 /usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wpa_supplicant.conf -B -u -f /var/log/wpa_supplicant.log -P /var/run/wpa_supplicant.pid
[root@Global-RH ~]#
2. Memory:
Memory bottlenecks can be identified using the vmstat and sar commands. Be careful when judging free memory, because Red Hat Linux uses otherwise free physical memory as cache. The cache memory is released when applications require it.
To get memory information:
Using /proc/meminfo:
[root@Global-RH ~]# cat /proc/meminfo
MemTotal: 1250268 kB
MemFree: 408724 kB
Buffers: 45416 kB
Cached: 450156 kB
SwapCached: 0 kB
Active: 319124 kB
Inactive: 333028 kB
Active(anon): 156764 kB
Inactive(anon): 3332 kB
Active(file): 162360 kB
Inactive(file): 329696 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2523128 kB
SwapFree: 2523128 kB
Dirty: 28 kB
Writeback: 0 kB
AnonPages: 156596 kB
Mapped: 71844 kB
Shmem: 3520 kB
Slab: 124828 kB
SReclaimable: 63844 kB
SUnreclaim: 60984 kB
KernelStack: 2040 kB
PageTables: 27704 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3148260 kB
Committed_AS: 712176 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 280420 kB
VmallocChunk: 34359440948 kB
HardwareCorrupted: 0 kB
AnonHugePages: 28672 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 8192 kB
DirectMap2M: 1282048 kB
[root@Global-RH ~]#
Using the vmstat command:
[root@Global-RH ~]# vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
4 0 0 326304 80864 450164 0 0 85 52 119 126 1 6 89 3 0
5 0 0 326272 80864 450164 0 0 0 0 1021 587 0 100 0 0 0
5 0 0 326140 80864 450164 0 0 0 0 1017 700 0 100 0 0 0
6 0 0 326272 80864 450164 0 0 0 0 1018 671 0 100 0 0 0
5 0 0 326272 80864 450164 0 0 0 0 1019 658 0 100 0 0 0
Using the sar command:
[root@Global-RH ~]# sar -r 5 5
Linux 2.6.32-279.el6.x86_64 (Global-RH) 11/25/2013 _x86_64_ (1 CPU)
10:48:22 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
10:48:27 AM 326404 923864 73.89 80872 450164 728784 19.31
10:48:32 AM 326388 923880 73.89 80872 450164 728784 19.31
10:48:37 AM 326404 923864 73.89 80872 450164 728784 19.31
10:48:42 AM 327140 923128 73.83 80872 450164 727352 19.28
10:48:47 AM 327388 922880 73.81 80872 450164 727352 19.28
Average: 326745 923523 73.87 80872 450164 728211 19.30
Using the top command:
[root@Global-RH ~]# top
top - 10:49:04 up 1:29, 4 users, load average: 5.69, 2.87, 1.18
Tasks: 162 total, 7 running, 155 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1250268k total, 922616k used, 327652k free, 80880k buffers
Swap: 2523128k total, 0k used, 2523128k free, 450164k cached
Using the free command:
[root@Global-RH ~]# free -m
total used free shared buffers cached
Mem: 1220 897 323 0 79 439
-/+ buffers/cache: 379 841
Swap: 2463 0 2463
[root@Global-RH ~]#
According to the output above, the system has 1220MB of physical memory, of which 897MB is used and 323MB is free. Of the 897MB in use, 439MB is cache and 79MB is buffers; the kernel releases this memory when applications demand it. The “-/+ buffers/cache” line shows the application view: 379MB used and 841MB effectively available.
If free memory is running out and very little memory is held as cache, the system has a genuine memory bottleneck.
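The same “free plus buffers plus cache” estimate can be read straight from /proc/meminfo. A minimal sketch with a hypothetical effective_free_kb helper: it prefers the MemAvailable field when the kernel provides it (added in Linux 3.14, so absent on the 2.6.32 kernel shown here) and otherwise falls back to MemFree + Buffers + Cached. Fed the /proc/meminfo output above, the fallback gives 408724 + 45416 + 450156 = 904296 kB.

```shell
# effective_free_kb [FILE]  (hypothetical helper)
# Prints an estimate, in kB, of the memory available to applications.
# Uses MemAvailable if the kernel reports it (Linux 3.14 and later); otherwise
# falls back to MemFree + Buffers + Cached, the same figure as the free
# column of the "-/+ buffers/cache" line printed by free.
effective_free_kb() {
    awk '
        /^MemAvailable:/ { avail = $2; have_avail = 1 }
        /^MemFree:/      { memfree = $2 }
        /^Buffers:/      { buffers = $2 }
        /^Cached:/       { cached = $2 }
        END { print (have_avail ? avail : memfree + buffers + cached) }
    ' "${1:-/proc/meminfo}"
}
```

Run without arguments, it reads the live /proc/meminfo.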
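High memory-consuming processes:
Using top (the -M option displays memory values in megabytes; press Shift+M inside top to sort processes by memory usage):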
[root@Global-RH ~]# top -M
top - 10:56:52 up 1:37, 4 users, load average: 0.01, 0.97, 1.01
Tasks: 155 total, 1 running, 154 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1220.965M total, 897.832M used, 323.133M free, 79.102M buffers
Swap: 2463.992M total, 0.000k used, 2463.992M free, 439.621M cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7 root 20 0 0 0 0 S 0.3 0.0 0:04.16 events/0
2706 root 20 0 426m 26m 18m S 0.3 2.1 0:15.30 vmtoolsd
6426 root 20 0 15028 1284 964 R 0.3 0.1 0:00.08 top
1 root 20 0 19348 1564 1252 S 0.0 0.1 0:02.08 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
Using the ps command (RSS is reported in kB):
[root@Global-RH ~]# ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS |tail -10
13088 gnome-volume-control-applet
13268 /usr/bin/gnome-terminal -x /bin/sh -c cd '/root/Desktop' && exec $SHELL
13656 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=28
14444 nm-applet --sm-disable
17796 /usr/bin/gnote --panel-applet --oaf-activate-iid=OAFIID:GnoteApplet_Factory --oaf-ior-fd=22
19260 python /usr/share/system-config-printer/applet.py
20016 /usr/sbin/restorecond -u
20628 nautilus
25744 /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-0wmzZs/database -nolisten tcp vt1
26824 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr --blockFd 3
[root@Global-RH ~]#
Using the ps command with %MEM and the full command line:
[root@Global-RH ~]# /bin/ps ax -orss,%mem,cmd --sort=rss|tac|head -10
26824 2.1 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr --blockFd 3
25744 2.0 /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-0wmzZs/database -nolisten tcp vt1
20628 1.6 nautilus
20016 1.6 /usr/sbin/restorecond -u
19260 1.5 python /usr/share/system-config-printer/applet.py
17796 1.4 /usr/bin/gnote --panel-applet --oaf-activate-iid=OAFIID:GnoteApplet_Factory --oaf-ior-fd=22
14444 1.1 nm-applet --sm-disable
13656 1.0 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=28
13268 1.0 /usr/bin/gnome-terminal -x /bin/sh -c cd '/root/Desktop' && exec $SHELL
13088 1.0 gnome-volume-control-applet
[root@Global-RH ~]#
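Because several processes often share one program name (for example, a pool of worker processes), it can help to total RSS per command. A minimal sketch with a hypothetical rss_by_command helper; note that RSS counts shared pages (such as shared libraries) in every process that maps them, so the totals overstate real usage.

```shell
# rss_by_command  (hypothetical helper)
# Reads "RSS COMMAND" lines on stdin, as printed by `ps -eo rss=,comm=`, and
# prints the total RSS in kB per command name, largest first.
rss_by_command() {
    awk '{
            rss = $1
            $1 = ""                 # drop the RSS field
            sub(/^ +/, "")          # and the separator it leaves behind
            total[$0] += rss
        }
        END { for (cmd in total) print total[cmd], cmd }' | sort -k1,1nr
}
```

Usage: ps -eo rss=,comm= | rss_by_command | head -10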
3. Swap:
When physical memory is fully used, the system starts using swap space. If the system runs out of swap space as well, you may see fork errors in the /var/log/messages file. Oversizing swap is wasteful: configuring 8GB of swap on a system with 2GB of physical memory is a waste of disk space. Once the system starts swapping more processes out to disk, performance degrades.
Only a few commands report swap information:
1. /proc/swaps
[root@Global-RH ~]# cat /proc/swaps
Filename Type Size Used Priority
/dev/dm-1 partition 2523128 0 -1
[root@Global-RH ~]#
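/proc/swaps reports Size and Used in kB for each swap area. A hypothetical swap_usage helper can turn that into a percentage per device:

```shell
# swap_usage [FILE]  (hypothetical helper)
# Prints each swap area listed in /proc/swaps (or FILE) with its usage.
# Size and Used are reported by the kernel in kB; the header line is skipped.
swap_usage() {
    awk 'NR > 1 && $3 > 0 {
            printf "%s %d/%d kB (%.1f%% used)\n", $1, $4, $3, 100 * $4 / $3
         }' "${1:-/proc/swaps}"
}
```

On the system above, this would report /dev/dm-1 as 0.0% used.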
2. The free command:
[root@Global-RH ~]# free -t
total used free shared buffers cached
Mem: 1250268 920380 329888 0 81248 450584
-/+ buffers/cache: 388548 861720
Swap: 2523128 0 2523128
Total: 3773396 920380 2853016
[root@Global-RH ~]#
3. The top command:
[root@Global-RH ~]# top
top - 11:12:37 up 1:53, 4 users, load average: 0.00, 0.03, 0.34
Tasks: 156 total, 1 running, 155 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1250268k total, 920388k used, 329880k free, 81272k buffers
Swap: 2523128k total, 0k used, 2523128k free, 450584k cached
How to identify a memory/swap bottleneck?
Unfortunately, Linux vmstat does not report the page scan rate (sr) that Unix vmstat (for example, on Solaris) uses to identify heavy paging. Instead, look at the swap-in (si) and swap-out (so) columns in the vmstat output; consistently non-zero values mean the system is actively moving pages between memory and swap.
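A hypothetical swap_activity helper can apply this check over a longer vmstat run. It assumes the default procps vmstat column order (si in column 7, so in column 8) and skips the first data line, which vmstat reports as averages since boot.

```shell
# swap_activity  (hypothetical helper)
# Reads `vmstat DELAY COUNT` output on stdin and counts the interval samples
# that show swap-in (si) or swap-out (so) activity.
# Assumes the default procps column order: r b swpd free buff cache si so ...
swap_activity() {
    awk '
        $1 !~ /^[0-9]+$/ { next }    # skip the header lines
        { n++ }
        n == 1 { next }              # first report = averages since boot
        $7 + 0 > 0 || $8 + 0 > 0 { busy++ }
        END {
            printf "%d of %d interval samples show swap activity\n",
                busy, (n > 1 ? n - 1 : 0)
        }'
}
```

Usage: vmstat 5 12 | swap_activity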
4. Filesystem (Disk/LUN I/O Bottleneck):
Sometimes a system has enough free CPU and memory resources but still shows performance problems. In these cases, look at the “%iowait” field in “iostat -x” or “mpstat”, or at the “wa” field in top. If it stays above 10%, the CPUs are spending a significant share of their time waiting for disk I/O to complete. Most of the time, poor SAN performance is what pushes the iowait value higher.
Using mpstat:
[root@Global-RH ~]# mpstat 1 5
Linux 2.6.32-279.el6.x86_64 (Global-RH) 11/25/2013 _x86_64_ (1 CPU)
11:38:50 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
11:38:51 AM all 0.00 0.00 100.00 46.67 0.00 0.00 0.00 0.00 0.00
11:38:52 AM all 0.00 0.00 100.00 89.67 0.00 0.00 0.00 0.00 0.00
11:38:53 AM all 0.00 0.00 100.00 94.67 0.00 0.00 0.00 0.00 0.00
11:38:54 AM all 0.00 0.00 83.33 46.67 0.00 0.00 0.00 0.00 0.00
11:38:55 AM all 4.35 0.00 95.65 63.67 0.00 0.00 0.00 0.00 0.00
Average: all 1.72 0.00 94.83 99.67 0.00 0.00 0.00 0.00 0.00
[root@Global-RH ~]#
Using iostat:
[root@Global-RH ~]# iostat -x
Linux 2.6.32-279.el6.x86_64 (Global-RH) 11/25/2013 _x86_64_ (1 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.99 0.01 6.29 2.26 0.00 90.45
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
scd0 0.23 0.00 0.06 0.00 1.14 0.00 19.95 0.00 4.85 3.64 0.02
sda 0.78 8.01 3.09 2.65 116.94 85.50 35.33 0.52 90.15 5.61 93.21
dm-0 0.00 0.00 3.70 10.69 115.76 85.50 13.99 2.29 159.42 2.23 97.21
dm-1 0.00 0.00 0.04 0.00 0.32 0.00 8.00 0.00 4.09 2.38 0.01
[root@Global-RH ~]#
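An svctm (average service time) below 10 ms is generally acceptable in a SAN environment. If svctm is below 10 ms but %util is still above 60%, the device is saturated by the volume of requests rather than by slow ones, so tune the application or spread the database across multiple filesystems to increase the write rate. In the output above, sda (svctm 5.61, %util 93.21) and dm-0 (svctm 2.23, %util 97.21) fit this pattern.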
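The svctm/%util rule above can be applied automatically. The sketch below uses a hypothetical saturated_devices helper that finds the svctm and %util columns by their header names rather than by position, and prints devices with svctm below 10 ms and %util above 60%. If a sysstat version does not print svctm, it reports nothing. Run against the iostat -x output above, it would report sda and dm-0.

```shell
# saturated_devices  (hypothetical helper)
# Reads `iostat -x` output on stdin and prints devices whose average service
# time (svctm) is below 10 ms while %util is above 60%.
# The svctm and %util columns are located by header name, not position.
saturated_devices() {
    awk '
        /^Device/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "svctm") s = i
                if ($i == "%util") u = i
            }
            next
        }
        s && u && NF >= u && $s + 0 < 10 && $u + 0 > 60 {
            printf "%s svctm=%s util=%s\n", $1, $s, $u
        }'
}
```

Because the first iostat report covers the time since boot, sampling with an interval (for example, iostat -x 5 3 | saturated_devices) shows current behavior more accurately.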
How to identify which process is causing high I/O wait?
Processes blocked on disk I/O are usually in uninterruptible sleep (state “D”). The following loop samples the process list every 5 seconds, 10 times, and prints any processes in that state:
[root@Global-RH ~]# for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
D 7161 cp -i -r /usr /usr_old
----
D 391 [jbd2/dm-0-8]
----
^C
---
[root@Global-RH ~]#
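To see which processes block on I/O most often over a longer window, the samples can be tallied instead of read by eye. A minimal sketch with a hypothetical dstate_count helper, which counts how many samples each PID and command pair spent in state D:

```shell
# dstate_count  (hypothetical helper)
# Reads "STATE PID COMMAND ..." lines on stdin, for example from repeated
# `ps -eo state=,pid=,cmd=` calls, and prints how many samples each process
# was seen in state D (uninterruptible sleep), most frequent first.
# Only the first word of the command is kept, to keep the keys short.
dstate_count() {
    awk '$1 == "D" { seen[$2 " " $3]++ }
         END { for (k in seen) print seen[k], k }' | sort -k1,1nr
}
```

Usage: for x in $(seq 1 10); do ps -eo state=,pid=,cmd=; sleep 5; done | dstate_count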
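Now examine the processes found. PID 391 ([jbd2/dm-0-8]) is the jbd2 journaling kernel thread for dm-0, whose activity is driven by other processes' writes, so focus on PID 7161 (cp). /proc/7161/io shows the cumulative I/O counters of the process, and lsof -p 7161 shows the files it has open (file descriptors 3r and 4w are the source and destination of the copy):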
[root@Global-RH ~]# cat /proc/7161/io
rchar: 145039848
wchar: 145032790
syscr: 14556
syscw: 8777
read_bytes: 87707648
write_bytes: 158609408
cancelled_write_bytes: 0
[root@Global-RH ~]# lsof -p 7161
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
cp 7161 root cwd DIR 253,0 4096 130821 /root
cp 7161 root rtd DIR 253,0 4096 2 /
cp 7161 root txt REG 253,0 122736 900 /bin/cp
cp 7161 root mem REG 253,0 156872 178706 /lib64/ld-2.12.so
cp 7161 root mem REG 253,0 22536 178723 /lib64/libdl-2.12.so
cp 7161 root mem REG 253,0 1918016 178707 /lib64/libc-2.12.so
cp 7161 root mem REG 253,0 145720 178710 /lib64/libpthread-2.12.so
cp 7161 root mem REG 253,0 47064 178711 /lib64/librt-2.12.so
cp 7161 root mem REG 253,0 124624 178741 /lib64/libselinux.so.1
cp 7161 root mem REG 253,0 33816 178838 /lib64/libacl.so.1.1.0
cp 7161 root mem REG 253,0 21152 150217 /lib64/libattr.so.1.1.0
cp 7161 root mem REG 253,0 99158576 134657 /usr/lib/locale/locale-archive
cp 7161 root 0u CHR 136,2 0t0 5 /dev/pts/2
cp 7161 root 1u CHR 136,2 0t0 5 /dev/pts/2
cp 7161 root 2u CHR 136,2 0t0 5 /dev/pts/2
cp 7161 root 3r REG 253,0 60073002 169255 /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rt.jar
cp 7161 root 4w REG 253,0 29884416 407742 /usr_old/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rt.jar
[root@Global-RH ~]#
In this way, you can pinpoint the process responsible for the I/O wait.
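The counters in /proc/PID/io are cumulative, so a single read does not show how fast a process is doing I/O right now. A minimal sketch with a hypothetical io_rate helper that compares two saved snapshots taken a known number of seconds apart:

```shell
# io_rate BEFORE AFTER SECONDS  (hypothetical helper)
# Compares two saved copies of /proc/<pid>/io taken SECONDS apart and prints
# the average read_bytes and write_bytes rates in bytes per second. These two
# counters approximate the I/O the process caused at the storage layer,
# unlike rchar/wchar, which also count I/O served from the page cache.
io_rate() {
    awk -v secs="$3" '
        FNR == NR { before[$1] = $2; next }    # first file: earlier snapshot
        $1 == "read_bytes:" || $1 == "write_bytes:" {
            printf "%s %d B/s\n", $1, ($2 - before[$1]) / secs
        }' "$1" "$2"
}
```

For example: cat /proc/7161/io > /tmp/io.1; sleep 5; cat /proc/7161/io > /tmp/io.2; io_rate /tmp/io.1 /tmp/io.2 5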
5. Network Bottleneck:
Network overload is a relatively rare cause of system performance problems.
Check each interface for errors (RX-ERR, TX-ERR) and drops (RX-DRP, TX-DRP) using the netstat command:
[root@Global-RH ~]# netstat -i
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 10697 0 0 0 6930 0 0 0 BMRU
lo 16436 0 20 0 0 0 20 0 0 0 LRU
virbr0 1500 0 0 0 0 0 26 0 0 0 BMRU
[root@Global-RH ~]#
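On a system with many interfaces, a hypothetical iface_errors helper can filter the netstat -i output down to interfaces with non-zero error or drop counters. It locates the columns by their header names rather than by position.

```shell
# iface_errors  (hypothetical helper)
# Reads `netstat -i` output on stdin and prints interfaces whose RX-ERR,
# RX-DRP, TX-ERR or TX-DRP counter is non-zero.
iface_errors() {
    awk '
        $1 == "Iface" {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("RX-ERR" in col) {
            bad = ""
            n = split("RX-ERR RX-DRP TX-ERR TX-DRP", names, " ")
            for (j = 1; j <= n; j++) {
                v = $(col[names[j]])
                if (v + 0 > 0) bad = bad " " names[j] "=" v
            }
            if (bad != "") print $1 ":" bad
        }'
}
```

Usage: netstat -i | iface_errors (no output means no errors or drops were counted).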
dstat is a very useful tool for monitoring all system resources at once. If it is not already installed, install it using “yum install dstat”.
[root@Global-RH ~]# dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
1 6 89 4 0 0| 126k 107k| 0 0 | 0 0 | 114 123
0 0 100 0 0 0| 0 0 | 60B 826B| 0 0 | 62 113
0 0 100 0 0 0| 0 0 | 166B 506B| 0 0 | 60 101
0 1 99 0 0 0| 0 0 | 60B 346B| 0 0 | 57 97
0 0 100 0 0 0| 0 12k| 60B 346B| 0 0 | 53 98
0 0 100 0 0 0| 0 0 | 60B 346B| 0 0 | 51 95
0 0 100 0 0 0| 0 0 | 152B 346B| 0 0 | 50 99
0 1 99 0 0 0| 0 0 | 152B 346B| 0 0 | 61 100 ^C
[root@Global-RH ~]#
You can see the network traffic details in the “net/total” columns (recv and send).
Here is the complete list of options for the dstat tool:
[root@Global-RH ~]# dstat -h
Usage: dstat [-afv] [options..] [delay [count]]
Versatile tool for generating system resource statistics
Dstat options:
-c, --cpu enable cpu stats
-C 0,3,total include cpu0, cpu3 and total
-d, --disk enable disk stats
-D total,hda include hda and total
-g, --page enable page stats
-i, --int enable interrupt stats
-I 5,eth2 include int5 and interrupt used by eth2
-l, --load enable load stats
-m, --mem enable memory stats
-n, --net enable network stats
-N eth1,total include eth1 and total
-p, --proc enable process stats
-r, --io enable io stats (I/O requests completed)
-s, --swap enable swap stats
-S swap1,total include swap1 and total
-t, --time enable time/date output
-T, --epoch enable time counter (seconds since epoch)
-y, --sys enable system stats
--aio enable aio stats
--fs, --filesystem enable fs stats
--ipc enable ipc stats
--lock enable lock stats
--raw enable raw stats
--socket enable socket stats
--tcp enable tcp stats
--udp enable udp stats
--unix enable unix stats
--vm enable vm stats
--plugin-name enable plugins by plugin name (see manual)
--list list all available plugins
-a, --all equals -cdngy (default)
-f, --full automatically expand -C, -D, -I, -N and -S lists
-v, --vmstat equals -pmgdsc -D total
--bw, --blackonwhite change colors for white background terminal
--float force float values on screen
--integer force integer values on screen
--nocolor disable colors (implies --noupdate)
--noheaders disable repetitive headers
--noupdate disable intermediate updates
--output file write CSV output to file
delay is the delay in seconds between each update (default: 1)
count is the number of updates to display before exiting (default: unlimited)
[root@Global-RH ~]#
I hope this gives you some visibility into troubleshooting performance issues on Red Hat Linux.
Thank you for visiting UnixArena.
Mark says
This is the best Linux performance troubleshooting guide I have ever seen. Thank you.
Another way to find which SAN disks are being heavily hit:
sar -d 2 20 | grep -v DEV | awk '$11 > 10 {print $3, $11}' (using your threshold of 10 ms.)
The easiest way I know to map this back to actual files is the lsblk command (RHEL 6 and above.)