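Overview: output structure and output control
The output of the top command has four main parts:
1. System load (load average);
2. CPU usage (CPU utilization);
3. Memory usage (Mem utilization), including swap space (Swap);
4. The process list: among the default columns are the process ID (PID), the owning user (USER), CPU share (%CPU), memory share (%MEM), total CPU time consumed since the process started (TIME+), and the program the process is running (COMMAND).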

top - 22:09:08 up 14 min,  1 user,  load average: 0.21, 0.23, 0.30
Tasks:  81 total,   1 running,  80 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.5%us, 31.2%sy,  0.0%ni, 27.0%id,  7.6%wa,  1.0%hi, 23.7%si,  0.0%st
Mem:    255592k total,   167568k used,    88024k free,    25068k buffers
Swap:   524280k total,        0k used,   524280k free,    85724k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3166 apache    15   0 29444 6112 1524 S  6.6  2.4   0:00.79 httpd
 3161 apache    15   0 29444 6112 1524 S  5.9  2.4   0:00.79 httpd
 3407 root      16   0  2188 1012  816 R  1.7  0.4   0:00.51 top
  240 root      15   0     0    0    0 S  0.3  0.0   0:00.08 pdflush


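top is interactive: while it is running you can type control commands that change what it outputs and how (whether the system load, CPU usage, and memory usage summaries are shown, and which columns the process list displays, in what order, and how its rows are sorted). If you type nothing, top refreshes automatically every 3 seconds by default (see "Delay 3.0 secs" in the help screen below). To refresh every 2 seconds instead, start it with the optional argument: top -d 2. Press q while top is running to quit.
(1)    Pressing lowercase l (load average) toggles the "load average" line on or off;
(2)    Pressing lowercase t (task/cpu stats) toggles the "Tasks" and "Cpu(s)" lines on or off;
(3)    Pressing lowercase m (memory) toggles the "Mem" and "Swap" lines on or off;
(4)    Pressing ? or lowercase h shows the help screen for interactive commands.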


Help screen shown after pressing ? or h:
Help for Interactive Commands - procps version 3.2.7
Window 1:Def: Cumulative mode Off.  System: Delay 3.0 secs; Secure mode Off.

  Z,B       Global: 'Z' change color mappings; 'B' disable/enable bold
  l,t,m     Toggle Summaries: 'l' load avg; 't' task/cpu stats; 'm' mem info
(Note: as described above, pressing l, t, or m in interactive mode toggles the corresponding summary lines on or off.)

  1,I       Toggle SMP view: '1' single/separate states; 'I' Irix/Solaris mode

  f,o     . Fields/Columns: 'f' add or remove; 'o' change display order
  F or O  . Select sort field
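(Note: these commands control which columns are shown for each process in the list, the order of the columns from left to right, and how the rows are sorted. By default rows are sorted in descending order (by CPU, memory, and so on); press uppercase R to switch to ascending order.)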

  <,>     . Move sort field: '<' next col left; '>' next col right
  R,H     . Toggle: 'R' normal/reverse sort; 'H' show threads
  c,i,S   . Toggle: 'c' cmd name/line; 'i' idle tasks; 'S' cumulative time
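(Note: c stands for command and toggles between the program name and the full command line. By default the COMMAND column shows only the short program name, without arguments; after pressing lowercase c it shows the full path and arguments. For a Java program, for example, the default shows just java, while after c it shows /data/software/jdk1.6.0_26/bin/java -Djava.util.logging.manager=XXXX and so on.)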

  x,y     . Toggle highlights: 'x' sort field; 'y' running tasks
  z,b     . Toggle: 'z' color/mono; 'b' bold/reverse (only if 'x' or 'y')
  u       . Show specific user only
  n or #  . Set maximum tasks displayed

  k,r       Manipulate tasks: 'k' kill; 'r' renice (Note: 'k' can be used to kill a process)
  d or s    Set update interval  (Note: after pressing d, top prompts for a number of seconds to use as the new refresh interval)

  W         Write configuration file
  q         Quit
          ( commands shown with '.' require a visible task display window )
Press 'h' or '?' for help with Windows,
any other key to continue

What the output means
Line 1: equivalent to the output of the uptime command
top - 22:09:08 up 14 min,  1 user,  load average: 0.21, 0.23, 0.30

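(1) “22:09:08” is the current system time: 22 hours, 9 minutes, 8 seconds;
(2) “up 14 min” means the system has been running for 14 minutes since boot. Servers often run for a year without a reboot; on one of my servers: top - 16:10:52 up 320 days, 22:42,  4 users,  load average: 0.24, 0.28, 0.30, which has been up for 320 days;
(3) “1 user” means one user is currently logged in. Linux is a multi-user system, so several users being logged in at once is normal;
(4) “load average: 0.21, 0.23, 0.30” is the system load, an indicator of how busy the CPU is. It has no strict definition, and different operating systems may implement it differently. Roughly, it is the average length of the task queue over the last 1, 5, and 15 minutes. Load Average is discussed in detail in Appendix 1.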
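
As a small illustration of reading this header line from a script, the sketch below pulls the three load averages out of a line in the format shown above. The function name load_averages is made up for this example, and the pattern assumes the English "load average: a, b, c" layout with a dot as the decimal separator.

```shell
# Extract the three load averages from a top/uptime header line read on stdin.
# Prints them space-separated: "<1min> <5min> <15min>".
# Assumption: the line contains "load average: a, b, c" with dot decimals.
load_averages() {
    sed -n 's/.*load average: *\([0-9.]*\), *\([0-9.]*\), *\([0-9.]*\).*/\1 \2 \3/p'
}

# Sample header copied from the output shown earlier in this article.
header='top - 22:09:08 up 14 min,  1 user,  load average: 0.21, 0.23, 0.30'
printf '%s\n' "$header" | load_averages
```

On a live system the same function could be fed from uptime, for example: uptime | load_averages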


Lines 2 and 3:
Tasks:  81 total,   1 running,  80 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.5%us, 31.2%sy,  0.0%ni, 27.0%id,  7.6%wa,  1.0%hi, 23.7%si,  0.0%st

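(1) “81 total” means there are 81 processes in total, so the process list below has 81 entries, although not all of them fit on screen.
(2) “9.5%us, 31.2%sy, 27.0%id, 7.6%wa” mean, respectively: user processes are using 9.5% of the CPU; the system (kernel) is using 31.2%; 27.0% is idle; and 7.6% is spent waiting for I/O.

The remaining fields on the Cpu(s) line are: “ni”, time spent on niced (reprioritized) user processes; “hi” and “si”, time spent servicing hardware and software interrupts; and “st”, time stolen from this virtual machine by the hypervisor.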

Lines 4 and 5:
Mem:    255592k total,   167568k used,    88024k free,    25068k buffers
Swap:   524280k total,        0k used,   524280k free,    85724k cached
These lines describe memory usage, and the numbers can be misleading. “255592k total” is the total memory in the system; “167568k used” is the part of RAM that currently holds data; “88024k free” is the part of RAM that holds nothing; “25068k buffers” and “85724k cached” (the latter is printed at the end of the Swap line but refers to RAM) are buffered and cached data for I/O, most of which the kernel can reclaim when programs need memory.
So what is the actual amount of free RAM available for programs to use?
The answer is: free + (buffers + cached)
88024k + (25068k + 85724k) = 198816k
How much RAM is being used by programs?
The answer is: used - (buffers + cached)
167568k - (25068k + 85724k) = 56776k
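
The same arithmetic can be scripted. The following is a minimal sketch, assuming the two-line Mem/Swap layout shown above (values in k, with "cached" at the end of the Swap line); the function name mem_summary and the output labels are invented for this example.

```shell
# Read top's "Mem:" and "Swap:" summary lines on stdin and print
# the memory available to programs and the memory used by programs (in k).
# Assumption: field positions match the procps 3.2.x layout in this article.
mem_summary() {
    awk '
        /^Mem:/  { used = $4 + 0; free = $6 + 0; buffers = $8 + 0 }  # "167568k" + 0 -> 167568
        /^Swap:/ { cached = $8 + 0 }
        END {
            printf "available_kb=%d\n", free + buffers + cached
            printf "used_by_programs_kb=%d\n", used - buffers - cached
        }'
}

# Sample lines copied from the output shown earlier in this article.
mem_lines='Mem:    255592k total,   167568k used,    88024k free,    25068k buffers
Swap:   524280k total,        0k used,   524280k free,    85724k cached'
printf '%s\n' "$mem_lines" | mem_summary
```

For the sample this prints available_kb=198816 and used_by_programs_kb=56776, the same figures as the hand calculation.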
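Common operations
1. Sorting (all uppercase letters)
M - Sort by memory usage (descending);
P - Sort by CPU usage (descending by current CPU consumption);
T - Sort by cumulative time (the TIME+ column, descending by total CPU time consumed);
R - Reverse the current sort order (for example, from descending to ascending).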

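2. Refresh interval
Press lowercase d, then enter a number of seconds at the prompt; entering 1, for example, refreshes once per second.
You can also pass the option when starting top:
-d - Controls the delay between refreshes.
For example, to set the delay between refreshes to 5 seconds: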
$ top -d 5


3. Monitoring specific PIDs
You can control what top displays by passing parameters when you run it.
-p - Specify the PID of the process you want to monitor.
For example, to monitor only the httpd process with PID 3166:
$ top -p 3166
Separate multiple PIDs with commas:
$ top -p 3166,3177
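
When the PIDs come from another command, one PID per line, they have to be joined with commas before being handed to -p. A minimal sketch, using a hypothetical helper named pid_list built on the standard paste utility:

```shell
# Join PIDs arriving one per line on stdin into a comma-separated list.
pid_list() {
    paste -sd, -
}

# Sample PIDs taken from the example above.
printf '3166\n3177\n' | pid_list

# Possible live usage (assumes pgrep is installed and httpd is running):
#   top -p "$(pgrep httpd | pid_list)"
```

The sample call prints 3166,3177, which is the form top -p expects.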

4. Non-interactive use
top is interactive by default once started. What if we do not want interaction? That is possible too:
-n - Update the display this number of times and then exit.

$ top -n 1
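
For use in scripts, -n is usually combined with -b (batch mode), which makes top write plain text that can be piped to other tools. The sketch below shows one way to post-process such output; the function name first_processes is hypothetical, and the column positions (PID in column 1, %CPU in column 9, COMMAND in column 12) assume the default layout shown in this article. A captured sample is used so the example runs without calling top.

```shell
# Print "PID %CPU COMMAND" for the first N process rows of top output on stdin.
# Assumption: default column layout, as in the sample output of this article.
first_processes() {
    awk -v n="$1" '
        in_list && shown < n && NF >= 12 { print $1, $9, $12; shown++ }
        $1 == "PID" { in_list = 1 }   # rows after the column header are processes
    '
}

# Captured sample: the process list shown earlier in this article.
sample='  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3166 apache    15   0 29444 6112 1524 S  6.6  2.4   0:00.79 httpd
 3161 apache    15   0 29444 6112 1524 S  5.9  2.4   0:00.79 httpd
 3407 root      16   0  2188 1012  816 R  1.7  0.4   0:00.51 top'

printf '%s\n' "$sample" | first_processes 2

# Possible live usage:
#   top -b -n 1 | first_processes 5
```

Because top sorts by %CPU by default, the first rows are the heaviest CPU consumers at the moment of the snapshot.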


Appendix 1: Load Average has no strict definition
REFER1:  http://www.kernelhardware.org/linux-top-command/
REFER2:  http://www.bsdlover.cn/html/80/n-3180.html (title: the quantitative definition of system load is "crap")

Load average is an extensive topic and to understand its inner workings can be daunting. The simplest of definitions states that load average is the CPU utilization over a period of time. A load average of 1 means your CPU is being fully utilized and processes are not having to wait to use a CPU. A load average above 1 indicates that processes need to wait and your system will be less responsive. If your load average is consistently above 3 and your system is running slow you may want to upgrade to more CPUs or a faster CPU.

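"System load" is a broad topic with no strict definition. If you insist on understanding its inner workings you may get frustrated, because the more you dig the less clear it seems (the official documentation of many operating systems does not explain in detail how system load is computed). Put simply, system load is an indicator of how busy the CPU has been over a period of time (the average length of the task queue). On a single-CPU system the ideal value is 1: there is always exactly one task in the queue, so the CPU is fully used and no task has to wait; nothing is wasted and nothing waits. A load average below 1 means the CPU has idle time and the load is light; above 1 means the CPU is somewhat busy and some tasks must wait for it. A load average above 1 is no cause for alarm, and you do not necessarily need a better machine or more CPUs. But if it stays above 3, it is time to add CPUs or machines.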

REFER:  http://en.wikipedia.org/wiki/Load_(computing)
In UNIX computing, the system load is a measure of the amount of work that a computer system performs. The load average represents the average system load over a period of time. It conventionally appears in the form of three numbers which represent the system load during the last one-, five-, and fifteen-minute periods.

For example, one can interpret a load average of "1.73 0.50 7.98" on a single-CPU system as:
•    during the last minute, the CPU was overloaded by 73% (1 CPU with 1.73 runnable processes, so that 0.73 processes had to wait for a turn)
•    during the last 5 minutes, the CPU was underloaded by 50% (no processes had to wait for a turn)
•    during the last 15 minutes, the CPU was overloaded by 698% (1 CPU with 7.98 runnable processes, so that 6.98 processes had to wait for a turn)
Taking the last figure: over the last 15 minutes the average task-queue length was 7.98, meaning one CPU had 7.98 runnable tasks. One CPU can run only one task at a time, so 6.98 tasks had to wait for their turn on the CPU; in other words, the CPU was overloaded by 698%.

Note the multi-CPU case: servers generally have more than one CPU.

In a system with four CPUs, a load average of 3.73 would indicate that there were, on average, 3.73 processes ready to run, and each one could be scheduled into a CPU.
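
A common rule of thumb that follows from this is to divide the load average by the number of CPUs and compare the result with 1. The sketch below does the division; the function name load_per_cpu is made up for this example, and the "compare with 1" reading is a heuristic rather than a strict definition.

```shell
# Print the load average divided by the number of CPUs, to two decimals.
# Usage: load_per_cpu <load_average> <cpu_count>
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

load_per_cpu 3.73 4   # four-CPU example above: below 1, nothing has to wait
load_per_cpu 7.98 1   # single-CPU example above: well above 1

# Possible live usage on Linux (assumes /proc/loadavg and nproc exist):
#   load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

The two sample calls print 0.93 and 7.98 respectively.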



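References
1.    Line-by-line explanation of the top command:  http://www.kernelhardware.org/linux-top-command/
2.    A detailed explanation of the top command by a Chinese author:   
3.    Built-in help: top -h  or  man top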