文章目录
- 1. 原文概述
- 补充概述
- 2. 例子
- 2.1 用jstack加进程id查找死锁
- 2.2 jstack统计线程数
- 2.3 jstack检测cpu高
- 3. 实战
- 3.1 一次cpu高的实战记录
- 3.2 线程退出问题记录
1. 原文概述
Java堆栈跟踪工具
jstack
(Stack Trace for Java)命令用于生成虚拟机当前时刻的线程快照
(一般称为threaddump或者javacore文件)。
jstack命令生成的thread dump信息包含了JVM中
所有存活的线程
线程快照就是当前虚拟机内每一条线程正在执行的方法堆栈的集合,生成线程快照的目的通常是定位线程出现长时间停顿的原因,如线程间死锁、死循环
、请求外部资源导致的长时间挂起
等,都是导致线程长时间停顿的常见原因。线程出现停顿时通过jstack来查看各个线程的调用堆栈,就可以获知没有响应的线程到底在后台做些什么事情,或者等待着什么资源。
jstack命令格式:
jstack [ option ] vmid
option选项的合法值与具体含义如表4-4所示:
例:
jstack -l 3500
2010-11-19 23:11:26
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.1-b03 mixed mode):
"[ThreadPool Manager] - Idle Thread" daemon prio=6 tid=0x0000000039dd4000 nid= 0xf50 in Object.wait() [0x000000003c96f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000000016bdcc60> (a org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor)
at java.lang.Object.wait(Object.java:485)
at org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor.run (Executor. java:106)
- locked <0x0000000016bdcc60> (a org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor)
Locked ownable synchronizers:
- None
补充概述
原文提示不全,现在补充完整的语法
/opt/java8/bin/jstack
Usage:
jstack [-l] <pid>
(to connect to running process) 连接活动线程
jstack -F [-m] [-l] <pid>
(to connect to a hung process) 连接阻塞线程
jstack [-m] [-l] <executable> <core>
(to connect to a core file) 连接dump的文件
jstack [-m] [-l] [server_id@]<remote server IP or hostname>
(to connect to a remote debug server) 连接远程服务器
Options:
-F to force a thread dump. Use when jstack <pid> does not respond (process is hung)
-m to print both java and native frames (mixed mode)
-l long listing. Prints additional information about locks
-h or -help to print this help message
2. 例子
2.1 用jstack加进程id查找死锁
jstack 可以查看进程信息,如果里面包含线程死锁,那么会有BLOCKED
关键字,然后在关键字周边有相关的位置信息(类名或方法)
构造一个会出现死锁的例子:
package test.my;
public class DeadLockTest {
private static Object lock1 = new Object();
private static Object lock2 = new Object();
public static void main(String[] args) {
new Thread(() -> {
synchronized (lock1) {
try {
System.out.println("thread1 begin");
Thread.sleep(5000);
} catch (InterruptedException e) {
}
synchronized (lock2) {
System.out.println("thread1 end");
}
}
}).start();
new Thread(() -> {
synchronized (lock2) {
try {
System.out.println("thread2 begin");
Thread.sleep(5000);
} catch (InterruptedException e) {
}
synchronized (lock1) {
System.out.println("thread2 end");
}
}
}).start();
System.out.println("main thread end");
}
}
执行,注意要进入正确的目录,否则报找不main方法:
[root@EMS3 bin]# pwd
/usr/jdk1.8.0_60/bin
[root@EMS3 bin]# ./java test.my.DeadLockTest
thread1 begin
main thread end
thread2 begin
查找java进程,为10060(如果不知道进程,可以参见下面cpu高的例子,间接找到相应的java进程):
[root@EMS3 ~]# ps -ef|grep test.my.DeadLockTest
root 10060 7451 0 20:32 pts/4 00:00:00 ./java test.my.DeadLockTest
root 11403 11049 0 20:32 pts/3 00:00:00 grep --color=auto test.my.DeadLockTest
[root@EMS3 ~]#
执行jstack命令:
[root@EMS3 bin]# ./jstack -l 10060
...
"Thread-1" #19 prio=5 os_prio=0 tid=0x00007f4544190000 nid=0x2771 waiting for monitor entry [0x00007f44e24bb000]
java.lang.Thread.State: BLOCKED (on object monitor) 'BLOCKED 关键词,说明阻塞了'
at test.my.DeadLockTest.lambda$1(DeadLockTest.java:30)
- waiting to lock <0x00000006c545c6a8> (a java.lang.Object)
- locked <0x00000006c545c6b8> (a java.lang.Object)
at test.my.DeadLockTest$$Lambda$2/531885035.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- None
"Thread-0" #18 prio=5 os_prio=0 tid=0x00007f454418e000 nid=0x2770 waiting for monitor entry [0x00007f44e25bc000]
java.lang.Thread.State: BLOCKED (on object monitor) 'BLOCKED 关键词,说明阻塞了'
at test.my.DeadLockTest.lambda$0(DeadLockTest.java:17)
- waiting to lock <0x00000006c545c6b8> (a java.lang.Object)
- locked <0x00000006c545c6a8> (a java.lang.Object)
at test.my.DeadLockTest$$Lambda$1/791452441.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- None
日志说明
“Thread-1” :线程名
prio=5 优先级=5
tid=0x000000001fa9e000 :线程id
nid=0x2d64 : 线程对应的本地线程标识nid
在打印结果的最后,有汇总信息:
Found one Java-level deadlock: '提示发现一个死锁'
=============================
"Thread-1":
waiting to lock monitor 0x00007f44d4006218 (object 0x00000006c545c6a8, a java.lang.Object),
which is held by "Thread-0"
"Thread-0":
waiting to lock monitor 0x00007f44d4003988 (object 0x00000006c545c6b8, a java.lang.Object),
which is held by "Thread-1"
Java stack information for the threads listed above:
===================================================
"Thread-1":
at test.my.DeadLockTest.lambda$1(DeadLockTest.java:30)
- waiting to lock <0x00000006c545c6a8> (a java.lang.Object)
- locked <0x00000006c545c6b8> (a java.lang.Object)
at test.my.DeadLockTest$$Lambda$2/531885035.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)
"Thread-0":
at test.my.DeadLockTest.lambda$0(DeadLockTest.java:17)
- waiting to lock <0x00000006c545c6b8> (a java.lang.Object)
- locked <0x00000006c545c6a8> (a java.lang.Object)
at test.my.DeadLockTest$$Lambda$1/791452441.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)
Found 1 deadlock.
2.2 jstack统计线程数
[root@EMS3 bin]# ./jstack -l 10060 | grep 'java.lang.Thread.State' | wc -l
20
2.3 jstack检测cpu高
写一个会复现cpu冲高的例子:
package test.my;
import com.tuling.jvm.User;
/**
* 运行此代码,cpu会飙高
*/
public class Math {
public static final int initData = 666;
public static User user = new User();
public int compute() { // 一个方法对应一块栈帧内存区域
int a = 1;
int b = 2;
int c = (a + b) * 10;
return c;
}
public static void main(String[] args) {
Math math = new Math();
while (true) {
math.compute();
}
}
}
步骤1.使用命令top
,查看cpu占用高进程
,从下面的结果可以看出22528进程占用cpu达到99%:
[root@EMS3 bin]# top
top - 22:28:18 up 54 days, 11:25, 4 users, load average: 2.64, 2.47, 2.24
Tasks: 847 total, 1 running, 846 sleeping, 0 stopped, 0 zombie
%Cpu(s): 11.0 us, 4.3 sy, 0.0 ni, 82.4 id, 0.2 wa, 0.8 hi, 0.6 si, 0.7 st, 82.2 id_exact, 0.9 hi_exact, 1.6 irq_exact
KiB Mem : 49293928 total, 401856 free, 20822364 used, 28069708 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 27498692 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22528 root 20 0 15.6g 21180 10904 S 99.0 0.0 1:06.84 java
21995 root 20 0 9870244 561992 31700 S 29.3 1.1 11935:39 java
2045 root 20 0 10.5g 1.1g 11944 S 27.6 2.4 14073:04 java
第一列PID是进程ID,如果已知目标进程,也可以使用命令top -p <pid>
,精确监控,pid是你的java进程号:
[root@EMS3 ~]# top -p 22528
步骤2.查看cpu占用高线程
通过进程ID找到内存和cpu占用最高的线程tid,此时tid
=22529
,详细步骤如下:
[root@EMS3 ~]# top -H -p 22528
注意:虽然第一列仍显示PID,但该列表示的是线程ID 即tid
步骤3:转换线程ID
此时我们看到的线程ID是10进制的,需要转为十六进制得到 5801,作用是后续搜索日志,日志的内容是16进制的:
[root@EMS3 ~]# printf "%x\n" 22529 '把10进制的线程id 22529转为为16进制的'
5801
步骤4:定位cpu占用线程,找到精确的日志
搜索16进制的线程ID,查看是否有死锁、死循环之类的:
[root@EMS3 bin]# ./jstack 22528|grep 5801 -A 30
jstack 22528是打印该进程下的所有日志,然后根据线程id (16进制)5801 进行搜索!
3. 实战
3.1 一次cpu高的实战记录
在项目中使用了encache框架,50个数据文件进行大数量的快速读写,经定位是内存空间不足,引发频繁gc导致cpu冲高,下面看下当时的截图:
从图中能看到,该进程内的多线线程运行情况,其中有多个gc线程资源占比较高,说明环境是内存不足了。
3.2 线程退出问题记录
有个功能,里面是while(true)读取kafka消息,结果后来发现不再处理消息,收不到发送的消息。
经过定位,while()循环中,调用poll读取消息后,出现空指针异常,导致while循环退出了,处理线程也挂了。
定位过程:记得前文提到jstack命令生成的thread dump信息包含了JVM中所有存活的线程
,我们用线程名去线程dump日志中搜索,发现不存在目标线程,因此断定线程肯定是发生一次退出了!
因此,也建议所有的线程都加上线程名,这边方便后期定位问题!
参考:
《jstack命令解析》带很多具体例子