Implementing a Java Class Bytecode Instrumentation Agent with the Instrument Mechanism
1 Background
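During development we often run into the following situation: some class in a jar we depend on does not behave the way the business requires, and its methods have to be changed or enhanced AOP-style. If the jar is developed by our own team, we can simply modify the source, republish it and depend on the new version; no bytecode-level proxying is needed. If the jar is a third-party artifact that was compiled and published to a Maven repository, however, the source cannot be changed, and the only option is to enhance the bytecode.
There are two ways to do this. The first works after the original classes have been loaded: a library such as cglib generates a proxy class at run time. The proxy is a subclass of the original class (typically wrapping an original instance), and the "enhancement" consists of running user-defined code before the call to the original method. But the code in the dependency that references the original class, including the code that instantiates it, is already compiled, and cglib cannot make the JVM redefine the original Class, so those call sites keep using the unenhanced class. Static methods, such as the one modified later in this article, cannot be overridden by a subclass at all.
The second way solves the problem at the root: before a class is loaded, its class-file byte array is modified, and only then is it loaded and turned into Class metadata in memory. Every place that uses the class then sees the modified version, so this approach is more thorough than the first. It relies on the JVM Instrument mechanism combined with a bytecode manipulation library. Instrument can also transform and replace a class after it has been loaded.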
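To make the limitation of the first approach concrete, here is a minimal hand-written sketch (no cglib involved; all class names are hypothetical) of what a subclass proxy can and cannot affect: the override only changes behavior for callers that hold a proxy instance, while code already compiled against the original class, and any static method, keeps the original behavior.

```java
// Hypothetical stand-in for a class inside a third-party jar.
class TimeUtilLib {
    // Static calls are compiled to invokestatic against TimeUtilLib; a subclass cannot override them.
    static String zoneSource() { return "system default"; }

    // Instance methods can be overridden, but only callers holding a subclass instance see it.
    String describe() { return "original"; }
}

// Roughly what a subclass-based proxy amounts to: override and add logic around super.
class TimeUtilLibProxy extends TimeUtilLib {
    @Override
    String describe() { return "enhanced(" + super.describe() + ")"; }
}

// Stands in for compiled code inside the jar that creates its own instance.
class LibraryInternalCaller {
    static String call() {
        return new TimeUtilLib().describe() + ", " + TimeUtilLib.zoneSource();
    }
}
```

Only code written against TimeUtilLibProxy sees "enhanced(original)"; LibraryInternalCaller.call() still returns "original, system default", which is why the article moves on to rewriting the class bytes themselves.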
2 The Instrument Mechanism
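Instrument ("instrumentation") is a feature introduced in JDK 5 that lets an agent modify (enhance) the bytecode of classes dynamically, including classes that are already loaded. It is built on JVMTI (JVM Tool Interface), the event-driven extension interface the JVM exposes: at specific points the JVM calls back into user-supplied agent code.
In the HotSpot source tree, JVMTI belongs to the prims module; all modules are implemented in C/C++. The modules are: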
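| Module | Description |
|---|---|
| adlc | Platform (architecture) description files |
| c1 | C1 compiler, i.e. the client compiler |
| classfile | Class file parsing, class linking, etc. |
| code | Generated machine code |
| compiler | Dynamic compiler interface |
| opto | C2 compiler, i.e. the server compiler |
| gc | Garbage collector implementations |
| interpreter | Interpreter |
| libadt | Abstract data structures |
| memory | Memory management |
| oops | Internal representation of objects in the JVM |
| prims | HotSpot's external interfaces (including JVMTI) |
| runtime | Runtime support such as threads, safepoints, reflection and mutexes |
| services | Management interfaces (JMX) |
| utilities | Internal utility classes and common functions |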
The core agent entry points defined by JVMTI are:
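#include <jvmti.h>
/*
 * Called at VM startup when the agent library is loaded from the command line
 * (-agentlib/-agentpath; -javaagent loads libinstrument this way).
 */
JNIEXPORT jint JNICALL
Agent_OnLoad(JavaVM *vm, char *options, void *reserved);
/*
 * Called when the agent is loaded into a running VM through the Attach mechanism ("load" command).
 */
JNIEXPORT jint JNICALL
Agent_OnAttach(JavaVM* vm, char* options, void* reserved);
/*
 * Called when the agent is unloaded.
 */
JNIEXPORT void JNICALL
Agent_OnUnload(JavaVM *vm);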
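A JVMTI agent implements these entry points and is shipped as a native dynamic library (for example libinstrument.so). Users can write C/C++ agents that call JVMTI directly to operate on VM internals; Java programmers can instead rely on JPLISAgent (Java Programming Language Instrumentation Services Agent), implemented in JPLISAgent.c inside libinstrument. The JPLISAgent structure holds the key objects involved, such as a handle to the JVM (mJVM), the InstrumentationImpl instance (mInstrumentationImpl) and cached method IDs for calling premain (mPremainCaller) and the transformers (mTransform), and it implements the JVMTI callbacks.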
There are two JPLISEnvironment instances:
- mNormalEnvironment handles ClassFileTransformers that do not support retransformation;
- mRetransformEnvironment handles ClassFileTransformers that support retransformation.
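The Java interface Instrumentation, backed by JPLISAgent, frees instrumentation from native code so that it can be done in plain Java. Since Java SE 6 the package has gained more capabilities: instrumentation after startup, instrumentation of native methods (through native method prefixes), and dynamic extension of the class path. These changes give Java considerably more dynamic control at run time. In JDK 1.8 the core Instrumentation methods include addTransformer, removeTransformer, retransformClasses, redefineClasses, getAllLoadedClasses, isModifiableClass, appendToBootstrapClassLoaderSearch, appendToSystemClassLoaderSearch and setNativeMethodPrefix.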
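As an illustration of the post-load capability mentioned above, the following sketch uses only methods listed there (getAllLoadedClasses, isModifiableClass, isRetransformClassesSupported, retransformClasses) to re-run retransformation-capable transformers on classes that were loaded before the agent registered them. RetransformHelper is a hypothetical name, and the sketch assumes the transformer was added with canRetransform set to true.

```java
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: retransform classes that were already loaded when the transformer was added.
final class RetransformHelper {
    private RetransformHelper() {}

    // Loaded classes whose binary name (dotted, e.g. "java.lang.String") is in names
    // and which the JVM allows to be modified.
    static List<Class<?>> findLoaded(Instrumentation inst, List<String> names) {
        List<Class<?>> hits = new ArrayList<>();
        for (Class<?> c : inst.getAllLoadedClasses()) {
            if (names.contains(c.getName()) && inst.isModifiableClass(c)) {
                hits.add(c);
            }
        }
        return hits;
    }

    // Returns how many classes were submitted for retransformation.
    static int retransformLoaded(Instrumentation inst, List<String> names)
            throws UnmodifiableClassException {
        if (!inst.isRetransformClassesSupported()) {
            return 0; // e.g. the manifest lacks "Can-Retransform-Classes: true"
        }
        List<Class<?>> hits = findLoaded(inst, names);
        if (!hits.isEmpty()) {
            inst.retransformClasses(hits.toArray(new Class<?>[0]));
        }
        return hits.size();
    }
}
```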
Inside the VM, the Agent_OnLoad flow is roughly:
- create the InstrumentationImpl object;
- enable the ClassFileLoadHook event;
- call InstrumentationImpl.loadClassAndCallPremain, which invokes the premain method of the Premain-Class named in the javaagent jar's META-INF/MANIFEST.MF.
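The last step can be pictured with this simplified sketch of the lookup order described in the java.lang.instrument package documentation: prefer premain(String, Instrumentation), otherwise fall back to premain(String). PremainLocator and DemoAgent are hypothetical names; this is not the JDK's InstrumentationImpl code, which handles more cases (such as inherited methods).

```java
import java.lang.instrument.Instrumentation;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Simplified model of how the premain entry point is located on the agent class.
final class PremainLocator {
    private PremainLocator() {}

    static Method findPremain(Class<?> agentClass) throws NoSuchMethodException {
        Method m;
        try {
            // Preferred form: premain(String agentArgs, Instrumentation inst)
            m = agentClass.getDeclaredMethod("premain", String.class, Instrumentation.class);
        } catch (NoSuchMethodException e) {
            // Fallback form: premain(String agentArgs)
            m = agentClass.getDeclaredMethod("premain", String.class);
        }
        if (!Modifier.isStatic(m.getModifiers())) {
            throw new NoSuchMethodException("premain must be static in " + agentClass.getName());
        }
        return m;
    }
}

// Hypothetical agent that only declares the one-argument form, so the fallback is chosen.
final class DemoAgent {
    public static void premain(String agentArgs) {
        System.out.println("DemoAgent started with " + agentArgs);
    }
}
```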
3 Implementation
3.1 The Agent Class and Its Transformer
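First define the agent class. In its premain method, register a ClassFileTransformer implementation through Instrumentation.addTransformer. Both premain and the registered transformer's transform method are invoked from the JVM side through JNI (by libinstrument). The preferred premain signature is public static void premain(String agentArgs, Instrumentation inst); if it is absent, the JVM falls back to public static void premain(String agentArgs).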
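package com.hikvision.ait.agent;
import java.lang.instrument.Instrumentation;
public class AITInstrumentationAgent {
    // Called by the JVM before main(); registers the transformer as retransformation-capable
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new UdfClassFileTransformer(), true);
    }
}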
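Passing true as the second argument only works if the JVM allows retransformation: per the Instrumentation Javadoc, addTransformer(t, true) throws UnsupportedOperationException when isRetransformClassesSupported() is false (for example when the manifest lacks Can-Retransform-Classes: true). A defensive variant, sketched with a hypothetical helper name, picks the flag at run time:

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;

// Hypothetical helper: register as retransformation-capable only when the JVM permits it,
// otherwise fall back to a load-time-only transformer instead of failing in premain.
final class SafeRegistration {
    private SafeRegistration() {}

    // Returns the canRetransform flag that was actually used.
    static boolean register(Instrumentation inst, ClassFileTransformer transformer) {
        boolean canRetransform = inst.isRetransformClassesSupported();
        inst.addTransformer(transformer, canRetransform);
        return canRetransform;
    }
}
```

In premain this would read SafeRegistration.register(inst, new UdfClassFileTransformer()). For the load-time rewrite in this article, false is sufficient, since the target class has not been loaded yet when premain runs.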
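The registered ClassFileTransformer is the core of the agent: its transform method contains the user's transformation logic. The ClassFileTransformer interface is defined as follows: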
package java.lang.instrument;
import java.security.ProtectionDomain;
/*
* Copyright 2003 Wily Technology, Inc.
*/
/**
* An agent provides an implementation of this interface in order
* to transform class files.
* The transformation occurs before the class is defined by the JVM.
* <P>
* Note the term <i>class file</i> is used as defined in section 3.1 of
* <cite>The Java™ Virtual Machine Specification</cite>,
* to mean a sequence
* of bytes in class file format, whether or not they reside in a file.
*
* @see java.lang.instrument.Instrumentation
* @see java.lang.instrument.Instrumentation#addTransformer
* @see java.lang.instrument.Instrumentation#removeTransformer
* @since 1.5
*/
public interface ClassFileTransformer {
/**
* The implementation of this method may transform the supplied class file and
* return a new replacement class file.
*
* <P>
* There are two kinds of transformers, determined by the <code>canRetransform</code>
* parameter of
* {@link java.lang.instrument.Instrumentation#addTransformer(ClassFileTransformer,boolean)}:
* <ul>
* <li><i>retransformation capable</i> transformers that were added with
* <code>canRetransform</code> as true
* </li>
* <li><i>retransformation incapable</i> transformers that were added with
* <code>canRetransform</code> as false or where added with
* {@link java.lang.instrument.Instrumentation#addTransformer(ClassFileTransformer)}
* </li>
* </ul>
*
* <P>
* Once a transformer has been registered with
* {@link java.lang.instrument.Instrumentation#addTransformer(ClassFileTransformer,boolean)
* addTransformer},
* the transformer will be called for every new class definition and every class redefinition.
* Retransformation capable transformers will also be called on every class retransformation.
* The request for a new class definition is made with
* {@link java.lang.ClassLoader#defineClass ClassLoader.defineClass}
* or its native equivalents.
* The request for a class redefinition is made with
* {@link java.lang.instrument.Instrumentation#redefineClasses Instrumentation.redefineClasses}
* or its native equivalents.
* The request for a class retransformation is made with
* {@link java.lang.instrument.Instrumentation#retransformClasses Instrumentation.retransformClasses}
* or its native equivalents.
* The transformer is called during the processing of the request, before the class file bytes
* have been verified or applied.
* When there are multiple transformers, transformations are composed by chaining the
* <code>transform</code> calls.
* That is, the byte array returned by one call to <code>transform</code> becomes the input
* (via the <code>classfileBuffer</code> parameter) to the next call.
*
* <P>
* Transformations are applied in the following order:
* <ul>
* <li>Retransformation incapable transformers
* </li>
* <li>Retransformation incapable native transformers
* </li>
* <li>Retransformation capable transformers
* </li>
* <li>Retransformation capable native transformers
* </li>
* </ul>
*
* <P>
* For retransformations, the retransformation incapable transformers are not
* called, instead the result of the previous transformation is reused.
* In all other cases, this method is called.
* Within each of these groupings, transformers are called in the order registered.
* Native transformers are provided by the <code>ClassFileLoadHook</code> event
* in the Java Virtual Machine Tool Interface).
*
* <P>
* The input (via the <code>classfileBuffer</code> parameter) to the first
* transformer is:
* <ul>
* <li>for new class definition,
* the bytes passed to <code>ClassLoader.defineClass</code>
* </li>
* <li>for class redefinition,
* <code>definitions.getDefinitionClassFile()</code> where
* <code>definitions</code> is the parameter to
* {@link java.lang.instrument.Instrumentation#redefineClasses
* Instrumentation.redefineClasses}
* </li>
* <li>for class retransformation,
* the bytes passed to the new class definition or, if redefined,
* the last redefinition, with all transformations made by retransformation
* incapable transformers reapplied automatically and unaltered;
* for details see
* {@link java.lang.instrument.Instrumentation#retransformClasses
* Instrumentation.retransformClasses}
* </li>
* </ul>
*
* <P>
* If the implementing method determines that no transformations are needed,
* it should return <code>null</code>.
* Otherwise, it should create a new <code>byte[]</code> array,
* copy the input <code>classfileBuffer</code> into it,
* along with all desired transformations, and return the new array.
* The input <code>classfileBuffer</code> must not be modified.
*
* <P>
* In the retransform and redefine cases,
* the transformer must support the redefinition semantics:
* if a class that the transformer changed during initial definition is later
* retransformed or redefined, the
* transformer must insure that the second class output class file is a legal
* redefinition of the first output class file.
*
* <P>
* If the transformer throws an exception (which it doesn't catch),
* subsequent transformers will still be called and the load, redefine
* or retransform will still be attempted.
* Thus, throwing an exception has the same effect as returning <code>null</code>.
* To prevent unexpected behavior when unchecked exceptions are generated
* in transformer code, a transformer can catch <code>Throwable</code>.
* If the transformer believes the <code>classFileBuffer</code> does not
* represent a validly formatted class file, it should throw
* an <code>IllegalClassFormatException</code>;
* while this has the same effect as returning null. it facilitates the
* logging or debugging of format corruptions.
*
* @param loader the defining loader of the class to be transformed,
* may be <code>null</code> if the bootstrap loader
* @param className the name of the class in the internal form of fully
* qualified class and interface names as defined in
* <i>The Java Virtual Machine Specification</i>.
* For example, <code>"java/util/List"</code>.
* @param classBeingRedefined if this is triggered by a redefine or retransform,
* the class being redefined or retransformed;
* if this is a class load, <code>null</code>
* @param protectionDomain the protection domain of the class being defined or redefined
* @param classfileBuffer the input byte buffer in class file format - must not be modified
*
* @throws IllegalClassFormatException if the input does not represent a well-formed class file
* @return a well-formed class file buffer (the result of the transform),
or <code>null</code> if no transform is performed.
* @see Instrumentation#redefineClasses
*/
byte[]
transform( ClassLoader loader,
String className,
Class<?> classBeingRedefined,
ProtectionDomain protectionDomain,
byte[] classfileBuffer)
throws IllegalClassFormatException;
}
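The contract above (return null when nothing changes, never modify classfileBuffer, treat an uncaught exception like null) can be captured in a JDK-only skeleton. TargetedTransformer is a hypothetical name, and the rewriter parameter is a hypothetical hook standing in for the ASM pipeline used below; the identity-style rewriter in the test is only there to exercise the contract.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.Arrays;
import java.util.function.UnaryOperator;

// Hypothetical transformer skeleton that follows the ClassFileTransformer contract.
final class TargetedTransformer implements ClassFileTransformer {
    private final String targetInternalName;      // slash form, e.g. "com/example/Foo"
    private final UnaryOperator<byte[]> rewriter; // hypothetical hook, e.g. an ASM pipeline

    TargetedTransformer(String targetInternalName, UnaryOperator<byte[]> rewriter) {
        this.targetInternalName = targetInternalName;
        this.rewriter = rewriter;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // Compare from the constant side, defensively, in case className is null.
        if (!targetInternalName.equals(className)) {
            return null; // null means "no transformation"
        }
        try {
            // Give the rewriter a private copy so the JVM-owned input buffer is never modified.
            return rewriter.apply(Arrays.copyOf(classfileBuffer, classfileBuffer.length));
        } catch (Throwable t) {
            // Same effect as an uncaught exception (the class loads unchanged), but we can log it.
            System.err.println("transform failed for " + className + ": " + t);
            return null;
        }
    }
}
```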
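The bytecode transformation is implemented in the body of transform. It receives the original class bytes in classfileBuffer and returns the AOP-processed bytes; in this example the ASM framework is used to rewrite the Code attribute of the target method. Following the interface contract, classes that are not rewritten return null.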
package com.hikvision.ait.agent;
import com.hikvision.ait.agent.classvisit.SituationEntityClassVisitor;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;
public class UdfClassFileTransformer implements ClassFileTransformer {
    // Internal (slash-separated) name of the class to rewrite
    private static final String TARGET =
            "com/hikvision/rd/bigdata/datamining/tde/situation/common/tools/TimeFormatUtils";
    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) throws IllegalClassFormatException {
        // Compare from the constant side so a null className cannot cause an NPE
        if (!TARGET.equals(className)) {
            return null; // null tells the JVM "no transformation"
        }
        System.out.println("AITInstrumentationAgent premain startup");
        ClassReader cr = new ClassReader(classfileBuffer);
        // COMPUTE_FRAMES: ASM recomputes max stack/locals and the StackMapTable
        ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_FRAMES);
        ClassVisitor cv = new SituationEntityClassVisitor(cw);
        // Drop the old frames (they are recomputed) and the debug information
        cr.accept(cv, ClassReader.SKIP_FRAMES | ClassReader.SKIP_DEBUG);
        System.out.println("Reloading: " + className);
        return cw.toByteArray();
    }
}
Next, extend the abstract class ClassVisitor and override visitMethod: for the method to be changed, obtain the MethodVisitor that writes it through super.visitMethod and wrap it.
package com.hikvision.ait.agent.classvisit;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import static org.objectweb.asm.Opcodes.ASM9;
public class SituationEntityClassVisitor extends ClassVisitor {
    // Descriptor of long parse(String, String)
    public static final String PARSE_METHOD_DESCRIPTOR = "(Ljava/lang/String;Ljava/lang/String;)J";
    public SituationEntityClassVisitor(ClassVisitor cv) {
        super(ASM9, cv);
    }
    @Override
    public MethodVisitor visitMethod(int access, String name, String descriptor, String signature, String[] exceptions) {
        MethodVisitor mv = super.visitMethod(access, name, descriptor, signature, exceptions);
        // Wrap only parse(String, String); every other method is copied unchanged
        if (mv != null && "parse".equals(name) && PARSE_METHOD_DESCRIPTOR.equals(descriptor)) {
            return new InterceptMethodAdapter(mv);
        }
        return mv;
    }
}
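The ClassVisitor above is one link in a delegation chain: the ClassReader pushes events in, each visitor forwards what it does not override to the next one, and the ClassWriter at the end records the result. The following ASM-free model (all class names are hypothetical, and this is not ASM's API) shows the same mechanics with a single instruction event.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor base: forwards every event to the next visitor unless overridden.
abstract class MiniMethodVisitor {
    protected final MiniMethodVisitor next; // plays the role of ASM's protected "mv" field

    MiniMethodVisitor(MiniMethodVisitor next) {
        this.next = next;
    }

    void visitInsn(String insn) {
        if (next != null) {
            next.visitInsn(insn);
        }
    }
}

// End of the chain; stands in for ASM's ClassWriter.
final class Recorder extends MiniMethodVisitor {
    final List<String> insns = new ArrayList<>();

    Recorder() {
        super(null);
    }

    @Override
    void visitInsn(String insn) {
        insns.add(insn);
    }
}

// Adapter that rewrites one instruction and passes everything else through.
final class ReplaceInsn extends MiniMethodVisitor {
    private final String from;
    private final String to;

    ReplaceInsn(MiniMethodVisitor next, String from, String to) {
        super(next);
        this.from = from;
        this.to = to;
    }

    @Override
    void visitInsn(String insn) {
        super.visitInsn(insn.equals(from) ? to : insn);
    }
}

// Stands in for ASM's ClassReader: drives the chain with the events of a method body.
final class MiniReader {
    static void accept(List<String> code, MiniMethodVisitor visitor) {
        for (String insn : code) {
            visitor.visitInsn(insn);
        }
    }
}
```

Replaying ["aload_0", "ldc", "lreturn"] through new ReplaceInsn(recorder, "ldc", "lconst_0") leaves ["aload_0", "lconst_0", "lreturn"] in the recorder, just as InterceptMethodAdapter changes what reaches the ClassWriter.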
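This is the visitor pattern: we extend the abstract class MethodVisitor, pass it the MethodVisitor returned in the previous step, and perform the rewrite in the subclass. The goal is to change the method public static long parse(String timeStr, String timeFormat) of the utility class com/hikvision/rd/bigdata/datamining/tde/situation/common/tools/TimeFormatUtils, whose original implementation is: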
public static long parse(String timeStr, String timeFormat) {
DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern(timeFormat);
LocalDateTime localDateTime = LocalDateTime.parse(timeStr, dateTimeFormatter);
return localDateTime.atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
}
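The problem with this implementation is that the offset written in the string is thrown away: LocalDateTime has no zone, so the "+09:00" suffix in a pattern like the one used in the test later is matched as literal text, and the instant depends entirely on the zone supplied by ZoneId.systemDefault(). This sketch (hypothetical class name) makes that zone a parameter to show the effect.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class ZoneSensitiveParse {
    private ZoneSensitiveParse() {}

    // Same steps as the original parse, except that the zone ZoneId.systemDefault()
    // would supply is passed in explicitly.
    static long parseIn(String timeStr, String timeFormat, ZoneId zone) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(timeFormat);
        // Non-letter characters such as '+', digits and ':' in the pattern are literals,
        // so "+09:00" is only matched, never interpreted as an offset.
        LocalDateTime localDateTime = LocalDateTime.parse(timeStr, formatter);
        return localDateTime.atZone(zone).toInstant().toEpochMilli();
    }
}
```

For "2021-08-02T10:40:15.403+09:00" with the pattern "yyyy-MM-dd'T'HH:mm:ss.SSS+09:00", Asia/Shanghai gives 1627872015403, Asia/Tokyo gives 1627868415403 and UTC gives 1627900815403: the same text yields three different instants.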
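We want the time to be parsed with the offset carried by the time string itself rather than the system default zone obtained from ZoneId.systemDefault(), because in containerized environments the time zone may not be configured correctly, which leads to wrong timestamps. The desired code is: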
public static long parse(String timeStr, String timeFormat) {
try {
return ZonedDateTime.parse(timeStr).toInstant().toEpochMilli();
} catch (Exception e) {
e.printStackTrace();
return 0;
}
}
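To check that the replacement really no longer depends on the environment, the sketch below (hypothetical class and helper names) runs the same body under different JVM default time zones, simulating a misconfigured container, and restores the previous default afterwards.

```java
import java.time.ZonedDateTime;
import java.util.TimeZone;
import java.util.function.LongSupplier;

final class OffsetAwareParse {
    private OffsetAwareParse() {}

    // Same body as the replacement parse: the offset inside timeStr fixes the instant,
    // and timeFormat is not used.
    static long parse(String timeStr, String timeFormat) {
        try {
            return ZonedDateTime.parse(timeStr).toInstant().toEpochMilli();
        } catch (Exception e) {
            e.printStackTrace();
            return 0;
        }
    }

    // Runs body with a temporarily changed JVM default time zone, then restores the old one.
    static long withDefaultZone(String zoneId, LongSupplier body) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return body.getAsLong();
        } finally {
            TimeZone.setDefault(saved);
        }
    }
}
```

Under both Asia/Shanghai and UTC defaults, parse("2021-08-02T10:40:15.403+09:00", null) yields 1627868415403. One design caveat: returning 0 on failure makes an unparseable string indistinguishable from 1970-01-01T00:00:00Z, so callers may want to treat 0 specially.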
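For readers familiar with bytecode, the core instructions of this code are listed below. Other attributes such as LocalVariableTable and StackMapTable can be ignored, because ASM can compute the stack map frames automatically (ClassWriter.COMPUTE_FRAMES).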
0: aload_0
1: invokestatic #17 // Method java/time/ZonedDateTime.parse:(Ljava/lang/CharSequence;)Ljava/time/ZonedDateTime;
4: invokevirtual #14 // Method java/time/ZonedDateTime.toInstant:()Ljava/time/Instant;
7: invokevirtual #15 // Method java/time/Instant.toEpochMilli:()J
10: lreturn
11: astore_2
12: aload_2
13: invokevirtual #29 // Method java/lang/Exception.printStackTrace:()V
16: lconst_0
17: lreturn
Exception table:
from to target type
0 10 11 Class java/lang/Exception
The final ASM code is as follows.
package com.hikvision.ait.agent.classvisit;
import org.objectweb.asm.Label;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;
import static org.objectweb.asm.Opcodes.ASM9;
public class InterceptMethodAdapter extends MethodVisitor {
    public InterceptMethodAdapter(MethodVisitor mv) {
        super(ASM9, mv);
    }
    @Override
    public void visitCode() {
        super.visitCode(); // start of the method body
        Label startLabel = new Label();  // try "from"
        Label toLabel = new Label();     // try "to" (exclusive)
        Label targetLabel = new Label(); // handler "target"
        // ASM requires a try/catch block to be declared before its labels are visited
        mv.visitTryCatchBlock(startLabel, toLabel, targetLabel, "java/lang/Exception");
        // try { return ZonedDateTime.parse(timeStr).toInstant().toEpochMilli(); }
        mv.visitLabel(startLabel);
        mv.visitVarInsn(Opcodes.ALOAD, 0);
        mv.visitMethodInsn(Opcodes.INVOKESTATIC, "java/time/ZonedDateTime", "parse",
                "(Ljava/lang/CharSequence;)Ljava/time/ZonedDateTime;", false);
        mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/time/ZonedDateTime", "toInstant", "()Ljava/time/Instant;", false);
        mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/time/Instant", "toEpochMilli", "()J", false);
        mv.visitLabel(toLabel);
        mv.visitInsn(Opcodes.LRETURN);
        // catch (Exception e) { e.printStackTrace(); return 0; }
        mv.visitLabel(targetLabel);
        mv.visitVarInsn(Opcodes.ASTORE, 2);
        mv.visitVarInsn(Opcodes.ALOAD, 2);
        mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/lang/Exception", "printStackTrace", "()V", false);
        mv.visitInsn(Opcodes.LCONST_0);
        mv.visitInsn(Opcodes.LRETURN);
        // The original instructions are still forwarded after this point but are now unreachable;
        // with COMPUTE_FRAMES, ASM replaces unreachable code with NOP ... ATHROW.
    }
}
3.2 Maven Configuration
Finally, here is the relevant part of the Maven pom configuration.
<dependencies>
<dependency>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
<version>9.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/junit/junit -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>3.1.0</version>
<configuration>
<archive>
<manifestEntries>
<Premain-Class>com.hikvision.ait.agent.AITInstrumentationAgent</Premain-Class>
<Can-Redefine-Classes>true</Can-Redefine-Classes>
<Can-Retransform-Classes>true</Can-Retransform-Classes>
</manifestEntries>
</archive>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<finalName>instrument-agent-jar-with-dependencies</finalName>
<keepDependenciesWithProvidedScope>true</keepDependenciesWithProvidedScope>
<createDependencyReducedPom>true</createDependencyReducedPom>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<artifactSet>
<excludes>
<exclude>com.google.code.findbugs:jsr305</exclude>
<!-- <exclude>org.slf4j:*</exclude>-->
<exclude>log4j:*</exclude>
</excludes>
</artifactSet>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
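Rewriting bytecode requires a bytecode manipulation library, so asm is added as a dependency. The Premain-Class attribute must also be present in the jar's META-INF/MANIFEST.MF; it can be configured in the manifestEntries of maven-jar-plugin and is written into MANIFEST.MF automatically by mvn package. The shade plugin bundles all dependencies into one jar. Because the shade configuration sets a finalName that differs from the project's own build finalName, the shaded jar is written as target/instrument-agent-jar-with-dependencies.jar instead of replacing the regular jar; unless the application already provides ASM, that shaded jar is the one to pass to -javaagent. The resulting MANIFEST.MF is: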
Manifest-Version: 1.0
Premain-Class: com.hikvision.ait.agent.AITInstrumentationAgent
Built-By: wangwen
Can-Redefine-Classes: true
Can-Retransform-Classes: true
Created-By: Apache Maven 3.6.0
Build-Jdk: 1.8.0_181
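Since a missing Premain-Class (or a missing Can-Retransform-Classes when addTransformer(..., true) is used) only shows up when the JVM starts, a small check of the built jar can catch it earlier. This is a sketch with a hypothetical class name, using only java.util.jar.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical build-time check for an agent jar's manifest.
final class AgentManifestCheck {
    private AgentManifestCheck() {}

    // Problems found in the manifest's main section (an empty list means it looks fine).
    static List<String> problems(Manifest manifest, boolean needsRetransform) {
        Attributes main = manifest.getMainAttributes();
        List<String> problems = new ArrayList<>();
        if (main.getValue("Premain-Class") == null) {
            problems.add("missing Premain-Class");
        }
        if (needsRetransform && !"true".equalsIgnoreCase(main.getValue("Can-Retransform-Classes"))) {
            problems.add("Can-Retransform-Classes is not true");
        }
        return problems;
    }

    // Reads META-INF/MANIFEST.MF from a built jar and checks it.
    static List<String> problemsInJar(File agentJar, boolean needsRetransform) throws IOException {
        try (JarFile jar = new JarFile(agentJar)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                List<String> missing = new ArrayList<>();
                missing.add("no META-INF/MANIFEST.MF");
                return missing;
            }
            return problems(manifest, needsRetransform);
        }
    }
}
```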
4 Testing the javaagent
To test the javaagent in IDEA, add -javaagent:D:\AIT-CODE\instrumentation-agent\target\instrumentation-agent-1.0-SNAPSHOT.jar to the program's launch options, where instrumentation-agent-1.0-SNAPSHOT.jar is the agent jar built in the previous step. On a server, put the same option in front of -jar in the java command:
java -javaagent:D:\AIT-CODE\instrumentation-agent\target\instrumentation-agent-1.0-SNAPSHOT.jar -jar your-application.jar
The test program is:
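import com.hikvision.rd.bigdata.datamining.tde.situation.common.tools.TimeFormatUtils;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
// The test class name is illustrative
public class AgentParseTest {
    public static void main(String[] args) {
        // With the agent, TimeFormatUtils.parse ignores the format and uses the offset in the string
        long ti = TimeFormatUtils.parse("2021-08-02T10:40:15.403+09:00", null);
        long before = parse("2021-08-02T10:40:15.403+09:00", "yyyy-MM-dd'T'HH:mm:ss.SSS+09:00");
        System.out.println();
        System.out.print("Before rewriting parse, converted with the system default zone Asia/Shanghai: ");
        System.out.println(before);
        System.out.println("Executing the AOP-proxied function");
        System.out.print("After rewriting parse, converted with the Asia/Tokyo offset: ");
        System.out.println(ti);
    }
    public static long parse(String timeStr, String timeFormat) {
        // Original logic of TimeFormatUtils.parse. With the agent active, TimeFormatUtils.parse
        // no longer runs this code, so it is copied here for comparison.
        DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern(timeFormat);
        LocalDateTime localDateTime = LocalDateTime.parse(timeStr, dateTimeFormatter);
        return localDateTime.atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
    }
}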
The test run prints:
AITInstrumentationAgent premain startup
Reloading: com/hikvision/rd/bigdata/datamining/tde/situation/common/tools/TimeFormatUtils
Before rewriting parse, converted with the system default zone Asia/Shanghai: 1627872015403
Executing the AOP-proxied function
After rewriting parse, converted with the Asia/Tokyo offset: 1627868415403
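The rewritten bytecode can also be written to disk. Opening the dumped TimeFormatUtils.class file shows that the method body has become the following, which confirms that the class has been instrumented as intended: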
public static long parse(String var0, String var1) {
try {
return ZonedDateTime.parse(var0).toInstant().toEpochMilli();
} catch (Exception var4) {
var4.printStackTrace();
return 0L;
}
}
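The article does not show how the bytes were dumped; one possible approach, sketched with a hypothetical helper, is to write the array returned by cw.toByteArray() to a directory before returning it from transform, and then inspect the file with javap -c -p or a decompiler.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical debugging helper: write transformed class bytes to disk for inspection.
final class ClassDumper {
    private ClassDumper() {}

    // internalName uses slashes (the form transform receives), e.g. "com/example/Foo";
    // the file lands at baseDir/com/example/Foo.class.
    static Path dump(Path baseDir, String internalName, byte[] classBytes) {
        Path target = baseDir.resolve(internalName + ".class");
        try {
            Files.createDirectories(target.getParent());
            return Files.write(target, classBytes);
        } catch (IOException e) {
            // Unchecked, because transform's signature only allows IllegalClassFormatException.
            throw new UncheckedIOException(e);
        }
    }
}
```

Inside UdfClassFileTransformer this could look like: byte[] out = cw.toByteArray(); ClassDumper.dump(Paths.get("target", "dump"), className, out); return out; (the target/dump directory is an arbitrary choice).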
5 Conclusion
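This article briefly described how to use the Instrument mechanism in the VM's Agent_OnLoad mode to build an agent that rewrites Java class bytecode. The approach lets developers build an agent that is independent of the application, which can monitor and assist programs running on the JVM and even replace or modify the definitions of some classes. This makes more flexible run-time VM monitoring and Java class manipulation possible; in effect it is a VM-supported way of doing AOP that requires no upgrade or change to the JDK.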
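Besides running the agent before the program starts, classes can also be modified dynamically while a Java program is running; a follow-up post will cover that approach.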