Notes from studying the Flink source code
1. Running the shell script from the command line
flink -h
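The shell script being invoked is the flink script in the bin directory of the Flink installation.
The last line of that script shows that it actually launches a Java program: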
# Add HADOOP_CLASSPATH to allow the usage of Hadoop file systems
exec "${JAVA_RUN}" $JVM_ARGS $FLINK_ENV_JAVA_OPTS "${log_setting[@]}" -classpath "`manglePathList "$CC_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" org.apache.flink.client.cli.CliFrontend "$@"
So the shell script actually runs the main method of the org.apache.flink.client.cli.CliFrontend class.
2. The main method of CliFrontend
//org/apache/flink/client/cli/CliFrontend.java:1147
public static void main(final String[] args) {
    EnvironmentInformation.logEnvironmentInfo(LOG, "Command Line Client", args);
    // 1. find the configuration directory
    final String configurationDirectory = getConfigurationDirectoryFromEnv();
    // 2. load the global configuration
    final Configuration configuration =
            GlobalConfiguration.loadConfiguration(configurationDirectory);
    // 3. load the custom command lines
    final List<CustomCommandLine> customCommandLines =
            loadCustomCommandLines(configuration, configurationDirectory);
    int retCode = 31;
    try {
        final CliFrontend cli = new CliFrontend(configuration, customCommandLines);
        SecurityUtils.install(new SecurityConfiguration(cli.configuration));
        retCode = SecurityUtils.getInstalledContext().runSecured(() -> cli.parseAndRun(args));
    } catch (Throwable t) {
        final Throwable strippedThrowable =
                ExceptionUtils.stripException(t, UndeclaredThrowableException.class);
        LOG.error("Fatal error while running command line interface.", strippedThrowable);
        strippedThrowable.printStackTrace();
    } finally {
        System.exit(retCode);
    }
}
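To summarize the flow:
1. Look for the Flink configuration directory in the following three ways, in order, and fail with an error if none of them succeeds:
- the FLINK_CONF_DIR environment variable
- the ../conf directory relative to the directory the flink shell script was run from
- the conf directory under the directory the flink shell script was run from
2. Read the flink-conf.yaml file from that configuration directory, parse it, and wrap the settings in a Configuration object.
3. Load three command-line clients in order: GenericCLI, FlinkYarnSessionCli, and DefaultCLI. Flink checks these three clients in order to decide which mode to run in, such as standalone or yarn-session.
4. Wrap the loaded configuration and command-line clients in a CliFrontend object.
5. SecurityUtils installs the security modules and the security context according to the configuration.
6. Call runSecured on the installed security context, which executes CliFrontend's parseAndRun method inside that context.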
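The three-step directory lookup from step 1 can be sketched roughly as follows. This is only an illustration written for these notes, not Flink's getConfigurationDirectoryFromEnv: the class name ConfDirLookupSketch, the resolve method and its baseDir parameter are hypothetical, and treating an explicitly set but missing FLINK_CONF_DIR as an immediate error is an assumption of the sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative sketch only; names and structure are hypothetical, not Flink's code. */
final class ConfDirLookupSketch {

    /**
     * @param envValue value of FLINK_CONF_DIR, or null if the variable is unset
     * @param baseDir  directory the relative fallbacks are resolved against (assumption)
     */
    static Path resolve(String envValue, Path baseDir) {
        // 1. An explicitly configured directory takes precedence.
        if (envValue != null) {
            Path fromEnv = Paths.get(envValue);
            if (Files.isDirectory(fromEnv)) {
                return fromEnv;
            }
            // Assumption: a set-but-missing directory is an error, not a fall-through.
            throw new IllegalStateException("FLINK_CONF_DIR does not exist: " + envValue);
        }
        // 2. and 3. Fall back to ../conf, then conf, relative to baseDir.
        for (String candidate : new String[] {"../conf", "conf"}) {
            Path dir = baseDir.resolve(candidate).normalize();
            if (Files.isDirectory(dir)) {
                return dir;
            }
        }
        throw new IllegalStateException("No configuration directory found");
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway layout with bin/ and conf/ side by side, like a Flink home.
        Path home = Files.createTempDirectory("flink-home");
        Files.createDirectory(home.resolve("conf"));
        Path bin = Files.createDirectory(home.resolve("bin"));
        // With no environment variable, the ../conf fallback is found from bin/.
        System.out.println(resolve(null, bin));
    }
}
```

Passing the environment value and the base directory as parameters, instead of reading them inside the method, is just a choice that makes the lookup order easy to try out in isolation.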
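3. The parseAndRun method of CliFrontend
This method parses the command-line arguments and starts the requested action. In other words, it interprets the arguments the user passes in, such as flink run -t xxx -c xxx, and performs the corresponding operation.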
//org/apache/flink/client/cli/CliFrontend.java:1069
public int parseAndRun(String[] args) {
    // check for action
    if (args.length < 1) {
        CliFrontendParser.printHelp(customCommandLines);
        System.out.println("Please specify an action.");
        return 1;
    }
    // get action
    String action = args[0];
    // remove action from parameters
    final String[] params = Arrays.copyOfRange(args, 1, args.length);
    try {
        // do action
        switch (action) {
            case ACTION_RUN:
                run(params);
                return 0;
            case ACTION_RUN_APPLICATION:
                runApplication(params);
                return 0;
            case ACTION_LIST:
                list(params);
                return 0;
            case ACTION_INFO:
                info(params);
                return 0;
            case ACTION_CANCEL:
                cancel(params);
                return 0;
            case ACTION_STOP:
                stop(params);
                return 0;
            case ACTION_SAVEPOINT:
                savepoint(params);
                return 0;
            case "-h":
            case "--help":
                CliFrontendParser.printHelp(customCommandLines);
                return 0;
            case "-v":
            case "--version":
                String version = EnvironmentInformation.getVersion();
                String commitID = EnvironmentInformation.getRevisionInformation().commitId;
                System.out.print("Version: " + version);
                System.out.println(
                        commitID.equals(EnvironmentInformation.UNKNOWN)
                                ? ""
                                : ", Commit ID: " + commitID);
                return 0;
            default:
                System.out.printf("\"%s\" is not a valid action.\n", action);
                System.out.println();
                System.out.println(
                        "Valid actions are \"run\", \"run-application\", \"list\", \"info\", \"savepoint\", \"stop\", or \"cancel\".");
                System.out.println();
                System.out.println(
                        "Specify the version option (-v or --version) to print Flink version.");
                System.out.println();
                System.out.println(
                        "Specify the help option (-h or --help) to get help on the command.");
                return 1;
        }
    } catch (CliArgsException ce) {
        return handleArgException(ce);
    } catch (ProgramParametrizationException ppe) {
        return handleParametrizationException(ppe);
    } catch (ProgramMissingJobException pmje) {
        return handleMissingJobException();
    } catch (Exception e) {
        return handleError(e);
    }
}
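To summarize the flow:
- The first argument passed by the user is taken as the action, and a switch statement dispatches to the matching method. The supported actions are defined as follows: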
//org.apache.flink.client.cli.CliFrontend:92
// actions
private static final String ACTION_RUN = "run";
private static final String ACTION_RUN_APPLICATION = "run-application";
private static final String ACTION_INFO = "info";
private static final String ACTION_LIST = "list";
private static final String ACTION_CANCEL = "cancel";
private static final String ACTION_STOP = "stop";
private static final String ACTION_SAVEPOINT = "savepoint";
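- The rest of this walkthrough follows the run action, that is, CliFrontend's run method. Its main logic is:
  - Get the Options object that Flink defines for the arguments of the run command. Flink uses Apache's commons-cli package to parse the user's command-line arguments.
  - Using the Options object from the previous step, parse the user's arguments (the values that follow flags such as -c and -t) into a CommandLine object.
  - Check whether the user passed -h or --help; if so, print the help text directly.
  - Pick the active command-line client from the three stored in CliFrontend earlier: GenericCLI, FlinkYarnSessionCli, and DefaultCLI. They are checked in order, and the first one that reports itself active is used and returned. The checks are:
    - GenericCLI: active if any of the configuration key execution.target or the arguments -e, --executor, -t, --target is present with a non-null value.
    - FlinkYarnSessionCli: active if the value of -m or --jobmanager equals yarn-cluster, or a YARN applicationId was passed as an argument, or execution.target is yarn-session or yarn-per-job.
    - DefaultCLI: always returns true; used for standalone mode.
  - Build a ProgramOptions object holding the parameters of the user's program. This step determines whether the job is a jar job or a Python job, then extracts and wraps the run parameters: the path of the user jar, the paths of the jar's dependencies, the class containing the user's main method, the args passed to that main method, the requested parallelism, detached mode, and the savepoint settings.
  - From the ProgramOptions, collect the paths of the user jar and all of its dependencies into a List.
  - Combine the configuration above with the jar paths to build the effective configuration.
  - From the effective configuration and the ProgramOptions, build the user's program and wrap it in a PackagedProgram object.
  - Call the executeProgram method of ClientUtils to prepare for running the user's program code.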
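The "first active client wins" selection described in the list above can be sketched like this. It is a simplified model for these notes, not Flink's CustomCommandLine API: options and configuration are flattened into one Map, the Cli class and the yarnApplicationId key are hypothetical, and the execution.target condition of FlinkYarnSessionCli is left out because in this flat model GenericCLI would already match it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative sketch only; this is not Flink's CustomCommandLine interface. */
final class ActiveCliSelectionSketch {

    /** Hypothetical stand-in for a command-line client: a name plus an activity check. */
    static final class Cli {
        final String name;
        final Predicate<Map<String, String>> isActive;

        Cli(String name, Predicate<Map<String, String>> isActive) {
            this.name = name;
            this.isActive = isActive;
        }
    }

    /** True if at least one of the keys is present with a non-null value. */
    static boolean hasAny(Map<String, String> options, String... keys) {
        for (String key : keys) {
            if (options.get(key) != null) {
                return true;
            }
        }
        return false;
    }

    // Order matters: the catch-all DefaultCLI must come last.
    static final List<Cli> CLIENTS = Arrays.asList(
            new Cli("GenericCLI",
                    o -> hasAny(o, "execution.target", "-e", "--executor", "-t", "--target")),
            new Cli("FlinkYarnSessionCli",
                    o -> "yarn-cluster".equals(o.get("-m"))
                            || "yarn-cluster".equals(o.get("--jobmanager"))
                            || hasAny(o, "yarnApplicationId")), // hypothetical key
            new Cli("DefaultCLI", o -> true));

    /** Returns the name of the first client whose check passes. */
    static String select(Map<String, String> options) {
        for (Cli cli : CLIENTS) {
            if (cli.isActive.test(options)) {
                return cli.name;
            }
        }
        throw new IllegalStateException("No active command-line client found");
    }

    public static void main(String[] args) {
        System.out.println(select(Collections.singletonMap("-t", "remote")));
        System.out.println(select(Collections.singletonMap("-m", "yarn-cluster")));
        System.out.println(select(Collections.<String, String>emptyMap()));
    }
}
```

Because DefaultCLI always reports itself active, the loop never reaches the final exception as long as it stays in the list, which matches its role as the standalone fallback.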
//org/apache/flink/client/ClientUtils.java:66
public static void executeProgram(
        PipelineExecutorServiceLoader executorServiceLoader,
        Configuration configuration,
        PackagedProgram program,
        boolean enforceSingleJobExecution,
        boolean suppressSysout)
        throws ProgramInvocationException {
    checkNotNull(executorServiceLoader);
    final ClassLoader userCodeClassLoader = program.getUserCodeClassLoader();
    final ClassLoader contextClassLoader = Thread.currentThread().getContextClassLoader();
    try {
        Thread.currentThread().setContextClassLoader(userCodeClassLoader);
        LOG.info(
                "Starting program (detached: {})",
                !configuration.getBoolean(DeploymentOptions.ATTACHED));
        ContextEnvironment.setAsContext(
                executorServiceLoader,
                configuration,
                userCodeClassLoader,
                enforceSingleJobExecution,
                suppressSysout);
        StreamContextEnvironment.setAsContext(
                executorServiceLoader,
                configuration,
                userCodeClassLoader,
                enforceSingleJobExecution,
                suppressSysout);
        try {
            program.invokeInteractiveModeForExecution();
        } finally {
            ContextEnvironment.unsetAsContext();
            StreamContextEnvironment.unsetAsContext();
        }
    } finally {
        Thread.currentThread().setContextClassLoader(contextClassLoader);
    }
}
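The preparation steps are:
- Get the class loader for the user code and set it as the current thread's context ClassLoader.
- Set up the context environment that the client uses for remote execution.
- Set up the streaming context environment. This is where the environment that user code obtains through StreamExecutionEnvironment.getExecutionEnvironment() gets configured.
- Call PackagedProgram's invokeInteractiveModeForExecution method, which in turn runs the user's main method through callMainMethod.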
//org/apache/flink/client/program/PackagedProgram.java:323
private static void callMainMethod(Class<?> entryClass, String[] args)
        throws ProgramInvocationException {
    Method mainMethod;
    if (!Modifier.isPublic(entryClass.getModifiers())) {
        throw new ProgramInvocationException(
                "The class " + entryClass.getName() + " must be public.");
    }
    try {
        mainMethod = entryClass.getMethod("main", String[].class);
    } catch (NoSuchMethodException e) {
        throw new ProgramInvocationException(
                "The class " + entryClass.getName() + " has no main(String[]) method.");
    } catch (Throwable t) {
        throw new ProgramInvocationException(
                "Could not look up the main(String[]) method from the class "
                        + entryClass.getName()
                        + ": "
                        + t.getMessage(),
                t);
    }
    if (!Modifier.isStatic(mainMethod.getModifiers())) {
        throw new ProgramInvocationException(
                "The class " + entryClass.getName() + " declares a non-static main method.");
    }
    if (!Modifier.isPublic(mainMethod.getModifiers())) {
        throw new ProgramInvocationException(
                "The class " + entryClass.getName() + " declares a non-public main method.");
    }
    try {
        mainMethod.invoke(null, (Object) args);
    } catch (IllegalArgumentException e) {
        throw new ProgramInvocationException(
                "Could not invoke the main method, arguments are not matching.", e);
    } catch (IllegalAccessException e) {
        throw new ProgramInvocationException(
                "Access to the main method was denied: " + e.getMessage(), e);
    } catch (InvocationTargetException e) {
        Throwable exceptionInMethod = e.getTargetException();
        if (exceptionInMethod instanceof Error) {
            throw (Error) exceptionInMethod;
        } else if (exceptionInMethod instanceof ProgramParametrizationException) {
            throw (ProgramParametrizationException) exceptionInMethod;
        } else if (exceptionInMethod instanceof ProgramInvocationException) {
            throw (ProgramInvocationException) exceptionInMethod;
        } else {
            throw new ProgramInvocationException(
                    "The main method caused an error: " + exceptionInMethod.getMessage(),
                    exceptionInMethod);
        }
    } catch (Throwable t) {
        throw new ProgramInvocationException(
                "An error occurred while invoking the program's main method: " + t.getMessage(),
                t);
    }
}
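This method mainly validates the entry class and its main method, then calls the user's code through mainMethod.invoke(null, (Object) args), at which point the user's program starts running. The actual cluster setup and job submission only happen once the user code calls StreamExecutionEnvironment.execute(). The next part covers that in detail.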
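To make the interplay between the context environment and the reflective main call more concrete, here is a minimal, self-contained sketch of the pattern. It is a toy model under stated assumptions: Env, UserJob and this executeProgram are hypothetical stand-ins, and the single static field is a simplification of however Flink actually stores its context environment. The point is only that the client installs an environment before invoking main, the user code picks it up through a static getter, and both the environment and the context class loader are restored afterwards.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Illustrative sketch only; none of these types are Flink classes. */
final class ContextEnvironmentSketch {

    /** Hypothetical, heavily simplified stand-in for an execution environment. */
    static final class Env {
        private static Env context; // installed by the client before user code runs
        final String kind;

        private Env(String kind) {
            this.kind = kind;
        }

        static void setAsContext(Env env) {
            context = env;
        }

        static void unsetAsContext() {
            context = null;
        }

        /** Returns the client-provided environment if one is installed, else a local one. */
        static Env getExecutionEnvironment() {
            return context != null ? context : new Env("local");
        }
    }

    /** Plays the role of the user's job class. */
    public static final class UserJob {
        static String seenEnvironment;

        public static void main(String[] args) {
            // User code just asks for "the" environment and gets whatever the client set.
            seenEnvironment = Env.getExecutionEnvironment().kind;
        }
    }

    static void executeProgram(Class<?> entryClass, ClassLoader userClassLoader, String[] args)
            throws Exception {
        ClassLoader previous = Thread.currentThread().getContextClassLoader();
        try {
            Thread.currentThread().setContextClassLoader(userClassLoader);
            Env.setAsContext(new Env("client-context"));
            try {
                Method main = entryClass.getMethod("main", String[].class);
                if (!Modifier.isStatic(main.getModifiers())) {
                    throw new IllegalStateException("main method must be static");
                }
                // The cast passes args as one String[] argument, not as varargs.
                main.invoke(null, (Object) args);
            } finally {
                Env.unsetAsContext();
            }
        } finally {
            Thread.currentThread().setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        executeProgram(UserJob.class, UserJob.class.getClassLoader(), new String[0]);
        System.out.println("during user main: " + UserJob.seenEnvironment);
        System.out.println("afterwards: " + Env.getExecutionEnvironment().kind);
    }
}
```

The nested try/finally blocks mirror the shape of executeProgram shown earlier: cleanup runs even when the user's main method throws, so a failed job does not leave the client's thread with a stale environment or class loader.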