Notes on studying the Flink source code

1. Running the shell script from the command line

flink -h

This command runs the flink script located in the bin directory of the Flink installation.

The last line of that script shows that it actually launches a Java program:

# Add HADOOP_CLASSPATH to allow the usage of Hadoop file systems
exec "${JAVA_RUN}" $JVM_ARGS $FLINK_ENV_JAVA_OPTS "${log_setting[@]}" -classpath "`manglePathList "$CC_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`"     org.apache.flink.client.cli.CliFrontend "$@"

So the shell script ultimately runs the main method of the class org.apache.flink.client.cli.CliFrontend.

2. The main method of CliFrontend

//org/apache/flink/client/cli/CliFrontend.java:1147
public static void main(final String[] args) {
        EnvironmentInformation.logEnvironmentInfo(LOG, "Command Line Client", args);

        // 1. find the configuration directory
        final String configurationDirectory = getConfigurationDirectoryFromEnv();

        // 2. load the global configuration
        final Configuration configuration =
                GlobalConfiguration.loadConfiguration(configurationDirectory);

        // 3. load the custom command lines
        final List<CustomCommandLine> customCommandLines =
                loadCustomCommandLines(configuration, configurationDirectory);

        int retCode = 31;
        try {
            final CliFrontend cli = new CliFrontend(configuration, customCommandLines);

            SecurityUtils.install(new SecurityConfiguration(cli.configuration));
            retCode = SecurityUtils.getInstalledContext().runSecured(() -> cli.parseAndRun(args));
        } catch (Throwable t) {
            final Throwable strippedThrowable =
                    ExceptionUtils.stripException(t, UndeclaredThrowableException.class);
            LOG.error("Fatal error while running command line interface.", strippedThrowable);
            strippedThrowable.printStackTrace();
        } finally {
            System.exit(retCode);
        }
    }

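The flow can be summarized as follows:

1. Look for the Flink configuration directory in the following three places, in order, and fail with an error if none of them is found:

  • the directory named by the environment variable FLINK_CONF_DIR
  • ../conf, relative to the directory from which the flink shell script was run
  • conf, relative to the directory from which the flink shell script was run

2. Read the flink-conf.yaml file in that configuration directory, parse it, and wrap the result in a Configuration object.

3. Load three command-line clients in order: GenericCLI, FlinkYarnSessionCli, and DefaultCLI. Flink checks these three clients in that order to decide which mode to run in, such as standalone or yarn-session.

4. Wrap the loaded configuration and the command-line clients in a CliFrontend object.

5. SecurityUtils installs the security modules and the security context according to the configuration.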

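6. Call runSecured on the installed security context to execute CliFrontend's parseAndRun method. The callable is run under that context; nothing in the code shown here starts a new thread.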
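
The lookup order in step 1 can be sketched as below. This is a minimal illustration, not Flink's code: the class name, the method findConfDir, and its two parameters (a map standing in for the process environment and a predicate standing in for a file-existence check) are made up for this example. Failing fast when FLINK_CONF_DIR is set but points to a missing directory is also an assumption of the sketch.

```java
import java.io.File;
import java.util.Map;
import java.util.function.Predicate;

public class ConfDirLookupSketch {

    // Fallback locations, tried in this order when FLINK_CONF_DIR is not set.
    private static final String FALLBACK_1 = "../conf";
    private static final String FALLBACK_2 = "conf";

    /**
     * Resolves the configuration directory.
     * `env` stands in for System.getenv() and `exists` for File::exists,
     * so the lookup order can be exercised without touching the disk.
     */
    public static String findConfDir(Map<String, String> env, Predicate<String> exists) {
        String fromEnv = env.get("FLINK_CONF_DIR");
        if (fromEnv != null) {
            if (exists.test(fromEnv)) {
                return fromEnv;
            }
            // Assumption of this sketch: a set-but-missing directory is an error.
            throw new RuntimeException("FLINK_CONF_DIR points to a missing directory: " + fromEnv);
        }
        if (exists.test(FALLBACK_1)) {
            return FALLBACK_1;
        }
        if (exists.test(FALLBACK_2)) {
            return FALLBACK_2;
        }
        throw new RuntimeException("No configuration directory found");
    }

    public static void main(String[] args) {
        // Real-world style usage: process environment plus the file system.
        try {
            String dir = findConfDir(System.getenv(), path -> new File(path).exists());
            System.out.println("Using configuration directory: " + dir);
        } catch (RuntimeException e) {
            System.out.println("Lookup failed: " + e.getMessage());
        }
    }
}
```

Passing the environment and the existence check as parameters is only there to make the order easy to try out; relative paths such as ../conf are resolved against the working directory of the JVM.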

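3. The parseAndRun method of CliFrontend

This method parses the command-line arguments and starts the requested action. In other words, it interprets what the user passed on the command line, for example flink run -t xxx -c xxx, and acts on it.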

//org/apache/flink/client/cli/CliFrontend.java:1069
public int parseAndRun(String[] args) {

        // check for action
        if (args.length < 1) {
            CliFrontendParser.printHelp(customCommandLines);
            System.out.println("Please specify an action.");
            return 1;
        }

        // get action
        String action = args[0];

        // remove action from parameters
        final String[] params = Arrays.copyOfRange(args, 1, args.length);

        try {
            // do action
            switch (action) {
                case ACTION_RUN:
                    run(params);
                    return 0;
                case ACTION_RUN_APPLICATION:
                    runApplication(params);
                    return 0;
                case ACTION_LIST:
                    list(params);
                    return 0;
                case ACTION_INFO:
                    info(params);
                    return 0;
                case ACTION_CANCEL:
                    cancel(params);
                    return 0;
                case ACTION_STOP:
                    stop(params);
                    return 0;
                case ACTION_SAVEPOINT:
                    savepoint(params);
                    return 0;
                case "-h":
                case "--help":
                    CliFrontendParser.printHelp(customCommandLines);
                    return 0;
                case "-v":
                case "--version":
                    String version = EnvironmentInformation.getVersion();
                    String commitID = EnvironmentInformation.getRevisionInformation().commitId;
                    System.out.print("Version: " + version);
                    System.out.println(
                            commitID.equals(EnvironmentInformation.UNKNOWN)
                                    ? ""
                                    : ", Commit ID: " + commitID);
                    return 0;
                default:
                    System.out.printf("\"%s\" is not a valid action.\n", action);
                    System.out.println();
                    System.out.println(
                            "Valid actions are \"run\", \"run-application\", \"list\", \"info\", \"savepoint\", \"stop\", or \"cancel\".");
                    System.out.println();
                    System.out.println(
                            "Specify the version option (-v or --version) to print Flink version.");
                    System.out.println();
                    System.out.println(
                            "Specify the help option (-h or --help) to get help on the command.");
                    return 1;
            }
        } catch (CliArgsException ce) {
            return handleArgException(ce);
        } catch (ProgramParametrizationException ppe) {
            return handleParametrizationException(ppe);
        } catch (ProgramMissingJobException pmje) {
            return handleMissingJobException();
        } catch (Exception e) {
            return handleError(e);
        }
    }

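The flow can be summarized as follows:

  • Take the first user-supplied argument as the action and use a switch to dispatch to the matching method. The available actions are: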
//org.apache.flink.client.cli.CliFrontend:92
    // actions
    private static final String ACTION_RUN = "run";
    private static final String ACTION_RUN_APPLICATION = "run-application";
    private static final String ACTION_INFO = "info";
    private static final String ACTION_LIST = "list";
    private static final String ACTION_CANCEL = "cancel";
    private static final String ACTION_STOP = "stop";
    private static final String ACTION_SAVEPOINT = "savepoint";
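  • The rest of this walkthrough follows the run action into CliFrontend's run method. The main logic of run is summarized below.
  • Obtain the Options object that Flink predefines for the command-line parameters of run. Flink parses user command-line arguments with Apache's commons-cli library.
  • Using that Options object, parse the arguments the user passed (the values that follow flags such as -c and -t) into a CommandLine object.
  • Check whether the user passed -h or --help; if so, print the help text for the action.
  • Determine the active command-line client among the three that were wrapped into CliFrontend earlier: GenericCLI, FlinkYarnSessionCli, and DefaultCLI. They are checked in that order; the first one that is active is used and returned, and the check stops there. Each client decides as follows:
      • GenericCLI: active if the execution.target configuration option or one of the arguments -e, --executor, -t, --target is present with a non-null value.
      • FlinkYarnSessionCli: active if the value of -m / --jobmanager equals yarn-cluster, or a YARN applicationId is passed in the arguments, or execution.target is yarn-session or yarn-per-job.
      • DefaultCLI: always returns true; used for standalone mode.
  • Build a ProgramOptions object that holds the parameters of the user's program. This step determines whether the job is a JAR job or a Python job and then extracts and wraps the runtime parameters, mainly: the path of the user JAR, the paths of its dependencies, the class that contains the user's main method, the arguments passed to that main method, the requested parallelism, detached mode, and the savepoint-related settings.
  • Based on those ProgramOptions, collect the paths of the user JAR and all of its dependencies into a List.
  • From the configuration above and the JAR paths, generate the effective configuration.
  • Using the effective configuration and the ProgramOptions, build the user's project into a PackagedProgram object.
  • Call ClientUtils.executeProgram, which prepares for executing the user's program code: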
//org/apache/flink/client/ClientUtils.java:66
public static void executeProgram(
            PipelineExecutorServiceLoader executorServiceLoader,
            Configuration configuration,
            PackagedProgram program,
            boolean enforceSingleJobExecution,
            boolean suppressSysout)
            throws ProgramInvocationException {
        checkNotNull(executorServiceLoader);
        final ClassLoader userCodeClassLoader = program.getUserCodeClassLoader();
        final ClassLoader contextClassLoader = Thread.currentThread().getContextClassLoader();
        try {
            Thread.currentThread().setContextClassLoader(userCodeClassLoader);

            LOG.info(
                    "Starting program (detached: {})",
                    !configuration.getBoolean(DeploymentOptions.ATTACHED));

            ContextEnvironment.setAsContext(
                    executorServiceLoader,
                    configuration,
                    userCodeClassLoader,
                    enforceSingleJobExecution,
                    suppressSysout);

            StreamContextEnvironment.setAsContext(
                    executorServiceLoader,
                    configuration,
                    userCodeClassLoader,
                    enforceSingleJobExecution,
                    suppressSysout);

            try {
                program.invokeInteractiveModeForExecution();
            } finally {
                ContextEnvironment.unsetAsContext();
                StreamContextEnvironment.unsetAsContext();
            }
        } finally {
            Thread.currentThread().setContextClassLoader(contextClassLoader);
        }
    }
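
Before moving on, the command-line client selection from the run steps above can be sketched in a few lines. This is a simplified illustration, not Flink's code: the Client class, the selectActive method, and the option keys target, jobmanager and applicationId are hypothetical stand-ins, and the execution.target configuration checks are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class ActiveCliSketch {

    /** Hypothetical stand-in for a command-line client: a name plus an "is active" rule. */
    static final class Client {
        final String name;
        final Predicate<Map<String, String>> activeRule;

        Client(String name, Predicate<Map<String, String>> activeRule) {
            this.name = name;
            this.activeRule = activeRule;
        }
    }

    // Order matters: the first active client wins, and the last one always matches.
    static final List<Client> CLIENTS = List.of(
            new Client("GenericCLI", opts -> opts.get("target") != null),
            new Client("FlinkYarnSessionCli", opts ->
                    "yarn-cluster".equals(opts.get("jobmanager"))
                            || opts.get("applicationId") != null),
            new Client("DefaultCLI", opts -> true));

    /** Returns the name of the first client whose rule accepts the parsed options. */
    public static String selectActive(Map<String, String> parsedOptions) {
        for (Client client : CLIENTS) {
            if (client.activeRule.test(parsedOptions)) {
                return client.name;
            }
        }
        throw new IllegalStateException("No active command-line client found");
    }

    public static void main(String[] args) {
        System.out.println(selectActive(Map.of("target", "yarn-per-job")));    // GenericCLI
        System.out.println(selectActive(Map.of("jobmanager", "yarn-cluster"))); // FlinkYarnSessionCli
        System.out.println(selectActive(Map.of()));                             // DefaultCLI
    }
}
```

Because the always-active client sits last in the list, the loop acts as a fallback chain: more specific clients get the first chance, and standalone mode is what remains when none of them matches.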

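In summary, executeProgram prepares as follows:

  • Get the class loader of the user code and set it as the context class loader of the current thread. The original class loader is restored in the outer finally block.
  • Install ContextEnvironment, the execution environment the client uses for remote invocation.
  • Install StreamContextEnvironment, the client's streaming environment. This is the environment that StreamExecutionEnvironment.getExecutionEnvironment() returns in the Flink code we write.
  • Then call PackagedProgram's invokeInteractiveModeForExecution method, which starts executing the user's main method through callMainMethod: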
//org/apache/flink/client/program/PackagedProgram.java:323
    private static void callMainMethod(Class<?> entryClass, String[] args)
            throws ProgramInvocationException {
        Method mainMethod;
        if (!Modifier.isPublic(entryClass.getModifiers())) {
            throw new ProgramInvocationException(
                    "The class " + entryClass.getName() + " must be public.");
        }

        try {
            mainMethod = entryClass.getMethod("main", String[].class);
        } catch (NoSuchMethodException e) {
            throw new ProgramInvocationException(
                    "The class " + entryClass.getName() + " has no main(String[]) method.");
        } catch (Throwable t) {
            throw new ProgramInvocationException(
                    "Could not look up the main(String[]) method from the class "
                            + entryClass.getName()
                            + ": "
                            + t.getMessage(),
                    t);
        }

        if (!Modifier.isStatic(mainMethod.getModifiers())) {
            throw new ProgramInvocationException(
                    "The class " + entryClass.getName() + " declares a non-static main method.");
        }
        if (!Modifier.isPublic(mainMethod.getModifiers())) {
            throw new ProgramInvocationException(
                    "The class " + entryClass.getName() + " declares a non-public main method.");
        }

        try {
            mainMethod.invoke(null, (Object) args);
        } catch (IllegalArgumentException e) {
            throw new ProgramInvocationException(
                    "Could not invoke the main method, arguments are not matching.", e);
        } catch (IllegalAccessException e) {
            throw new ProgramInvocationException(
                    "Access to the main method was denied: " + e.getMessage(), e);
        } catch (InvocationTargetException e) {
            Throwable exceptionInMethod = e.getTargetException();
            if (exceptionInMethod instanceof Error) {
                throw (Error) exceptionInMethod;
            } else if (exceptionInMethod instanceof ProgramParametrizationException) {
                throw (ProgramParametrizationException) exceptionInMethod;
            } else if (exceptionInMethod instanceof ProgramInvocationException) {
                throw (ProgramInvocationException) exceptionInMethod;
            } else {
                throw new ProgramInvocationException(
                        "The main method caused an error: " + exceptionInMethod.getMessage(),
                        exceptionInMethod);
            }
        } catch (Throwable t) {
            throw new ProgramInvocationException(
                    "An error occurred while invoking the program's main method: " + t.getMessage(),
                    t);
        }
    }

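callMainMethod mainly performs a few checks on the method and then calls mainMethod.invoke(null, (Object) args), at which point the user's project code starts executing. Only when StreamExecutionEnvironment.execute() is finally called does Flink actually build the cluster-side deployment and submit the job for execution. The next section covers this in detail.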
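
The two mechanisms seen in executeProgram and callMainMethod, swapping the thread's context class loader and invoking main reflectively, can be tried out with plain JDK classes. The sketch below is a stripped-down illustration under stated assumptions: InvokeMainSketch, invokeMain and the UserJob class are hypothetical names, an empty URLClassLoader stands in for the user-code class loader, and the error handling is far thinner than in Flink.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.net.URL;
import java.net.URLClassLoader;

public class InvokeMainSketch {

    /** Hypothetical user program; it records what it observed so the effect is visible. */
    public static class UserJob {
        public static String firstArg;
        public static ClassLoader loaderSeenInMain;

        public static void main(String[] args) {
            firstArg = args.length > 0 ? args[0] : null;
            loaderSeenInMain = Thread.currentThread().getContextClassLoader();
        }
    }

    /** Runs entryClass.main(args) with userLoader as the context class loader, then restores it. */
    public static void invokeMain(Class<?> entryClass, ClassLoader userLoader, String[] args) {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        try {
            thread.setContextClassLoader(userLoader);
            Method main = entryClass.getMethod("main", String[].class);
            if (!Modifier.isStatic(main.getModifiers())) {
                throw new IllegalStateException(
                        entryClass.getName() + " declares a non-static main method");
            }
            // The cast passes the array as ONE argument instead of spreading it as varargs.
            main.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Could not invoke main of " + entryClass.getName(), e);
        } finally {
            // Always put the original class loader back, even if main failed.
            thread.setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        ClassLoader userLoader = new URLClassLoader(new URL[0]);
        invokeMain(UserJob.class, userLoader, new String[] {"--input"});
        System.out.println(UserJob.firstArg);                       // --input
        System.out.println(UserJob.loaderSeenInMain == userLoader); // true
    }
}
```

The try/finally shape is the important part: code that runs inside main sees the user class loader, while the calling thread gets its own class loader back no matter how main exits.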