Flink SQL on YARN
Apache Flink is a powerful open-source stream processing framework that enables the design and execution of real-time streaming applications. Flink SQL is a component of Apache Flink that allows users to write SQL queries to process streaming and batch data. In this article, we will explore how to run Flink SQL queries on YARN, a popular cluster management system.
Introduction to YARN
YARN (Yet Another Resource Negotiator) is a core component of Apache Hadoop that provides cluster management capabilities. It allows users to run various types of applications on a Hadoop cluster by effectively allocating and managing resources. YARN consists of two main components: a ResourceManager and a NodeManager. The ResourceManager is responsible for resource allocation and scheduling, while the NodeManager manages the execution of tasks on individual cluster nodes.
Running Flink SQL on YARN
To run Flink SQL queries on YARN, we need to follow these steps:
- Set up a YARN cluster: First, we need to set up a YARN cluster by installing and configuring Hadoop. This involves installing Hadoop on each cluster node and configuring the ResourceManager and NodeManager.
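As a minimal sketch, the yarn-site.xml on each node mainly needs to point at the ResourceManager; the hostname below is a placeholder for your own machine, and most other settings can stay at their defaults for a first test:

```xml
<!-- yarn-site.xml (sketch): replace resourcemanager-host with your ResourceManager node -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager-host</value>
  </property>
  <property>
    <!-- auxiliary shuffle service commonly enabled on NodeManagers -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```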
- Install Apache Flink: Next, we need to install Apache Flink on the cluster. This can be done by downloading the Flink distribution and extracting it on each cluster node. We also need to configure Flink to use YARN as the cluster execution environment.
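A sketch of that installation on one node; the release file name is illustrative (pick the version matching your setup), and the key step is exposing the Hadoop client classes to Flink via HADOOP_CLASSPATH:

```shell
# unpack a downloaded Flink release (file name is a placeholder)
tar -xzf flink-1.17.2-bin-scala_2.12.tgz
cd flink-1.17.2

# make the Hadoop/YARN client classes visible to Flink
export HADOOP_CLASSPATH=$(hadoop classpath)
```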
- Write a Flink SQL query: Once the cluster is set up, we can start writing Flink SQL queries. Flink SQL supports a wide range of SQL operations such as filtering, aggregating, and joining data. Let's consider a simple example where we want to calculate the average temperature for each city from a stream of temperature readings:

CREATE TABLE readings (
    city STRING,
    temperature DOUBLE,
    eventTime TIMESTAMP(3),
    WATERMARK FOR eventTime AS eventTime - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'temperature_readings',
    'properties.bootstrap.servers' = 'localhost:9092',
    'properties.group.id' = 'flink-consumer',
    'format' = 'json'
);

CREATE TABLE average_temperatures (
    city STRING,
    avg_temperature DOUBLE
) WITH (
    'connector' = 'print'
);

INSERT INTO average_temperatures
SELECT city, AVG(temperature) AS avg_temperature
FROM readings
GROUP BY city;
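Because the GROUP BY runs over an unbounded stream, the sink receives a fresh average for a city every time a new reading for that city arrives: an update stream rather than one final row per city. A plain-Python sketch of that semantics, independent of Flink:

```python
from collections import defaultdict

def running_averages(readings):
    """Emit (city, avg_temperature) after each reading, mimicking how a
    streaming GROUP BY continuously updates its aggregate per key."""
    totals = defaultdict(lambda: [0.0, 0])  # city -> [sum, count]
    updates = []
    for city, temperature in readings:
        acc = totals[city]
        acc[0] += temperature
        acc[1] += 1
        updates.append((city, acc[0] / acc[1]))  # updated average for this key
    return updates

updates = running_averages([("Oslo", 10.0), ("Oslo", 14.0), ("Rome", 20.0)])
# each Oslo reading refreshes the Oslo average: first 10.0, then 12.0
```

This is why the print sink shows repeated rows per city as data flows in; a real deployment would typically write to an upsert-capable sink instead.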
- Submit the Flink job: Once the query is written, we can submit it to the Flink cluster running on YARN. SQL scripts are executed through Flink's SQL Client rather than the flink run command, which expects a JAR. First start a YARN session, then run the script against it:

./bin/yarn-session.sh -d -jm 1024 -tm 1024 -s 2
./bin/sql-client.sh -f sql_job.sql

In this example, the YARN session starts in detached mode (-d) with 1GB of JobManager memory (-jm 1024), 1GB of memory per TaskManager (-tm 1024), and 2 task slots per TaskManager (-s 2); TaskManagers are requested from YARN on demand as jobs need them.
- Monitor the job: Once the job is submitted, we can monitor its progress using the Flink Web UI or the YARN ResourceManager UI. These interfaces provide information about job status, resource utilization, and task metrics.
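The same information is reachable from the command line. As a sketch for a session deployment, where the application ID below is a placeholder that YARN prints when the application starts:

```shell
# list running YARN applications and their tracking URLs
yarn application -list

# list the Flink jobs inside a given YARN session (placeholder ID)
./bin/flink list -t yarn-session -Dyarn.application.id=application_1700000000000_0001
```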
Conclusion
Running Flink SQL on YARN allows us to leverage the benefits of both frameworks. YARN provides robust cluster management capabilities, while Flink SQL enables us to write complex data processing queries using familiar SQL syntax. By combining these two technologies, we can build scalable and efficient stream processing applications.
In this article, we explored how to run Flink SQL queries on a YARN cluster. We learned about the components of YARN and the steps involved in running Flink SQL on YARN. We also saw a simple example of a Flink SQL query and how to submit it to the cluster. By leveraging the power of both Flink and YARN, we can build scalable and efficient stream processing applications.