Deploying Spark on K8S -- standalone

Image preparation

Use the Spark image built in the previous post (Deploying Spark to a K8S cluster -- Kubernetes Native). Note that the image is the critical piece: the company intranet cannot reach GitHub, so the image cannot be pulled and has to be built by yourself.

Writing the deployment manifests

Namespace

For easier management, create a new namespace. namespace-spark-cluster.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: "spark-cluster"
  labels:
    name: "spark-cluster"

Create a namespace named spark-cluster:

kubectl create -f namespace-spark-cluster.yaml
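A quick sanity check (not part of the original steps) that the namespace exists:

kubectl get namespace spark-cluster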

Master

The master consists of two parts: the main body, a ReplicationController named spark-master-controller.yaml, and a Service that exposes the master's ports to the workers (spark-master-service.yaml).

spark-master-controller.yaml

kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-master-controller
  namespace: spark-cluster
spec:
  replicas: 1
  selector:
    component: spk-master
  template:
    metadata:
      labels:
        component: spk-master
    spec:
      containers:
        - name: spk-master
          image: registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh"]
          args: ["-c","sh /opt/spark/sbin/start-master.sh && tail -f /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-*"]
          ports:
            - containerPort: 7077
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m

The above is the controller. It starts the master directly with Spark's start-master script, but that script backgrounds itself after starting, which would make Kubernetes think the pod has exited; so a tail -f on the master's log is appended, which also makes the log convenient to read.

spark-master-service.yaml

kind: Service
apiVersion: v1
metadata:
  name: spk-master
  namespace: spark-cluster
spec:
  ports:
    - port: 7077
      targetPort: 7077
      name: spark
    - port: 8080
      targetPort: 8080
      name: http
  selector:
    component: spk-master
The Service exposes ports 7077 and 8080 to the cluster, so workers can reach the master directly with an address like spk-master:8080. Note that the ports are exposed only inside the cluster; external access is covered at the end.

kubectl create -f spark-master-controller.yaml

kubectl create -f spark-master-service.yaml
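To check that the Service resolves inside the cluster, one option is a throwaway curl pod (the curlimages/curl image is just an assumption; any image that ships curl works):

# Fetch the master UI through the Service's DNS name from inside the cluster
kubectl -n spark-cluster run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://spk-master.spark-cluster.svc:8080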

There is a pitfall here: the start-master script uses the SPARK_MASTER_PORT variable, and if the Service above were named spark-master it would clash with that variable: Kubernetes would set SPARK_MASTER_PORT to a host:port value and the startup script would fail. So every occurrence of spark-master was renamed to spk-master.
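The clash comes from Kubernetes' service-link environment variables: for a Service named spark-master, every pod in the namespace gets variables like the ones below, and the tcp://host:port value overrides the plain port number that start-master.sh expects (the IP here is illustrative):

kubectl -n spark-cluster exec <any-pod-in-the-namespace> -- env | grep '^SPARK_MASTER'
# SPARK_MASTER_SERVICE_HOST=10.96.23.41
# SPARK_MASTER_SERVICE_PORT=7077
# SPARK_MASTER_PORT=tcp://10.96.23.41:7077   <- collides with Spark's SPARK_MASTER_PORT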

Worker

The worker startup script needs the master's address. Since the cluster has DNS and the Service is in place, the master can be reached as spk-master.spark-cluster. Setting replicas to N starts N workers (see the scale command after the manifest below). In addition, resource constraints were added to the workers, limiting them to at most 2 CPUs and 12 GB of memory.

spark-worker-controller.yaml

kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-worker-controller
  namespace: spark-cluster
spec:
  replicas: 3
  selector:
    component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: registry-vpc.cn-beijing.aliyuncs.com/acs/spark:spark-v2.4.5
          command: ["/bin/sh"]
          args: ["-c","sh /opt/spark/sbin/start-slave.sh spark://spk-master.spark-cluster:7077;tail -f /opt/spark/logs/spark--org.apache.spark.deploy.worker.Worker*"]
          ports:
            - containerPort: 8081
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
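As noted above, the worker count is just the replica count. A sketch of scaling the ReplicationController after it has been created (5 is only an example value):

# Scale the worker ReplicationController to five replicas
kubectl -n spark-cluster scale rc spark-worker-controller --replicas=5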

Proxy (not deployed at the moment)

The image is elsonrodriguez/spark-ui-proxy:1.0. This component is not needed when starting an ordinary standalone cluster, but inside a k8s cluster it is indispensable.

Consider what happens if we simply expose the master's port 8080: we can see the master's management page, but going from there to a worker's UI is not realistic (each worker has its own UI address with a randomly assigned IP that is only reachable inside the cluster). So we need a proxy service that fetches the pages we need from inside the cluster and returns them to us; that way we only have to expose the proxy's address.
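(As a lighter alternative for just the master UI, a kubectl port-forward also works; a sketch, assuming the master pod is looked up by its component label:)

# Forward local port 8080 to the master pod's UI, then open http://localhost:8080
MASTER_POD=$(kubectl -n spark-cluster get pods -l component=spk-master -o jsonpath='{.items[0].metadata.name}')
kubectl -n spark-cluster port-forward "$MASTER_POD" 8080:8080

The worker pages still need the proxy, which is what the manifest below sets up.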

kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-ui-proxy-controller
spec:
  replicas: 1
  selector:
    component: spark-ui-proxy
  template:
    metadata:
      labels:
        component: spark-ui-proxy
    spec:
      containers:
        - name: spark-ui-proxy
          image: elsonrodriguez/spark-ui-proxy:1.0
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
          args:
            - spk-master:8080
          livenessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 120
              timeoutSeconds: 5

kubectl create -f spark-ui-proxy-controller.yaml --namespace=spark-cluster

And expose the proxy's port 80:

kind: Service
apiVersion: v1
metadata:
  name: spark-ui-proxy
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 80
      nodePort: 32180
  selector:
    component: spark-ui-proxy

kubectl create -f spark-ui-proxy-service.yaml --namespace=spark-cluster

At this point the cluster is up, and the management page can be reached through port 32180 on the cluster.
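A quick check from outside the cluster (replace <node-ip> with the address of any node; 32180 is the NodePort defined above):

curl -s http://<node-ip>:32180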

TiSpark Deployment

Download the dependency jar

Download the dependency jar from the GitHub releases page, https://github.com/pingcap/tispark/releases ; one of the release versions was picked.

Because of local network restrictions GitHub is not reachable from the local machine, so the jar was downloaded directly on the server:

curl -LO https://github.com/pingcap/tispark/releases/tispark-assembly-2.3.16.jar

Build the image

Using the image from step two, package the dependency jar into the image.

Write the Dockerfile:

FROM registry-vpc.cn-beijing.aliyuncs.com/regis-k/vicky:spark-push-v2.4.5
MAINTAINER mengqiwei@gwm.cn
COPY . /opt/spark/jars/

Note: the Dockerfile and the dependency jar are in the same directory, and that build directory contains only these two files:

[root@iZ2ze48olpbvnopfiqqk33Z spark-images]# ls
Dockerfile  tispark-assembly-2.3.16.jar

Run the build command:

docker build -t tispark-2.3.16:spark-v2.4.5 .
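Note that tispark-2.3.16:spark-v2.4.5 is a bare local tag and the master manifest uses imagePullPolicy: IfNotPresent, so the image must either exist on every node or be pushed to a registry the nodes can pull from. A sketch of the latter; the registry path is only an assumption patterned on the base image:

# Tag and push to the (assumed) private registry
docker tag tispark-2.3.16:spark-v2.4.5 registry-vpc.cn-beijing.aliyuncs.com/regis-k/tispark-2.3.16:spark-v2.4.5
docker push registry-vpc.cn-beijing.aliyuncs.com/regis-k/tispark-2.3.16:spark-v2.4.5

If you push it, the image: fields in the manifests below would need the full registry path as well.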

Restart the cluster with the new image

Update spark-master-controller.yaml

Note:

  1. Change the namespace to the one the TiDB cluster runs in, which avoids having to configure networking between them again.
  2. Change the image to image: tispark-2.3.16:spark-v2.4.5

kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-master-controller
  namespace: tidb-prod
spec:
  replicas: 1
  selector:
    component: spk-master
  template:
    metadata:
      labels:
        component: spk-master
    spec:
      containers:
        - name: spk-master
          image: tispark-2.3.16:spark-v2.4.5
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh"]
          args: ["-c","sh /opt/spark/sbin/start-master.sh && tail -f /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-*"]
          ports:
            - containerPort: 7077
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m

Start the application:

kubectl apply -f spark-master-controller.yaml

Update spark-master-service.yaml

kind: Service
apiVersion: v1
metadata:
  name: spk-master
  namespace: tidb-prod
spec:
  ports:
    - port: 7077
      targetPort: 7077
      name: spark
    - port: 8080
      targetPort: 8080
      name: http
  selector:
    component: spk-master

Start the application:

kubectl apply -f spark-master-service.yaml
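To confirm the Service has picked up the new master pod, its endpoints can be checked:

# ENDPOINTS should list the master pod's IP on ports 7077 and 8080
kubectl -n tidb-prod get endpoints spk-master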

Update spark-worker-controller.yaml. Besides the namespace, the master address passed to start-slave.sh changes to spk-master.tidb-prod to match the new Service location.

kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-worker-controller
  namespace: tidb-prod
spec:
  replicas: 3
  selector:
    component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: tispark-2.3.16:spark-v2.4.5
          command: ["/bin/sh"]
          args: ["-c","sh /opt/spark/sbin/start-slave.sh spark://spk-master.spark-cluster:7077;tail -f /opt/spark/logs/spark--org.apache.spark.deploy.worker.Worker*"]
          ports:
            - containerPort: 8081
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"

Start the application:

kubectl apply -f spark-worker-controller.yaml

Check the running status:

kubectl get all -n tidb-prod
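The Spark pods can also be listed on their own, using the component labels from the manifests:

kubectl -n tidb-prod get pods -l component=spk-master -o wide
kubectl -n tidb-prod get pods -l component=spark-worker -o wide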

Log in and test TiSpark

Log into the container:

kubectl exec -it pod/spark-master-controller-9vrzw -n tidb-prod -- /bin/bash

In the /opt/spark directory, run the following command. The spark.tispark.pd.addresses parameter takes the addresses of the TiDB cluster's PD nodes, which can be looked up with kubectl get all -o wide.
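A sketch of narrowing that output down to the PD members; the pd-related names depend on how the TiDB cluster was deployed, so the grep pattern is only an assumption:

# PD serves clients on port 2379
kubectl get all -n tidb-prod -o wide | grep -i pd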

./bin/spark-shell \
  --conf spark.tispark.pd.addresses=10.31.6.250:2379,10.31.6.2:2379,10.31.6.251:2379 \
  --conf spark.sql.extensions=org.apache.spark.sql.TiExtensions

Run a query to check the result. There is a warning here that has not been resolved yet.


scala> spark.sql("show tables").show
21/11/03 02:41:08 WARN ObjectStore: Failed to get database ods_cycle, returning NoSuchObjectException
+---------+---------------+-----------+
| database|      tableName|isTemporary|
+---------+---------------+-----------+
|ods_cycle|  ods_cycle_can|      false|
|ods_cycle|ods_cycle_can_p|      false|
|ods_cycle| test_cycle_can|      false|
+---------+---------------+-----------+
scala> spark.sql("select count(*) from test_cycle_can").show
21/11/03 02:43:27 WARN ObjectStore: Failed to get database ods_cycle, returning NoSuchObjectException
21/11/03 02:43:27 WARN ObjectStore: Failed to get database ods_cycle, returning NoSuchObjectException
+--------+
|count(1)|
+--------+
| 1410701|
+--------+