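Table of contents
- I. Introduction to Kylin
- 1. Introduction
- 2. Features
- 3. Ecosystem
- II. Kylin standalone deployment
- 1. Software requirements
- 2. Hadoop environment
- 3. Installation
- 1) Extract the archive
- 2) Kylin tarball directory layout
- 3) Add configuration for Kylin's dependencies
- 4) Configure Kylin environment variables
- 5) Configure kylin.sh
- 6) Configure conf/kylin.properties
- 4. Check the runtime environment
- 5. Start the dependent clusters
- 1) Start ZooKeeper
- 2) Start the Hadoop cluster
- 3) Start the HBase cluster
- 4) Start Hive
- 5) Start Kylin
- 6. Verify Kylin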
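This article introduces the basics of Kylin and walks through a standalone deployment and its verification.
It depends on working ZooKeeper, Hadoop, HBase, and Hive environments; for how to set those up, see the author's related series.
The article has two parts: an introduction to Kylin and its deployment.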
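I. Introduction to Kylin
1. Introduction
Apache Kylin™ is an open-source, distributed analytical data warehouse that provides a SQL query interface and multidimensional analysis (OLAP) on top of Hadoop to support extremely large datasets. It was originally developed by eBay Inc. and contributed to the open-source community.
Official site: https://kylin.apache.org/cn/
With Apache Kylin™, users need only three steps to get sub-second queries on very large datasets:
- Define a star or snowflake model on the dataset
- Build a cube on the defined tables
- Query with standard SQL through ODBC, JDBC, or the RESTful API and get results with sub-second latency
Kylin integrates with many data visualization tools, such as Tableau and PowerBI, so users can analyze Hadoop data with BI tools.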
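As a rough illustration of the third step, the sketch below sends a SQL statement to Kylin's REST query endpoint (`POST /kylin/api/query`) with curl. The server address `http://localhost:7070`, the default `ADMIN`/`KYLIN` account of the `testing` security profile, and the sample project and table names are assumptions for illustration; adjust them to your own instance.

```shell
#!/usr/bin/env bash
# build_query_payload PROJECT SQL: print the JSON body for POST /kylin/api/query.
# No JSON escaping is done, so the SQL must not contain double quotes or backslashes.
build_query_payload() {
    local project="$1" sql="$2"
    printf '{"sql":"%s","project":"%s","limit":50000,"offset":0,"acceptPartial":false}' \
        "$sql" "$project"
}

# run_query PROJECT SQL: send the query with HTTP basic authentication.
# KYLIN_URL, KYLIN_USER and KYLIN_PASSWORD are hypothetical variables used only by this sketch.
run_query() {
    curl -s -X POST \
        -u "${KYLIN_USER:-ADMIN}:${KYLIN_PASSWORD:-KYLIN}" \
        -H 'Content-Type: application/json' \
        -d "$(build_query_payload "$1" "$2")" \
        "${KYLIN_URL:-http://localhost:7070}/kylin/api/query"
}

# Example (needs a running Kylin with the sample cube loaded):
#   run_query learn_kylin 'select count(*) from KYLIN_SALES'
```

Keeping the payload builder separate from the network call makes it easy to inspect the request body before anything is sent.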
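2. Features
Kylin is a MOLAP system in the Hadoop ecosystem: a distributed OLAP engine that eBay's big data team began developing in 2014 to support data volumes from terabytes to petabytes. Its features include:
- Extremely fast, scalable analytical data warehouse for big data: Kylin is designed to reduce query latency on Hadoop/Spark for datasets of more than 10 billion rows
- ANSI SQL interface on Hadoop: as an analytical data warehouse (and OLAP engine), Kylin offers standard SQL on Hadoop and supports most query functions
- Interactive queries: with Kylin, users can interact with Hadoop data at sub-second latency, with better performance than Hive on the same dataset
- Multidimensional cube (MOLAP cube): users can define a data model and build cubes in Kylin over datasets of more than 10 billion rows
- Real-time OLAP: Kylin can process data as it is produced, so users can run multidimensional analysis on real-time data with second-level latency
- Seamless integration with BI tools: Kylin integrates with BI tools such as Tableau, PowerBI/Excel, MSTR, QlikSense, Hue, and SuperSet
- Other features:
  - Job management and monitoring
  - Compression and encoding
  - Incremental updates
  - Use of HBase coprocessors
  - Approximate distinct count based on HyperLogLog
  - A friendly web interface to manage, monitor, and use cubes
  - Access control at the project and table level
  - Support for LDAP and SSO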
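3. Ecosystem
- Kylin core: the basic Kylin framework, including the metadata engine, query engine, job engine, and storage engine, plus a REST server that answers client requests
- Extensions: plugins that add extra functions and features
- Integration: integration with lifecycle-management systems such as job schedulers, ETL, and monitoring
- User interfaces: third-party user interfaces built on top of the Kylin core
- Drivers: ODBC and JDBC drivers that support different tools and products, such as Tableau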
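II. Kylin standalone deployment
1. Software requirements
- Hadoop: 2.7+, 3.1+ (since v2.5); example environment: 3.1.4
- Hive: 0.13 - 1.2.1+; example environment: 3.1.2
- HBase: 1.1+, 2.0 (since v2.5); example environment: 2.1.0
- Spark (optional): 2.3.0+
- Kafka (optional): 1.0.0+ (since v2.5)
- JDK: 1.8+ (since v2.5); example environment: 1.8
- OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+; example environment: CentOS 6.10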
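The minimum versions above can be checked mechanically. The following is a minimal sketch, assuming GNU `sort -V` is available (which is the case on the Linux distributions listed); the helper names `version_ge` and `check_min` are made up for this example.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME ACTUAL MINIMUM: print one status line for a component.
check_min() {
    if version_ge "$2" "$3"; then
        echo "OK   $1 $2 (>= $3)"
    else
        echo "FAIL $1 $2 (< $3)"
        return 1
    fi
}

# Example with the versions of this article's environment:
#   check_min Hadoop 3.1.4 3.1
#   check_min HBase  2.1.0 2.0
```

The actual version strings would come from each component's own version command, for example the first line of `hadoop version`.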
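2. Hadoop environment
Kylin relies on a Hadoop cluster to process large datasets. You need a Hadoop cluster with HDFS, YARN, MapReduce, Hive, HBase, ZooKeeper, and other services configured for Kylin to run on.
Kylin can be started on any node of the Hadoop cluster. For convenience you can run it on the master node, but for better stability it is recommended to deploy Kylin on a clean Hadoop client node where the Hive, HBase, HDFS, and other command-line tools are installed and the client configuration (core-site.xml, hive-site.xml, hbase-site.xml, and so on) is set up correctly and kept in sync with the other nodes automatically.
The Linux account that runs Kylin must have access to the Hadoop cluster, including permission to create and write HDFS directories, Hive tables, and HBase tables, and to submit MapReduce jobs.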
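Before installing, it can help to confirm that the client tools are on the PATH and that the account can write to HDFS. This is only a sketch: the function names and the probe file name are invented here, and `can_write_hdfs` needs a reachable HDFS to do anything useful.

```shell
#!/usr/bin/env bash
# missing_commands CMD...: print each command that is not found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# can_write_hdfs DIR: create DIR, write a marker file into it, then remove the marker.
# Succeeds only if the current account may create and write under DIR.
can_write_hdfs() {
    local probe="$1/.kylin_write_probe_$$"
    hdfs dfs -mkdir -p "$1" &&
        hdfs dfs -touchz "$probe" &&
        hdfs dfs -rm -skipTrash "$probe" >/dev/null
}

# Example:
#   missing_commands hadoop hdfs hive hbase   # prints nothing when all are installed
#   can_write_hdfs /kylin && echo "HDFS working directory is writable"
```

Similar probes for Hive and HBase tables are left out because they depend on how those services are secured in your cluster.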
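3. Installation
Because Hive is installed on server4, this example also installs Kylin on server4 (192.168.10.44).
Download the installation package from the official site; the download itself is not covered here. This article uses apache-kylin-3.1.3-bin-hadoop3.tar.gz.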
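It is worth verifying the downloaded tarball before extracting it. Apache release pages normally publish a checksum next to each artifact; the sketch below assumes you have copied the expected SHA-256 value by hand, and `verify_sha256` is a name made up for this example.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX: succeed only when FILE's SHA-256 digest matches.
# The expected value is lower-cased first, since some checksum files use upper case.
verify_sha256() {
    local actual expected
    actual="$(sha256sum "$1" 2>/dev/null | cut -d ' ' -f 1)"
    expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example (the digest is a placeholder, not the real one for this release):
#   verify_sha256 apache-kylin-3.1.3-bin-hadoop3.tar.gz "<sha256 from the download page>" \
#       && echo "checksum OK"
```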
1) Extract the archive
[alanchan@server4 ~]$ tar -zxf /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3.tar.gz -C /usr/local/bigdata
[alanchan@server4 apache-kylin-3.1.3-bin-hadoop3]$ pwd
/usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3
[alanchan@server4 apache-kylin-3.1.3-bin-hadoop3]$ ll
total 56
drwxr-xr-x 2 alanchan root 4096 Dec 29 2021 bin
-rw-r--r-- 1 alanchan root 823 Dec 29 2021 commit_SHA1
drwxr-xr-x 2 alanchan root 4096 Dec 29 2021 conf
drwxr-xr-x 3 alanchan root 4096 Dec 29 2021 lib
-rw-r--r-- 1 alanchan root 14725 Dec 29 2021 LICENSE
-rw-r--r-- 1 alanchan root 1279 Dec 29 2021 NOTICE
-rw-r--r-- 1 alanchan root 3222 Dec 29 2021 README.md
drwxr-xr-x 4 alanchan root 4096 Dec 29 2021 sample_cube
drwxr-xr-x 9 alanchan root 4096 Dec 29 2021 tomcat
drwxr-xr-x 2 alanchan root 4096 Dec 29 2021 tool
-rw-r--r-- 1 alanchan root 19 Dec 29 2021 VERSION
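2) Kylin tarball directory layout
- bin: shell scripts for starting and stopping Kylin, backing up and restoring Kylin metadata, plus helpers that check ports and locate the Hive/HBase dependencies
- conf: XML configuration files for Hadoop jobs; see the configuration page of the Kylin documentation for what each file does
- lib: jar files for use by external applications, such as the Hadoop job jar, the JDBC driver, and the HBase coprocessor
- meta_backups: the default backup directory used after running bin/metastore.sh backup
- sample_cube: files for creating the sample cube and tables
- spark: the bundled Spark
- tomcat: the bundled Tomcat, used to start the Kylin service
- tool: jar files for running command-line tools
Note that meta_backups and spark do not appear in the directory listing of the freshly extracted 3.1.3 package shown above.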
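A quick way to confirm that the archive extracted completely is to check for the directories that the listing above shows. This is a small sketch; `missing_kylin_dirs` is a made-up helper, and it deliberately skips meta_backups and spark because they are not in that listing.

```shell
#!/usr/bin/env bash
# missing_kylin_dirs KYLIN_HOME: print the expected subdirectories that are absent.
missing_kylin_dirs() {
    local d
    for d in bin conf lib sample_cube tomcat tool; do
        [ -d "$1/$d" ] || echo "$d"
    done
}

# Example:
#   missing_kylin_dirs /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3
#   (no output means every expected directory exists)
```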
3) Add configuration for Kylin's dependencies
[alanchan@server4 conf]$ cd /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/conf
[alanchan@server4 apache-kylin-3.1.3-bin-hadoop3]$ cd conf
[alanchan@server4 conf]$ ll
total 64
-rw-r--r-- 1 alanchan root 3605 Dec 29 2021 kylin_hive_conf.xml
-rw-r--r-- 1 alanchan root 3690 Dec 29 2021 kylin_job_conf_cube_merge.xml
-rw-r--r-- 1 alanchan root 3807 Dec 29 2021 kylin_job_conf_inmem.xml
-rw-r--r-- 1 alanchan root 3159 Dec 29 2021 kylin_job_conf.xml
-rw-r--r-- 1 alanchan root 1156 Dec 29 2021 kylin-kafka-consumer.xml
-rw-r--r-- 1 alanchan root 17062 Dec 29 2021 kylin.properties
-rw-r--r-- 1 alanchan root 2257 Dec 29 2021 kylin-server-log4j.properties
-rw-r--r-- 1 alanchan root 2011 Dec 29 2021 kylin-spark-log4j.properties
-rw-r--r-- 1 alanchan root 1655 Dec 29 2021 kylin-tools-log4j.properties
-rwxr-xr-x 1 alanchan root 4120 Dec 29 2021 setenv.sh
-rwxr-xr-x 1 alanchan root 3870 Dec 29 2021 setenv-tool.sh
First confirm that the environment variables for Hadoop, HBase, and Hive (HADOOP_HOME, HBASE_HOME, HIVE_HOME) are set, then create the symbolic links from inside the conf directory:
ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml hdfs-site.xml
ln -s $HADOOP_HOME/etc/hadoop/core-site.xml core-site.xml
ln -s $HBASE_HOME/conf/hbase-site.xml hbase-site.xml
ln -s $HIVE_HOME/conf/hive-site.xml hive-site.xml
When the links are created correctly, the directory looks like this:
[alanchan@server4 conf]$ ll
total 64
lrwxrwxrwx 1 alanchan root 56 Jan 4 17:27 core-site.xml -> /usr/local/bigdata/hadoop-3.1.4/etc/hadoop/core-site.xml
lrwxrwxrwx 1 alanchan root 50 Jan 4 17:27 hbase-site.xml -> /usr/local/bigdata/hbase-2.1.0/conf/hbase-site.xml
lrwxrwxrwx 1 alanchan root 56 Jan 4 17:26 hdfs-site.xml -> /usr/local/bigdata/hadoop-3.1.4/etc/hadoop/hdfs-site.xml
lrwxrwxrwx 1 alanchan root 59 Jan 4 17:33 hive-site.xml -> /usr/local/bigdata/apache-hive-3.1.2-bin/conf/hive-site.xml
-rw-r--r-- 1 alanchan root 3605 Dec 29 2021 kylin_hive_conf.xml
-rw-r--r-- 1 alanchan root 3690 Dec 29 2021 kylin_job_conf_cube_merge.xml
-rw-r--r-- 1 alanchan root 3807 Dec 29 2021 kylin_job_conf_inmem.xml
-rw-r--r-- 1 alanchan root 3159 Dec 29 2021 kylin_job_conf.xml
-rw-r--r-- 1 alanchan root 1156 Dec 29 2021 kylin-kafka-consumer.xml
-rw-r--r-- 1 alanchan root 17062 Dec 29 2021 kylin.properties
-rw-r--r-- 1 alanchan root 2257 Dec 29 2021 kylin-server-log4j.properties
-rw-r--r-- 1 alanchan root 2011 Dec 29 2021 kylin-spark-log4j.properties
-rw-r--r-- 1 alanchan root 1655 Dec 29 2021 kylin-tools-log4j.properties
-rwxr-xr-x 1 alanchan root 4120 Dec 29 2021 setenv.sh
-rwxr-xr-x 1 alanchan root 3870 Dec 29 2021 setenv-tool.sh
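The linking step can be wrapped so that it fails early when a variable is unset or a source file is missing, and so that it can be rerun safely. This is a minimal sketch rather than part of Kylin; `link_client_configs` is a made-up helper that creates the same four links as the `ln -s` commands above.

```shell
#!/usr/bin/env bash
# link_client_configs CONF_DIR: link the Hadoop, HBase and Hive client configs into CONF_DIR.
# Returns non-zero if a *_HOME variable is unset or a source file does not exist.
link_client_configs() {
    local conf_dir="$1" src
    if [ -z "$HADOOP_HOME" ] || [ -z "$HBASE_HOME" ] || [ -z "$HIVE_HOME" ]; then
        echo "ERROR: HADOOP_HOME, HBASE_HOME and HIVE_HOME must all be set" >&2
        return 1
    fi
    for src in \
        "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" \
        "$HADOOP_HOME/etc/hadoop/core-site.xml" \
        "$HBASE_HOME/conf/hbase-site.xml" \
        "$HIVE_HOME/conf/hive-site.xml"; do
        if [ ! -f "$src" ]; then
            echo "ERROR: missing $src" >&2
            return 1
        fi
        # -f replaces an existing link, so the function is safe to run twice
        ln -sfn "$src" "$conf_dir/$(basename "$src")"
    done
}

# Example:
#   link_client_configs /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/conf
```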
4) Configure Kylin environment variables
vim /etc/profile
# Add the following
# kylin
export KYLIN_HOME=/usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3
export PATH=$PATH:${KYLIN_HOME}/bin
source /etc/profile
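If you script this step instead of editing the file by hand, make the edit idempotent so that rerunning it does not append the same lines twice. This is a sketch under the assumption that a profile file you own is used; `append_once` is a made-up helper, and writing to /etc/profile itself needs root.

```shell
#!/usr/bin/env bash
# append_once FILE LINE: append LINE to FILE unless an identical line already exists.
append_once() {
    local file="$1" line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep the variables unexpanded in the profile file):
#   append_once ~/.bash_profile 'export KYLIN_HOME=/usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3'
#   append_once ~/.bash_profile 'export PATH=$PATH:${KYLIN_HOME}/bin'
```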
5) Configure kylin.sh
cd /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/bin
vim kylin.sh
Add the following lines to kylin.sh:
export HADOOP_HOME=/usr/local/bigdata/hadoop-3.1.4
export HIVE_HOME=/usr/local/bigdata/apache-hive-3.1.2-bin
export HBASE_HOME=/usr/local/bigdata/hbase-2.1.0
The complete file is as follows:
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Added for this deployment: locations of the local Hadoop, Hive and HBase installations
export HADOOP_HOME=/usr/local/bigdata/hadoop-3.1.4
export HIVE_HOME=/usr/local/bigdata/apache-hive-3.1.2-bin
export HBASE_HOME=/usr/local/bigdata/hbase-2.1.0

# set verbose=true to print more logs during start up
source ${KYLIN_HOME:-"$(cd -P -- "$(dirname -- "$0")" && pwd -P)/../"}/bin/header.sh $@
if [ "$verbose" = true ]; then
    shift
fi
mkdir -p ${KYLIN_HOME}/logs
mkdir -p ${KYLIN_HOME}/ext
source ${dir}/set-java-home.sh
function retrieveDependency() {
    #retrieve $hive_dependency and $hbase_dependency
    if [[ -z $reload_dependency && `ls -1 ${dir}/cached-* 2>/dev/null | wc -l` -eq 6 ]]
    then
        echo "Using cached dependency..."
        source ${dir}/cached-hive-dependency.sh
        if [ -z "${hive_warehouse_dir}" ] || [ -z "${hive_dependency}" ] || [ -z "${hive_conf_path}" ]; then
            echo "WARNING: Using ${dir}/cached-hive-dependency.sh failed,will be use ${dir}/find-hive-dependency.sh"
            source ${dir}/find-hive-dependency.sh
        fi
        source ${dir}/cached-hbase-dependency.sh
        if [ -z "${hbase_dependency}" ]; then
            echo "WARNING: Using ${dir}/cached-hbase-dependency.sh failed,will be use ${dir}/find-hbase-dependency.sh"
            source ${dir}/find-hbase-dependency.sh
        fi
        source ${dir}/cached-hadoop-conf-dir.sh
        if [ -z "${kylin_hadoop_conf_dir}" ]; then
            echo "WARNING: Using ${dir}/cached-hadoop-conf-dir.sh failed,will be use ${dir}/find-hadoop-conf-dir.sh"
            source ${dir}/find-hadoop-conf-dir.sh
        fi
        source ${dir}/cached-kafka-dependency.sh
        if [ -z "${kafka_dependency}" ]; then
            echo "WARNING: Using ${dir}/cached-kafka-dependency.sh failed,will be use ${dir}/find-kafka-dependency.sh"
            source ${dir}/find-kafka-dependency.sh
        fi
        source ${dir}/cached-spark-dependency.sh
        if [ -z "${spark_dependency}" ]; then
            echo "WARNING: Using ${dir}/cached-spark-dependency.sh failed,will be use ${dir}/find-spark-dependency.sh"
            source ${dir}/find-spark-dependency.sh
        fi
        source ${dir}/cached-flink-dependency.sh
        if [ -z "${flink_dependency}" ]; then
            echo "WARNING: Using ${dir}/cached-flink-dependency.sh failed,will be use ${dir}/find-flink-dependency.sh"
            source ${dir}/find-flink-dependency.sh
        fi
    else
        source ${dir}/find-hive-dependency.sh
        source ${dir}/find-hbase-dependency.sh
        source ${dir}/find-hadoop-conf-dir.sh
        source ${dir}/find-kafka-dependency.sh
        source ${dir}/find-spark-dependency.sh
        source ${dir}/find-flink-dependency.sh
    fi

    #retrieve $KYLIN_EXTRA_START_OPTS
    if [ -f "${dir}/setenv.sh" ]; then
        echo "WARNING: ${dir}/setenv.sh is deprecated and ignored, please remove it and use ${KYLIN_HOME}/conf/setenv.sh instead"
        source ${dir}/setenv.sh
    fi
    if [ -f "${KYLIN_HOME}/conf/setenv.sh" ]; then
        source ${KYLIN_HOME}/conf/setenv.sh
    fi

    export HBASE_CLASSPATH_PREFIX=${KYLIN_HOME}/conf:${KYLIN_HOME}/lib/*:${KYLIN_HOME}/ext/*:${HBASE_CLASSPATH_PREFIX}
    export HBASE_CLASSPATH=${HBASE_CLASSPATH}:${hive_dependency}:${kafka_dependency}:${spark_dependency}:${flink_dependency}
    verbose "HBASE_CLASSPATH: ${HBASE_CLASSPATH}"
}
function retrieveStartCommand() {
    if [ -f "${KYLIN_HOME}/pid" ]
    then
        PID=`cat $KYLIN_HOME/pid`
        if ps -p $PID > /dev/null
        then
            quit "Kylin is running, stop it first"
        fi
    fi

    lockfile=$KYLIN_HOME/LOCK
    if [ ! -e $lockfile ]; then
        trap "rm -f $lockfile; exit" INT TERM EXIT
        touch $lockfile
    else
        quit "Kylin is starting, wait for it"
    fi

    source ${dir}/check-env.sh

    tomcat_root=${dir}/../tomcat
    export tomcat_root

    #The location of all hadoop/hbase configurations are difficult to get.
    #Plus, some of the system properties are secretly set in hadoop/hbase shell command.
    #For example, in hdp 2.2, there is a system property called hdp.version,
    #which we cannot get until running hbase or hadoop shell command.
    #
    #To save all these troubles, we use hbase runjar to start tomcat.
    #In this way we no longer need to explicitly configure hadoop/hbase related classpath for tomcat,
    #hbase command will do all the dirty tasks for us:

    spring_profile=`bash ${dir}/get-properties.sh kylin.security.profile`
    if [ -z "$spring_profile" ]
    then
        quit 'Please set kylin.security.profile in kylin.properties, options are: testing, ldap, saml.'
    else
        verbose "kylin.security.profile is set to $spring_profile"
    fi
    # the number of Spring active profiles can be greater than 1. Additional profiles
    # can be added by setting kylin.security.additional-profiles
    additional_security_profiles=`bash ${dir}/get-properties.sh kylin.security.additional-profiles`
    if [[ "x${additional_security_profiles}" != "x" ]]; then
        spring_profile="${spring_profile},${additional_security_profiles}"
    fi

    retrieveDependency

    #additionally add tomcat libs to HBASE_CLASSPATH_PREFIX
    export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:${HBASE_CLASSPATH_PREFIX}

    kylin_rest_address=`hostname -f`":"`grep "<Connector port=" ${tomcat_root}/conf/server.xml |grep protocol=\"HTTP/1.1\" | cut -d '=' -f 2 | cut -d \" -f 2`
    kylin_rest_address_arr=(${kylin_rest_address//;/ })
    nc -z -w 5 ${kylin_rest_address_arr[0]} ${kylin_rest_address_arr[1]} 1>/dev/null 2>&1; nc_result=$?
    if [ $nc_result -eq 0 ]; then
        quit "Port ${kylin_rest_address} is not available, could not start Kylin."
    fi

    ${KYLIN_HOME}/bin/check-migration-acl.sh || { exit 1; }

    #debug if encounter NoClassDefError
    verbose "kylin classpath is: $(hbase classpath)"

    security_ldap_truststore=`bash ${dir}/get-properties.sh kylin.security.ldap.connection-truststore`
    if [ -f "${security_ldap_truststore}" ]; then
        KYLIN_EXTRA_START_OPTS="$KYLIN_EXTRA_START_OPTS -Djavax.net.ssl.trustStore=$security_ldap_truststore"
    fi

    # KYLIN_EXTRA_START_OPTS is for customized settings, checkout bin/setenv.sh
    start_command="hbase ${KYLIN_EXTRA_START_OPTS} \
    -Djava.util.logging.config.file=${tomcat_root}/conf/logging.properties \
    -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager \
    -Dlog4j.configuration=file:${KYLIN_HOME}/conf/kylin-server-log4j.properties \
    -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true \
    -Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true \
    -Djava.endorsed.dirs=${tomcat_root}/endorsed \
    -Dcatalina.base=${tomcat_root} \
    -Dcatalina.home=${tomcat_root} \
    -Djava.io.tmpdir=${tomcat_root}/temp \
    -Dkylin.hive.dependency=${hive_dependency} \
    -Dkylin.hbase.dependency=${hbase_dependency} \
    -Dkylin.kafka.dependency=${kafka_dependency} \
    -Dkylin.spark.dependency=${spark_dependency} \
    -Dkylin.flink.dependency=${flink_dependency} \
    -Dkylin.hadoop.conf.dir=${kylin_hadoop_conf_dir} \
    -Dkylin.server.host-address=${kylin_rest_address} \
    -Dkylin.source.hive.warehouse-dir=${hive_warehouse_dir} \
    -Dspring.profiles.active=${spring_profile} \
    org.apache.hadoop.util.RunJar ${tomcat_root}/bin/bootstrap.jar org.apache.catalina.startup.Bootstrap start"
}
function retrieveStopCommand() {
    if [ -f "${KYLIN_HOME}/pid" ]
    then
        PID=`cat $KYLIN_HOME/pid`
        WAIT_TIME=2
        LOOP_COUNTER=10
        if ps -p $PID > /dev/null
        then
            echo "Stopping Kylin: $PID"
            kill $PID

            for ((i=0; i<$LOOP_COUNTER; i++))
            do
                # wait to process stopped
                sleep $WAIT_TIME
                if ps -p $PID > /dev/null ; then
                    echo "Stopping in progress. Will check after $WAIT_TIME secs again..."
                    continue;
                else
                    break;
                fi
            done

            # if process is still around, use kill -9
            if ps -p $PID > /dev/null
            then
                echo "Initial kill failed, getting serious now..."
                kill -9 $PID
                sleep 1 #give kill -9 sometime to "kill"
                if ps -p $PID > /dev/null
                then
                    quit "Warning, even kill -9 failed, giving up! Sorry..."
                fi
            fi

            # process is killed , remove pid file
            rm -rf ${KYLIN_HOME}/pid
            echo "Kylin with pid ${PID} has been stopped."
            return 0
        else
            echo "Kylin with pid ${PID} is not running"
            return 1
        fi
    else
        return 1
    fi
}
if [ "$2" == "--reload-dependency" ]
then
    reload_dependency=1
fi

# start command
if [ "$1" == "start" ]
then
    retrieveStartCommand
    ${start_command} >> ${KYLIN_HOME}/logs/kylin.out 2>&1 & echo $! > ${KYLIN_HOME}/pid &
    rm -f $lockfile

    echo ""
    echo "A new Kylin instance is started by $USER. To stop it, run 'kylin.sh stop'"
    echo "Check the log at ${KYLIN_HOME}/logs/kylin.log"
    echo "Web UI is at http://${kylin_rest_address_arr}/kylin"
    exit 0

# run command
elif [ "$1" == "run" ]
then
    retrieveStartCommand
    ${start_command}
    rm -f $lockfile

# stop command
elif [ "$1" == "stop" ]
then
    retrieveStopCommand
    if [[ $? == 0 ]]
    then
        exit 0
    else
        quit "Kylin is not running"
    fi

# restart command
elif [ "$1" == "restart" ]
then
    echo "Restarting kylin..."
    echo "--> Stopping kylin first if it's running..."
    retrieveStopCommand
    if [[ $? != 0 ]]
    then
        echo "Kylin is not running, now start it"
    fi
    echo "--> Start kylin..."
    retrieveStartCommand
    ${start_command} >> ${KYLIN_HOME}/logs/kylin.out 2>&1 & echo $! > ${KYLIN_HOME}/pid &
    rm -f $lockfile

    echo ""
    echo "A new Kylin instance is started by $USER. To stop it, run 'kylin.sh stop'"
    echo "Check the log at ${KYLIN_HOME}/logs/kylin.log"
    echo "Web UI is at http://${kylin_rest_address_arr}/kylin"
    exit 0
# streaming command
elif [ "$1" == "streaming" ]
then
    if [ $# -lt 2 ]
    then
        echo "Invalid input args $@"
        exit -1
    fi
    if [ "$2" == "start" ]
    then
        if [ -f "${KYLIN_HOME}/streaming_receiver_pid" ]
        then
            PID=`cat $KYLIN_HOME/streaming_receiver_pid`
            if ps -p $PID > /dev/null
            then
                echo "Kylin streaming receiver is running, stop it first"
                exit 1
            fi
        fi
        #retrieve $hbase_dependency
        source ${dir}/find-hbase-dependency.sh
        #retrieve $KYLIN_EXTRA_START_OPTS
        if [ -f "${KYLIN_HOME}/conf/setenv.sh" ]
        then
            source ${KYLIN_HOME}/conf/setenv.sh
        fi

        mkdir -p ${KYLIN_HOME}/ext
        HBASE_CLASSPATH=`hbase classpath`
        #echo "hbase class path:"$HBASE_CLASSPATH
        STREAM_CLASSPATH=${KYLIN_HOME}/lib/streaming/*:${KYLIN_HOME}/ext/*:${HBASE_CLASSPATH}

        # KYLIN_EXTRA_START_OPTS is for customized settings, checkout bin/setenv.sh
        ${JAVA_HOME}/bin/java -cp $STREAM_CLASSPATH ${KYLIN_EXTRA_START_OPTS} \
        -Dlog4j.configuration=stream-receiver-log4j.properties \
        -DKYLIN_HOME=${KYLIN_HOME} \
        -Dkylin.hbase.dependency=${hbase_dependency} \
        org.apache.kylin.stream.server.StreamingReceiver $@ > ${KYLIN_HOME}/logs/streaming_receiver.out 2>&1 & echo $! > ${KYLIN_HOME}/streaming_receiver_pid &
        exit 0
    elif [ "$2" == "stop" ]
    then
        if [ ! -f "${KYLIN_HOME}/streaming_receiver_pid" ]
        then
            echo "Streaming receiver is not running, please check"
            exit 1
        fi
        PID=`cat ${KYLIN_HOME}/streaming_receiver_pid`
        if [ "$PID" = "" ]
        then
            echo "Streaming receiver is not running, please check"
            exit 1
        else
            echo "Stopping streaming receiver: $PID"
            WAIT_TIME=2
            LOOP_COUNTER=20
            if ps -p $PID > /dev/null
            then
                kill $PID

                for ((i=0; i<$LOOP_COUNTER; i++))
                do
                    # wait to process stopped
                    sleep $WAIT_TIME
                    if ps -p $PID > /dev/null ; then
                        echo "Stopping in progress. Will check after $WAIT_TIME secs again..."
                        continue;
                    else
                        break;
                    fi
                done

                # if process is still around, use kill -9
                if ps -p $PID > /dev/null
                then
                    echo "Initial kill failed, getting serious now..."
                    kill -9 $PID
                    sleep 1 #give kill -9 sometime to "kill"
                    if ps -p $PID > /dev/null
                    then
                        quit "Warning, even kill -9 failed, giving up! Sorry..."
                    fi
                fi

                # process is killed , remove pid file
                rm -rf ${KYLIN_HOME}/streaming_receiver_pid
                echo "Kylin streaming receiver with pid ${PID} has been stopped."
                exit 0
            else
                quit "Kylin streaming receiver with pid ${PID} is not running"
            fi
        fi
    elif [[ "$2" = org.apache.kylin.* ]]
    then
        source ${KYLIN_HOME}/conf/setenv.sh
        HBASE_CLASSPATH=`hbase classpath`
        #echo "hbase class path:"$HBASE_CLASSPATH
        STREAM_CLASSPATH=${KYLIN_HOME}/lib/streaming/*:${KYLIN_HOME}/ext/*:${HBASE_CLASSPATH}

        shift
        # KYLIN_EXTRA_START_OPTS is for customized settings, checkout bin/setenv.sh
        ${JAVA_HOME}/bin/java -cp $STREAM_CLASSPATH ${KYLIN_EXTRA_START_OPTS} \
        -Dlog4j.configuration=stream-receiver-log4j.properties \
        -DKYLIN_HOME=${KYLIN_HOME} \
        -Dkylin.hbase.dependency=${hbase_dependency} \
        "$@"
        exit 0
    fi
elif [ "$1" = "version" ]
then
    retrieveDependency
    exec hbase -Dlog4j.configuration=file:${KYLIN_HOME}/conf/kylin-tools-log4j.properties org.apache.kylin.common.KylinVersion
    exit 0

elif [ "$1" = "diag" ]
then
    echo "'kylin.sh diag' no longer supported, use diag.sh instead"
    exit 0

# tool command
elif [[ "$1" = org.apache.kylin.* ]]
then
    retrieveDependency

    #retrieve $KYLIN_EXTRA_START_OPTS from a separate file called setenv-tool.sh
    unset KYLIN_EXTRA_START_OPTS # unset the global server setenv config first
    if [ -f "${dir}/setenv-tool.sh" ]; then
        echo "WARNING: ${dir}/setenv-tool.sh is deprecated and ignored, please remove it and use ${KYLIN_HOME}/conf/setenv-tool.sh instead"
        source ${dir}/setenv-tool.sh
    fi
    if [ -f "${KYLIN_HOME}/conf/setenv-tool.sh" ]; then
        source ${KYLIN_HOME}/conf/setenv-tool.sh
    fi
    hbase_pre_original=${HBASE_CLASSPATH_PREFIX}
    export HBASE_CLASSPATH_PREFIX=${KYLIN_HOME}/tool/*:${HBASE_CLASSPATH_PREFIX}
    exec hbase ${KYLIN_EXTRA_START_OPTS} -Dkylin.hive.dependency=${hive_dependency} -Dkylin.hbase.dependency=${hbase_dependency} -Dlog4j.configuration=file:${KYLIN_HOME}/conf/kylin-tools-log4j.properties "$@"
    export HBASE_CLASSPATH_PREFIX=${hbase_pre_original}
else
    quit "Usage: 'kylin.sh [-v] start' or 'kylin.sh [-v] stop' or 'kylin.sh [-v] restart'"
fi
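The script above records the server's process ID in ${KYLIN_HOME}/pid and checks it before starting or stopping. The same idea can be reused from your own tooling. This is a minimal sketch: `kylin_is_running` is a made-up helper, and it uses `kill -0` instead of the script's `ps -p`, which assumes the check runs as the same user that started Kylin.

```shell
#!/usr/bin/env bash
# kylin_is_running KYLIN_HOME: succeed when KYLIN_HOME/pid names a live process.
# kill -0 sends no signal; it only tests whether the process can be signalled.
kylin_is_running() {
    local pid_file="$1/pid" pid
    [ -f "$pid_file" ] || return 1
    pid="$(cat "$pid_file")"
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Example:
#   if kylin_is_running "$KYLIN_HOME"; then
#       echo "Kylin is up"
#   else
#       kylin.sh start
#   fi
```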
6) Configure conf/kylin.properties
cd /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/conf
Edit kylin.properties and set the following properties:
## Working folder in HDFS, better be qualified absolute path, make sure user has the right permission to this directory
kylin.env.hdfs-working-dir=/kylin
## kylin zk base path
kylin.env.zookeeper-base-path=/kylin
kylin.source.hive.keep-flat-table=false
## Hive database name for putting the intermediate flat tables
kylin.source.hive.database-for-flat-table=default
## Whether redistribute the intermediate flat table before building
kylin.source.hive.redistribute-flat-table=true
## The storage for final cube file in hbase
kylin.storage.url=hbase
## The prefix of hbase table
kylin.storage.hbase.table-name-prefix=KYLIN_
## The namespace for hbase storage
kylin.storage.hbase.namespace=default
## Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
kylin.storage.hbase.compression-codec=none
## Hadoop conf folder, will export this as "HADOOP_CONF_DIR" to run spark-submit
## This must contain site xmls of core, yarn, hive, and hbase in one folder
kylin.env.hadoop-conf-dir=/usr/local/bigdata/hadoop-3.1.4/etc/hadoop
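To confirm which values are actually active (uncommented) in the file, a small reader can help. The following is only a sketch: `get_property` is a made-up helper that matches the key literally at the start of a line and lets the last assignment win. Kylin's own bin/get-properties.sh, which kylin.sh calls, remains the authoritative way to resolve a property.

```shell
#!/usr/bin/env bash
# get_property FILE KEY: print the value of the last uncommented "KEY=value" line.
# Returns non-zero when KEY is not set (for example when it is only present as a comment).
get_property() {
    awk -v key="$2" '
        index($0, key "=") == 1 { value = substr($0, length(key) + 2); found = 1 }
        END { if (found) print value; else exit 1 }
    ' "$1"
}

# Example:
#   get_property "$KYLIN_HOME/conf/kylin.properties" kylin.env.hdfs-working-dir
```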
The complete configuration file is as follows:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The below commented values will effect as default settings
# Uncomment and override them if necessary
#
#### METADATA | ENV ###
#
## The metadata store in hbase
#kylin.metadata.url=kylin_metadata@hbase
#
## metadata cache sync retry times
#kylin.metadata.sync-retries=3
#
## Working folder in HDFS, better be qualified absolute path, make sure user has the right permission to this directory
kylin.env.hdfs-working-dir=/kylin
#
## DEV|QA|PROD. DEV will turn on some dev features, QA and PROD has no difference in terms of functions.
#kylin.env=QA
#
## kylin zk base path
kylin.env.zookeeper-base-path=/kylin
#
#### SERVER | WEB | RESTCLIENT ###
#
## Kylin server mode, valid value [all, query, job]
#kylin.server.mode=all
#
## List of web servers in use, this enables one web server instance to sync up with other servers.
#kylin.server.cluster-servers=localhost:7070
#
## Display timezone on UI,format like[GMT+N or GMT-N]
#kylin.web.timezone=
#
## Timeout value for the queries submitted through the Web UI, in milliseconds
#kylin.web.query-timeout=300000
#
#kylin.web.cross-domain-enabled=true
#
##allow user to export query result
#kylin.web.export-allow-admin=true
#kylin.web.export-allow-other=true
#
## Hide measures in measure list of cube designer, separate by comma
#kylin.web.hide-measures=RAW
#
##max connections of one route
#kylin.restclient.connection.default-max-per-route=20
#
##max connections of one rest-client
#kylin.restclient.connection.max-total=200
#
#### PUBLIC CONFIG ###
#kylin.engine.default=2
#kylin.storage.default=2
#kylin.web.hive-limit=20
#kylin.web.help.length=4
#kylin.web.help.0=start|Getting Started|http://kylin.apache.org/docs/tutorial/kylin_sample.html
#kylin.web.help.1=odbc|ODBC Driver|http://kylin.apache.org/docs/tutorial/odbc.html
#kylin.web.help.2=tableau|Tableau Guide|http://kylin.apache.org/docs/tutorial/tableau_91.html
#kylin.web.help.3=onboard|Cube Design Tutorial|http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
#kylin.web.link-streaming-guide=http://kylin.apache.org/
#kylin.htrace.show-gui-trace-toggle=false
#kylin.web.link-hadoop=
#kylin.web.link-diagnostic=
#kylin.web.contact-mail=
#kylin.server.external-acl-provider=
#
#### CUBE MIGRATION
#kylin.cube.migration.enabled=false
#
## Default time filter for job list, 0->current day, 1->last one day, 2->last one week, 3->last one year, 4->all
#kylin.web.default-time-filter=1
#
#### SOURCE ###
#
## Hive client, valid value [cli, beeline]
#kylin.source.hive.client=cli
#
## Absolute path to beeline shell, can be set to spark beeline instead of the default hive beeline on PATH
#kylin.source.hive.beeline-shell=beeline
#
## Parameters for beeline client, only necessary if hive client is beeline
##kylin.source.hive.beeline-params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://localhost:10000
#
## While hive client uses above settings to read hive table metadata,
## table operations can go through a separate SparkSQL command line, given SparkSQL connects to the same Hive metastore.
#kylin.source.hive.enable-sparksql-for-table-ops=false
##kylin.source.hive.sparksql-beeline-shell=/path/to/spark-client/bin/beeline
##kylin.source.hive.sparksql-beeline-params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://localhost:10000
#
kylin.source.hive.keep-flat-table=false
#
## Hive database name for putting the intermediate flat tables
kylin.source.hive.database-for-flat-table=default
#
## Whether redistribute the intermediate flat table before building
kylin.source.hive.redistribute-flat-table=true
## Define how to access to hive metadata
## When user deploy kylin on AWS EMR and Glue is used as external metadata, use gluecatalog instead
#kylin.source.hive.metadata-type=hcatalog
#
#### STORAGE ###
#
## The storage for final cube file in hbase
kylin.storage.url=hbase
#
## The prefix of hbase table
kylin.storage.hbase.table-name-prefix=KYLIN_
#
## The namespace for hbase storage
kylin.storage.hbase.namespace=default
#
## Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
kylin.storage.hbase.compression-codec=none
#
## HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020
## Leave empty if hbase running on same cluster with hive and mapreduce
##kylin.storage.hbase.cluster-fs=
#
## The cut size for hbase region, in GB.
#kylin.storage.hbase.region-cut-gb=5
#
## The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster.
## Set 0 to disable this optimization.
#kylin.storage.hbase.hfile-size-gb=2
#
#kylin.storage.hbase.min-region-count=1
#kylin.storage.hbase.max-region-count=500
#
## Optional information for the owner of kylin platform, it can be your team's email
## Currently it will be attached to each kylin's htable attribute
#kylin.storage.hbase.owner-tag=whoami@kylin.apache.org
#
#kylin.storage.hbase.coprocessor-mem-gb=3
#
## By default kylin can spill query's intermediate results to disks when it's consuming too much memory.
## Set it to false if you want query to abort immediately in such condition.
#kylin.storage.partition.aggr-spill-enabled=true
#
## The maximum number of bytes each coprocessor is allowed to scan.
## To allow arbitrary large scan, you can set it to 0.
#kylin.storage.partition.max-scan-bytes=3221225472
#
## The default coprocessor timeout is (hbase.rpc.timeout * 0.9) / 1000 seconds,
## You can set it to a smaller value. 0 means use default.
## kylin.storage.hbase.coprocessor-timeout-seconds=0
#
## clean real storage after delete operation
## if you want to delete the real storage like htable of deleting segment, you can set it to true
#kylin.storage.clean-after-delete-operation=false
#
#### JOB ###
#
## Max job retry on error, default 0: no retry
#kylin.job.retry=0
#
## Max count of concurrent jobs running
#kylin.job.max-concurrent-jobs=10
#
## The percentage of the sampling, default 100%
#kylin.job.sampling-percentage=100
#
## If true, will send email notification on job complete
##kylin.job.notification-enabled=true
##kylin.job.notification-mail-enable-starttls=true
##kylin.job.notification-mail-host=smtp.office365.com
##kylin.job.notification-mail-port=587
##kylin.job.notification-mail-username=kylin@example.com
##kylin.job.notification-mail-password=mypassword
##kylin.job.notification-mail-sender=kylin@example.com
#kylin.job.scheduler.provider.100=org.apache.kylin.job.impl.curator.CuratorScheduler
#kylin.job.scheduler.default=0
#
#### ENGINE ###
#
## Time interval to check hadoop job status
#kylin.engine.mr.yarn-check-interval-seconds=10
#
#kylin.engine.mr.reduce-input-mb=500
#
#kylin.engine.mr.max-reducer-number=500
#
#kylin.engine.mr.mapper-input-rows=1000000
#
## Enable dictionary building in MR reducer
#kylin.engine.mr.build-dict-in-reducer=true
#
## Number of reducers for fetching UHC column distinct values
#kylin.engine.mr.uhc-reducer-count=3
#
## Whether using an additional step to build UHC dictionary
#kylin.engine.mr.build-uhc-dict-in-additional-step=false
#
#
#### CUBE | DICTIONARY ###
#
#kylin.cube.cuboid-scheduler=org.apache.kylin.cube.cuboid.DefaultCuboidScheduler
#kylin.cube.segment-advisor=org.apache.kylin.cube.CubeSegmentAdvisor
#
## 'auto', 'inmem', 'layer' or 'random' for testing
#kylin.cube.algorithm=layer
#
## A smaller threshold prefers layer, a larger threshold prefers in-mem
#kylin.cube.algorithm.layer-or-inmem-threshold=7
#
## auto use inmem algorithm:
## 1, cube planner optimize job
## 2, no source record
#kylin.cube.algorithm.inmem-auto-optimize=true
#
#kylin.cube.aggrgroup.max-combination=32768
#
#kylin.snapshot.max-mb=300
#
#kylin.cube.cubeplanner.enabled=true
#kylin.cube.cubeplanner.enabled-for-existing-cube=true
#kylin.cube.cubeplanner.expansion-threshold=15.0
#kylin.cube.cubeplanner.recommend-cache-max-size=200
#kylin.cube.cubeplanner.mandatory-rollup-threshold=1000
#kylin.cube.cubeplanner.algorithm-threshold-greedy=8
#kylin.cube.cubeplanner.algorithm-threshold-genetic=23
#
#
#### QUERY ###
#
## Controls the maximum number of bytes a query is allowed to scan storage.
## The default value 0 means no limit.
## The counterpart kylin.storage.partition.max-scan-bytes sets the maximum per coprocessor.
#kylin.query.max-scan-bytes=0
#
#kylin.query.cache-enabled=true
#
## Controls extras properties for Calcite jdbc driver
## all extras properties should undder prefix "kylin.query.calcite.extras-props."
## case sensitive, default: true, to enable case insensitive set it to false
## @see org.apache.calcite.config.CalciteConnectionProperty.CASE_SENSITIVE
#kylin.query.calcite.extras-props.caseSensitive=true
## how to handle unquoted identity, defualt: TO_UPPER, available options: UNCHANGED, TO_UPPER, TO_LOWER
## @see org.apache.calcite.config.CalciteConnectionProperty.UNQUOTED_CASING
#kylin.query.calcite.extras-props.unquotedCasing=TO_UPPER
## quoting method, default: DOUBLE_QUOTE, available options: DOUBLE_QUOTE, BACK_TICK, BRACKET
## @see org.apache.calcite.config.CalciteConnectionProperty.QUOTING
#kylin.query.calcite.extras-props.quoting=DOUBLE_QUOTE
## change SqlConformance from DEFAULT to LENIENT to enable group by ordinal
## @see org.apache.calcite.sql.validate.SqlConformance.SqlConformanceEnum
#kylin.query.calcite.extras-props.conformance=LENIENT
#
## TABLE ACL
#kylin.query.security.table-acl-enabled=true
#
## Usually should not modify this
#kylin.query.interceptors=org.apache.kylin.rest.security.TableInterceptor
#
#kylin.query.escape-default-keyword=false
#
## Usually should not modify this
#kylin.query.transformers=org.apache.kylin.query.util.DefaultQueryTransformer,org.apache.kylin.query.util.KeywordDefaultDirtyHack
#
#### SECURITY ###
#
## Spring security profile, options: testing, ldap, saml
## with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login
#kylin.security.profile=testing
#
## Admin roles in LDAP, for ldap and saml
#kylin.security.acl.admin-role=admin
#
## LDAP authentication configuration
#kylin.security.ldap.connection-server=ldap://ldap_server:389
#kylin.security.ldap.connection-username=
#kylin.security.ldap.connection-password=
## When you use the customized CA certificate library for user authentication based on LDAPs, you need to configure this item.
## The value of this item will be added to the JVM parameter javax.net.ssl.trustStore.
#kylin.security.ldap.connection-truststore=
#
## LDAP user account directory;
#kylin.security.ldap.user-search-base=
#kylin.security.ldap.user-search-pattern=
#kylin.security.ldap.user-group-search-base=
#kylin.security.ldap.user-group-search-filter=(|(member={0})(memberUid={1}))
#
## LDAP service account directory
#kylin.security.ldap.service-search-base=
#kylin.security.ldap.service-search-pattern=
#kylin.security.ldap.service-group-search-base=
#
### SAML configurations for SSO
## SAML IDP metadata file location
#kylin.security.saml.metadata-file=classpath:sso_metadata.xml
#kylin.security.saml.metadata-entity-base-url=https://hostname/kylin
#kylin.security.saml.keystore-file=classpath:samlKeystore.jks
#kylin.security.saml.context-scheme=https
#kylin.security.saml.context-server-name=hostname
#kylin.security.saml.context-server-port=443
#kylin.security.saml.context-path=/kylin
#
#### SPARK ENGINE CONFIGS ###
#
## Hadoop conf folder, will export this as "HADOOP_CONF_DIR" to run spark-submit
## This must contain site xmls of core, yarn, hive, and hbase in one folder
kylin.env.hadoop-conf-dir=/usr/local/bigdata/hadoop-3.1.4/etc/hadoop
#
## Estimate the RDD partition numbers
#kylin.engine.spark.rdd-partition-cut-mb=10
#
## Minimal partition numbers of rdd
#kylin.engine.spark.min-partition=1
#
## Max partition numbers of rdd
#kylin.engine.spark.max-partition=5000
#
## Spark conf (default is in spark/conf/spark-defaults.conf)
#kylin.engine.spark-conf.spark.master=yarn
##kylin.engine.spark-conf.spark.submit.deployMode=cluster
#kylin.engine.spark-conf.spark.yarn.queue=default
#kylin.engine.spark-conf.spark.driver.memory=2G
#kylin.engine.spark-conf.spark.executor.memory=4G
#kylin.engine.spark-conf.spark.executor.instances=40
#kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
#kylin.engine.spark-conf.spark.shuffle.service.enabled=true
#kylin.engine.spark-conf.spark.eventLog.enabled=true
#kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
#kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
#kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
#
#### Spark conf for specific job
#kylin.engine.spark-conf-mergedict.spark.executor.memory=6G
#kylin.engine.spark-conf-mergedict.spark.memory.fraction=0.2
#
## manually upload spark-assembly jar to HDFS and then set this property will avoid repeatedly uploading jar at runtime
##kylin.engine.spark-conf.spark.yarn.archive=hdfs://namenode:8020/kylin/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
##kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
##kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
##kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
#
#
#### FLINK ENGINE CONFIGS ###
#
### Flink conf (default is in flink/conf/flink-conf.yaml)
#kylin.engine.flink-conf.jobmanager.heap.size=2G
#kylin.engine.flink-conf.taskmanager.heap.size=4G
#kylin.engine.flink-conf.taskmanager.numberOfTaskSlots=1
#kylin.engine.flink-conf.taskmanager.memory.preallocate=false
#kylin.engine.flink-conf.job.parallelism=1
#kylin.engine.flink-conf.program.enableObjectReuse=false
#kylin.engine.flink-conf.yarn.queue=
#kylin.engine.flink-conf.yarn.nodelabel=
#
#### QUERY PUSH DOWN ###
#
##kylin.query.pushdown.runner-class-name=org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl
#
##kylin.query.pushdown.update-enabled=false
##kylin.query.pushdown.jdbc.url=jdbc:hive2://sandbox:10000/default
##kylin.query.pushdown.jdbc.driver=org.apache.hive.jdbc.HiveDriver
##kylin.query.pushdown.jdbc.username=hive
##kylin.query.pushdown.jdbc.password=
#
##kylin.query.pushdown.jdbc.pool-max-total=8
##kylin.query.pushdown.jdbc.pool-max-idle=8
##kylin.query.pushdown.jdbc.pool-min-idle=0
#
#### JDBC Data Source
##kylin.source.jdbc.connection-url=
##kylin.source.jdbc.driver=
##kylin.source.jdbc.dialect=
##kylin.source.jdbc.user=
##kylin.source.jdbc.pass=
##kylin.source.jdbc.sqoop-home=
##kylin.source.jdbc.filed-delimiter=|
#
#### Livy with Kylin
##kylin.engine.livy-conf.livy-enabled=false
##kylin.engine.livy-conf.livy-url=http://LivyHost:8998
##kylin.engine.livy-conf.livy-key.file=hdfs:///path-to-kylin-job-jar
##kylin.engine.livy-conf.livy-arr.jars=hdfs:///path-to-hadoop-dependency-jar
#
#
#### Realtime OLAP ###
#
## Where should local segment cache located, for absolute path, the real path will be ${KYLIN_HOME}/${kylin.stream.index.path}
#kylin.stream.index.path=stream_index
#
## The timezone for Derived Time Column like hour_start, try set to GMT+N, please check detail at KYLIN-4010
#kylin.stream.event.timezone=
#
## Debug switch for print realtime global dict encode information, please check detail at KYLIN-4141
#kylin.stream.print-realtime-dict-enabled=false
#
## Should enable latest coordinator, please check detail at KYLIN-4167
#kylin.stream.new.coordinator-enabled=true
#
## In which way should we collect receiver's metrics info
##kylin.stream.metrics.option=console/csv/jmx
#
## When enabling a streaming cube, whether to consume from the earliest offset or the latest offset
#kylin.stream.consume.offsets.latest=true
#
## The parallelism of scan in receiver side
#kylin.stream.receiver.use-threads-per-query=8
#
## How coordinator/receiver register itself into StreamMetadata, there are three option:
## 1. hostname:port, then kylin will set the config ip and port as the currentNode;
## 2. port, then kylin will get the node's hostname and append port as the currentNode;
## 3. not set, then kylin will get the node hostname address and set the hostname and defaultPort(7070 for coordinator or 9090 for receiver) as the currentNode.
##kylin.stream.node=
#
## Auto resubmit after a job is discarded
#kylin.stream.auto-resubmit-after-discard-enabled=true
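The comment on `kylin.env.hadoop-conf-dir` in the file above says the folder must hold the site XML files of core, yarn, hive, and hbase. The following is a minimal sketch of a pre-flight check for that requirement. The function name `check_hadoop_conf_dir` is hypothetical, and the file names are assumed to be the conventional `core-site.xml`, `yarn-site.xml`, `hive-site.xml`, and `hbase-site.xml`.

```shell
# Hypothetical helper: verify that a Hadoop conf dir holds all four site files.
# Prints each missing file to stderr and returns non-zero if any is absent.
check_hadoop_conf_dir() {
    conf_dir="$1"
    missing=0
    for f in core-site.xml yarn-site.xml hive-site.xml hbase-site.xml; do
        # -e follows symlinks, so a dangling link counts as missing
        if [ ! -e "${conf_dir}/${f}" ]; then
            echo "missing: ${conf_dir}/${f}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the path configured in this article:
# check_hadoop_conf_dir /usr/local/bigdata/hadoop-3.1.4/etc/hadoop
```

Symlinked files are accepted as long as the link target exists.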
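4、Check the runtime environment
Kylin runs on a Hadoop cluster and has requirements on component versions, access permissions, and the CLASSPATH. To avoid environment problems, run the $KYLIN_HOME/bin/check-env.sh script to check the environment. If anything is wrong, the script prints a detailed error message; if no errors are reported, the environment is suitable for running Kylin.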
cd /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/bin
[alanchan@server4 bin]$ check-env.sh
Retrieving hadoop conf dir...
...................................................[PASS]
KYLIN_HOME is set to /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3
Checking HBase
...................................................[PASS]
Checking hive
...................................................[PASS]
Checking hadoop shell
...................................................[PASS]
Checking hdfs working dir
...................................................[PASS]
Retrieving Spark dependency...
Optional dependency spark not found, if you need this; set SPARK_HOME, or run bin/download-spark.sh
...................................................[PASS]
Retrieving Flink dependency...
Optional dependency flink not found, if you need this; set FLINK_HOME, or run bin/download-flink.sh
...................................................[PASS]
Retrieving kafka dependency...
Couldn't find kafka home. If you want to enable streaming processing, Please set KAFKA_HOME to the path which contains kafka dependencies.
...................................................[PASS]
Checking environment finished successfully. To check again, run 'bin/check-env.sh' manually.
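When the environment check is part of a deployment script, it helps to stop early if the check fails. The sketch below wraps the check in a hypothetical function `run_env_check`. It assumes that check-env.sh exits with a non-zero status when a check fails, which should be confirmed against the Kylin version in use.

```shell
# Hypothetical wrapper around Kylin's check-env.sh.
# Assumption: the script exits non-zero when any check fails.
run_env_check() {
    script="${1:-${KYLIN_HOME}/bin/check-env.sh}"
    if [ ! -f "$script" ]; then
        echo "check script not found: $script" >&2
        return 2
    fi
    if bash "$script"; then
        echo "environment check passed"
    else
        echo "environment check failed, see the messages above" >&2
        return 1
    fi
}

# Example: run_env_check && kylin.sh start
```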
5、Start the dependent clusters
1)、Start ZooKeeper
cd /usr/local/bigdata/apache-zookeeper-3.7.1/bin
zkServer.sh start
# Verify
zkServer.sh status
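The status command reports the role of the local ZooKeeper server on a line starting with `Mode:` (for example `standalone`, `leader`, or `follower`). A minimal sketch for extracting that value in a script follows. The function name `zk_mode` is hypothetical.

```shell
# Hypothetical helper: read `zkServer.sh status` output on stdin and print
# the value of the "Mode:" line. Returns non-zero when no such line exists,
# which is what happens when the server is not running.
zk_mode() {
    sed -n 's/^Mode:[[:space:]]*//p' | grep .
}

# Example: zkServer.sh status 2>&1 | zk_mode
```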
2)、Start the Hadoop cluster
start-all.sh
mapred --daemon start historyserver
yarn --daemon start timelineserver
# Verify
HDFS: http://server1:9870/dfshealth.html#tab-overview
YARN: http://server1:8088/cluster/apps
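Besides the web pages, the running daemons can be checked from the command line with the JDK's `jps` tool. The sketch below is one possible approach: the hypothetical function `missing_daemons` reads `jps` output on stdin and prints every expected process name that is absent. Which daemons should run on which node depends on the cluster layout, so the names in the example call are assumptions.

```shell
# Hypothetical helper: read `jps` output ("<pid> <name>" per line) on stdin.
# Arguments are the expected process names. Prints the missing ones and
# returns non-zero if at least one is missing.
missing_daemons() {
    jps_out=$(cat)
    rc=0
    for name in "$@"; do
        if ! printf '%s\n' "$jps_out" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}

# Example on a master node (names are an assumption for this layout):
# jps | missing_daemons NameNode ResourceManager
```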
3)、Start the HBase cluster
start-hbase.sh
# Verify
http://server1:16010/master-status
4)、Start Hive
nohup /usr/local/bigdata/apache-hive-3.1.2-bin/bin/hive --service metastore --hiveconf hive.root.logger=WARN,console > /usr/local/bigdata/apache-hive-3.1.2-bin/logs/metastore.log 2>&1 &
nohup /usr/local/bigdata/apache-hive-3.1.2-bin/bin/hive --service hiveserver2 --hiveconf hive.root.logger=WARN,console > /usr/local/bigdata/apache-hive-3.1.2-bin/logs/hiveserver2.log 2>&1 &
# Verify
[alanchan@server4 bin]$ beeline
Beeline version 3.1.2 by Apache Hive
beeline> ! connect jdbc:hive2://server4:10000
Connecting to jdbc:hive2://server4:10000
Enter username for jdbc:hive2://server4:10000: alanchan
Enter password for jdbc:hive2://server4:10000: ********(rootroot)
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://server4:10000> show databases;
+----------------+
| database_name |
+----------------+
| default |
| test |
| testhive |
+----------------+
3 rows selected (1.388 seconds)
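Because both Hive services are started in the background, HiveServer2 may not accept connections right away. A small retry loop can make the verification step scriptable. The sketch below defines a hypothetical generic `retry` function. The example call assumes that beeline exits with a non-zero status when the connection fails.

```shell
# Hypothetical helper: retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only when another attempt will follow
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example: up to 10 attempts, 5 seconds apart
# retry 10 5 beeline -u jdbc:hive2://server4:10000 -n alanchan -e 'show databases;'
```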
5)、Start Kylin
cd /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/bin
kylin.sh start
# Resolving exception 1
Retrieving hive dependency...
/usr/local/bigdata/hadoop-3.1.4/libexec/hadoop-functions.sh: line 2360: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: bad substitution
/usr/local/bigdata/hadoop-3.1.4/libexec/hadoop-functions.sh: line 2455: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: bad substitution
Hive Session ID = e55203bc-e634-4cd6-84f7-60e52c570730
Logging initialized using configuration in jar:file:/usr/local/bigdata/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = 37aba532-4fb3-4818-900e-c13aa05e0d72
export hive_warehouse_dir=/user/hive/warehouse
Retrieving hbase dependency...
/usr/local/bigdata/hadoop-3.1.4/libexec/hadoop-functions.sh: line 2360: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: bad substitution
/usr/local/bigdata/hadoop-3.1.4/libexec/hadoop-functions.sh: line 2455: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: bad substitution
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
hbase-common lib not found
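# Solution: add the HBase lib directory to CLASSPATH in the hbase script
# Stop the HBase cluster
# Open the hbase script (adjust the path to your own installation)
cd /usr/local/bigdata/hbase-2.1.0/bin
vi hbase
# Find this line (around line 158)
CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar
# and change it to
# CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar
CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar:/usr/local/bigdata/hbase-2.1.0/lib/*
# Save and exit, then scp the file to every node in the HBase cluster
# Start the HBase cluster
# Start Kylin
# The following message indicates that Kylin started successfully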
A new Kylin instance is started by alanchan. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/bigdata/apache-kylin-3.1.3-bin-hadoop3/logs/kylin.log
Web UI is at http://server4:7070/kylin
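The CLASSPATH change to the hbase script described above can also be applied without an editor, which is convenient when the same fix has to be repeated on several nodes. The following is a sketch only. The function name `patch_hbase_classpath` is hypothetical, and it assumes the script contains the `tools.jar` line exactly as quoted in this article (leading whitespace is tolerated).

```shell
# Hypothetical helper: append "<hbase lib dir>/*" to the tools.jar CLASSPATH
# line of bin/hbase. Keeps a .bak copy and does nothing if already patched.
patch_hbase_classpath() {
    script="$1"      # path to bin/hbase
    hbase_lib="$2"   # e.g. /usr/local/bigdata/hbase-2.1.0/lib
    if grep -qF "tools.jar:${hbase_lib}/*" "$script"; then
        return 0
    fi
    cp "$script" "${script}.bak" || return 1
    # A "$" in the middle of a basic regular expression is literal; only the
    # final "$" anchors the match to the end of the line.
    sed "s|^\([[:space:]]*CLASSPATH=\${CLASSPATH}:\$JAVA_HOME/lib/tools.jar\)\$|\1:${hbase_lib}/*|" \
        "${script}.bak" > "$script"
}

# Example:
# patch_hbase_classpath /usr/local/bigdata/hbase-2.1.0/bin/hbase /usr/local/bigdata/hbase-2.1.0/lib
```

Writing through a backup copy instead of `sed -i` avoids the differences between sed implementations and leaves the original script available for rollback.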
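6、Verify Kylin
Open http://server4:7070/kylin/login and log in.
Username: ADMIN
Password: KYLIN
Both the username and the password must be uppercase.
After a successful login, the Kylin web UI is shown.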
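The login can also be verified without a browser, because Kylin's REST API accepts HTTP Basic authentication. The sketch below builds the header for the default account. The function name `basic_auth_header` is hypothetical, and the endpoint in the example call (`/kylin/api/user/authentication`) should be checked against the REST API documentation of the Kylin version in use.

```shell
# Hypothetical helper: build an HTTP Basic auth header from user and password.
basic_auth_header() {
    token=$(printf '%s' "$1:$2" | base64)
    printf 'Authorization: Basic %s\n' "$token"
}

# basic_auth_header ADMIN KYLIN
# → Authorization: Basic QURNSU46S1lMSU4=

# Example call against the instance from this article:
# curl -s -X POST -H "$(basic_auth_header ADMIN KYLIN)" \
#      -H 'Content-Type: application/json' \
#      http://server4:7070/kylin/api/user/authentication
```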
This completes the basic introduction to Kylin and the walkthrough and verification of its single-node deployment.