1. Starting Spark prints `WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set`. Neither of the two options that spare re-uploading the jars to HDFS has been set, so Spark uploads the local jars on every submission. See 问题及解决.
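
A minimal sketch of how to set it, assuming the jars are staged under the hypothetical HDFS path hdfs://master:8020/spark/jars:

# upload the Spark jars to HDFS once (the target path is an assumption, adjust to your cluster)
hdfs dfs -mkdir -p /spark/jars
hdfs dfs -put $SPARK_HOME/jars/* /spark/jars/

# then point spark.yarn.jars at that location in conf/spark-defaults.conf
spark.yarn.jars hdfs://master:8020/spark/jars/*.jar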

2. Running spark.sql prints `Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.` Raise `spark.debug.maxToStringFields` when building the session:

import org.apache.spark.sql.SparkSession

// raise the field limit so query plans are no longer truncated in the logs
val spark = SparkSession
  .builder()
  .master("local[4]")
  .appName("report")
  .config("spark.debug.maxToStringFields", "100")
  .getOrCreate()

Reference: spark.debug.maxToStringFields

3. Running spark.sql prints `metastore.ObjectStore: Version information found in metastore differs 2.3.0 from expected`. Configure `hive.metastore.uris` in `hive-site.xml` so that Spark connects to the running metastore service:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://master:9083</value>
  <description>metastore url</description>
</property>
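
For Spark to pick this up, `hive-site.xml` must be visible to the driver (commonly copied into `$SPARK_HOME/conf`, an assumption about your layout) and the session needs Hive support enabled. A minimal sketch:

import org.apache.spark.sql.SparkSession

// enableHiveSupport() makes Spark talk to the external metastore configured in hive-site.xml
val spark = SparkSession
  .builder()
  .appName("report")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show databases").show()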

Reference: metastore.ObjectStore

4. `org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: org.apache.spark.sql.TypedColumn`. This usually happens because a transformation such as map or filter captures an external variable that is not serializable. In particular, referencing a member function or field of a class (often the current class) pulls the whole class into the closure, so every member of that class must be serializable. For workarounds see spark not serializable异常分析及解决方案.
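
A minimal sketch of the typical cause and fix; the `Report` class and its `threshold` field are hypothetical:

import org.apache.spark.rdd.RDD

class Report {                        // Report does not extend Serializable
  val threshold = 10

  def bad(rdd: RDD[Int]): RDD[Int] =
    rdd.filter(_ > threshold)         // `threshold` is really `this.threshold`, so the whole
                                      // Report instance is captured and must be serialized

  def good(rdd: RDD[Int]): RDD[Int] = {
    val t = threshold                 // copy the field into a local val first
    rdd.filter(_ > t)                 // the closure now captures only an Int
  }
}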

5. Stopping a Spark slave node reports `no org.apache.spark.deploy.worker.Worker to stop`. This happens because the file in which Spark stores the process id was removed from /tmp by the periodic cleanup job. Check whether the default pid file still exists:

  • worker node: /tmp/spark-${username}-org.apache.spark.deploy.worker.Worker-1.pid
  • master node: /tmp/spark-${username}-org.apache.spark.deploy.master.Master-1.pid

If the file is gone, find the pid of the current master or worker with jps and write it into that file; stop-slave.sh or stop-all.sh will then succeed, as in the sketch below.
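
A minimal sketch of the recovery, assuming the services run as user hadoop; the pid value and the alternative pid directory are placeholders:

jps | grep Worker                  # find the Worker pid, e.g. 12345
echo 12345 > /tmp/spark-hadoop-org.apache.spark.deploy.worker.Worker-1.pid
$SPARK_HOME/sbin/stop-slave.sh     # the script can now find and stop the process

# to keep this from recurring, move the pid files out of /tmp in conf/spark-env.sh
export SPARK_PID_DIR=/home/hadoop/spark/pids
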
Reference: Spark集群worker无法停止的原因分析和解决

6. While editing `yarn-site.xml`: `hadoop is not in the sudoers file. This incident will be reported.`
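
A sketch of the usual fix, assuming the account is named hadoop: grant it sudo rights as root.

su -                               # switch to root
visudo                             # edit /etc/sudoers safely
# add a line below "root ALL=(ALL) ALL":
hadoop  ALL=(ALL)       ALL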

7. After starting Spark, the log showed `JAVA_HOME is not set`, and the web UI on port 8080 showed that the worker had not come up. Adding `JAVA_HOME` to `conf/spark-env.sh` fixed it.
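
The line for `conf/spark-env.sh`; the JDK path shown is an assumption, point it at your actual installation:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk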

8. Testing `textFile = spark.read.text("README.md")` fails with `Path does not exist: hdfs://master:8020/user/hadoop/README.md`. There are two fixes: either give an explicit local path such as `"file:///home/hadoop/software/spark/spark-2.4.4-bin-hadoop2.7/README.md"`, where `file://` selects the local filesystem and the remaining `/` is the filesystem root, or adjust `spark-env.sh`; for details see How to load local file in sc.textFile, instead of HDFS.
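
A minimal sketch of the first fix, given a SparkSession `spark` and using the paths quoted above:

// an explicit scheme decides which filesystem Spark reads from
val localFile = spark.read.text("file:///home/hadoop/software/spark/spark-2.4.4-bin-hadoop2.7/README.md")
val hdfsFile  = spark.read.text("hdfs://master:8020/user/hadoop/README.md") // only after uploading README.md to HDFS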

9. The worker cannot connect to the master:

20/11/03 21:10:04 INFO worker.Worker: Retrying connection to master (attempt # 2)
20/11/03 21:10:04 INFO worker.Worker: Connecting to master spark-65-145:7077...
20/11/03 21:10:04 WARN worker.Worker: Failed to connect to master spark-65-145:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to spark-65-145:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
... 4 more
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:121)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:617)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:205)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1226)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:550)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:535)
at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:550)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:535)
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:550)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:535)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:517)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:970)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:215)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
... 1 more

This points to hostname resolution. A blog comment explains the cause: /etc/hosts often maps the hostname to 127.0.0.1, so when Spark opens port 7077 it binds to 127.0.0.1:7077 and other hosts cannot connect. The workaround is to set SPARK_MASTER_HOST in spark-env.sh so that Spark binds to that address instead, as sketched below.
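
A sketch of the workaround; the LAN IP 192.168.65.145 is an assumption, use the master's real address:

# /etc/hosts on every node: map the master hostname to its LAN IP, not 127.0.0.1
192.168.65.145  spark-65-145

# conf/spark-env.sh on the master: bind explicitly to the reachable address
export SPARK_MASTER_HOST=spark-65-145
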
Reference: 关于Spark报错不能连接到Server的解决办法