Up front: the root cause was that I had downloaded the "spark without hadoop" build; the "spark with hadoop" build should be used instead!!!
----------------------------------------------
Reference steps: https://docs.nebula-graph.com.cn/3.1.0/nebula-algorithm/#_2
Error:
2022-05-31 17:40:16,459 WARN [main] util.Utils (Logging.scala:logWarning(66)) - Your hostname, bonelee-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
2022-05-31 17:40:16,461 WARN [main] util.Utils (Logging.scala:logWarning(66)) - Set SPARK_LOCAL_IP if you need to bind to another address
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
at org.apache.spark.deploy.SparkSubmit$$anonfun$2.apply(SparkSubmit.scala:323)
at org.apache.spark.deploy.SparkSubmit$$anonfun$2.apply(SparkSubmit.scala:323)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:323)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:784)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Submit command:
spark-submit --master "local" \
--conf spark.app.name="g1" \
--conf spark.executor.extraLibraryPath=/home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
--conf spark.executor.extraClassPath=/home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
--driver-class-path /home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
--driver-library-path /home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
--class com.vesoft.nebula.algorithm.Main nebula-algorithm/target/nebula-algorithm-3.0.0.jar -p /home/bonelee/Desktop/nebula-algorithm/application.conf
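Before the real fix was found (switching to the spark-with-hadoop build, see the end of this post), another thing commonly tried for this kind of Guava conflict is Spark's experimental "user classpath first" switches, which make Spark prefer user-supplied jars over its own and Hadoop's. Only a sketch, not verified here:
# sketch: ask Spark to prefer the user-supplied guava (spark.*.userClassPathFirst are experimental Spark options)
spark-submit --master "local" \
--conf spark.app.name="g1" \
--conf spark.driver.userClassPathFirst=true \
--conf spark.executor.userClassPathFirst=true \
--jars /home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
--class com.vesoft.nebula.algorithm.Main nebula-algorithm/target/nebula-algorithm-3.0.0.jar -p /home/bonelee/Desktop/nebula-algorithm/application.conf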
Versions:
Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_333).
Spark: 2.4.8-bin-without-hadoop
Hadoop: 3.2.3
Nebula: 3.0.0
nebula-algorithm: 3.0.0
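For context on why this particular combination breaks (my reading of the two stack traces in this post, so treat it as a hedged explanation): the "without hadoop" Spark build takes its Hadoop classes from the external Hadoop 3.2.3 install via SPARK_DIST_CLASSPATH, and Hadoop 3.2.x bundles a much newer Guava (27.x) than the roughly Guava 14 that Spark 2.4 and the Nebula connector were built against. Hadoop 3 needs the newer checkArgument(boolean, String, Object) overload, while the Nebula connector needs the old HostAndPort.getHostText(), so no single Guava on the classpath satisfies both; the bundled-Hadoop 2.7 build avoids the conflict. A quick way to see which Guava jars are actually in play (typical directory layout; SPARK_HOME/HADOOP_HOME and these paths are assumptions, adjust to your install):
# guava bundled with Spark itself (Spark 2.4 ships roughly guava 14)
ls $SPARK_HOME/jars | grep -i guava
# guava bundled with Hadoop (Hadoop 3.2 ships roughly guava 27)
ls $HADOOP_HOME/share/hadoop/common/lib | grep -i guava
# the "without hadoop" build pulls the Hadoop jars in through this variable
echo $SPARK_DIST_CLASSPATH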
application.conf:
{
# Spark relation config
spark: {
app: {
name: LPA
# spark.app.partitionNum
partitionNum:100
}
master:local
}
data: {
# data source. one of: nebula, csv, json
source: nebula
# data sink: the algorithm result will be written into this sink. one of: nebula, csv, text
sink: csv
# whether your algorithm needs a weight
hasWeight: false
}
# Nebula Graph relation config
nebula: {
# algo's data source from Nebula. If data.source is nebula, this nebula.read config takes effect.
read: {
# Nebula metad server address; multiple addresses are separated by commas
metaAddress: "127.0.0.1:9559"
# Nebula space
space: basketballplayer
# Nebula edge types; multiple labels mean that data from multiple edge types will be unioned together
labels: ["serve"]
# Nebula edge property name for each edge type; this property is used as the weight column for the algorithm.
# Make sure the weightCols correspond to the labels.
weightCols: ["start_year"]
}
# algo result sink into Nebula. If data.sink is nebula, this nebula.write config takes effect.
write:{
# Nebula graphd server address; multiple addresses are separated by commas
graphAddress: "127.0.0.1:9669"
# Nebula metad server address; multiple addresses are separated by commas
metaAddress: "127.0.0.1:9559"
user:root
pswd:nebula
# Nebula space name
space:nb
# Nebula tag name; the algorithm result will be written into this tag
tag:pagerank
# whether the algorithm result is inserted as a new tag or updates the original tag. type: insert/update
type:insert
}
}
local: {
# algo's data source from a local file. If data.source is csv or json, this local.read config takes effect.
read:{
filePath: "file:///tmp/algo_edge.csv"
srcId:"src"
# dstId column
dstId:"dst"
# weight column
weight: "weight"
# if csv file has header
header: true
# csv file's delimiter
delimiter:","
}
# algo result sink into a local file. If data.sink is csv or text, this local.write config takes effect.
write:{
resultPath:/tmp/count
}
}
algorithm: {
# the algorithm that you are going to execute, pick one from [pagerank, louvain, connectedcomponent,
# labelpropagation, shortestpaths, degreestatic, kcore, stronglyconnectedcomponent, trianglecount,
# betweenness, graphtriangleCount, clusteringcoefficient, bfs, hanp, closeness, jaccard, node2vec]
executeAlgo: graphtrianglecount
# PageRank parameter
pagerank: {
maxIter: 10
resetProb: 0.15 # default 0.15
}
# Louvain parameter
louvain: {
maxIter: 20
internalIter: 10
tol: 0.5
}
# connected component parameter.
connectedcomponent: {
maxIter: 20
}
# LabelPropagation parameter
labelpropagation: {
maxIter: 20
}
# ShortestPaths parameter
shortestpaths: {
# several vertices to compute the shortest path to all vertices.
landmarks: "1"
}
# Vertex degree statistics parameter
degreestatic: {}
# KCore parameter
kcore:{
maxIter:10
degree:1
}
# Trianglecount parameter
trianglecount:{}
# graphTriangleCount parameter
graphtrianglecount:{}
# Betweenness centrality parameter. maxIter is the maximum number of iterations.
betweenness:{
maxIter:5
}
# Clustering Coefficient parameter. The type parameter has two choices, local or global.
# local computes the clustering coefficient for each vertex and prints the average coefficient for the graph.
# global just computes the graph's clustering coefficient.
clusteringcoefficient:{
type: local
}
# ClosenessAlgo parameter
closeness:{}
# BFS parameter
bfs:{
maxIter:5
root:"10"
}
# HanpAlgo parameter
hanp:{
hopAttenuation:0.1
maxIter:10
preference:1.0
}
#Node2vecAlgo parameter
node2vec:{
maxIter: 10,
lr: 0.025,
dataNumPartition: 10,
modelNumPartition: 10,
dim: 10,
window: 3,
walkLength: 5,
numWalks: 3,
p: 1.0,
q: 1.0,
directed: false,
degree: 30,
embSeparate: ",",
modelPath: "hdfs://127.0.0.1:9000/model"
}
# JaccardAlgo parameter
jaccard:{
tol: 1.0
}
}
}
If the data source above is changed from nebula to csv, the job runs without error!!!
------------------------------------------------------------------------------------------------------------------------------
Error when submitting the Spark job for nebula-algorithm; here the read data source is nebula!!!
Exception in thread "main" java.lang.NoSuchMethodError: 'java.lang.String com.google.common.net.HostAndPort.getHostText()'
at com.vesoft.nebula.connector.NebulaOptions$$anonfun$getMetaAddress$1.apply(NebulaOptions.scala:186)
at com.vesoft.nebula.connector.NebulaOptions$$anonfun$getMetaAddress$1.apply(NebulaOptions.scala:183)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at com.vesoft.nebula.connector.NebulaOptions.getMetaAddress(NebulaOptions.scala:183)
at com.vesoft.nebula.connector.reader.NebulaSourceReader.getSchema(NebulaSourceReader.scala:45)
at com.vesoft.nebula.connector.reader.NebulaSourceReader.readSchema(NebulaSourceReader.scala:30)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$.create(DataSourceV2Relation.scala:175)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:186)
at com.vesoft.nebula.connector.connector.package$NebulaDataFrameReader.loadEdgesToDF(package.scala:172)
at com.vesoft.nebula.algorithm.reader.NebulaReader$$anonfun$read$1.apply$mcVI$sp(DataReader.scala:52)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at com.vesoft.nebula.algorithm.reader.NebulaReader.read(DataReader.scala:38)
at com.vesoft.nebula.algorithm.Main$.createDataSource(Main.scala:118)
at com.vesoft.nebula.algorithm.Main$.main(Main.scala:84)
at com.vesoft.nebula.algorithm.Main.main(Main.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Submit command (the commented-out guava options below were also tried, without success):
/home/bonelee/spark-2.4.8-bin-without-hadoop/bin/spark-submit --master "local" \
--class com.vesoft.nebula.algorithm.Main nebula-algorithm/target/nebula-algorithm-3.0.0.jar -p /home/bonelee/Desktop/nebula-algorithm/application.conf
#--driver-class-path /home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
#--driver-library-path /home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
#--conf spark.executor.extraLibraryPath=/home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
#--conf spark.executor.extraClassPath=/home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
Even using the method described in the referenced article and putting guava-14.0.jar on the classpath, it still fails with this error!
Really frustrating; I spent the whole afternoon on it and did not get it working...
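If you want to confirm the mismatch yourself, javap can show which of the two troublesome methods a given Guava jar actually contains. A diagnostic sketch only; the jar file names here are whichever copies you have locally (the Hadoop 3.2-bundled one is typically guava-27.0-jre.jar):
# HostAndPort.getHostText() exists in older Guava but was removed in newer releases;
# the Nebula connector calls it, so whichever Guava wins on the classpath must still have it
javap -classpath guava-14.0.jar com.google.common.net.HostAndPort | grep getHost
# the non-varargs checkArgument(boolean, String, Object) that Hadoop 3 calls exists only in newer Guava
javap -classpath guava-27.0-jre.jar com.google.common.base.Preconditions | grep checkArgument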
The way to work around the problem is to modify the data source in the application.conf file:
data: {
# data source. one of: nebula, csv, json
source: nebula
# data sink: the algorithm result will be written into this sink. one of: nebula, csv, text
sink: csv
# whether your algorithm needs a weight
hasWeight: false
}
Change the nebula above to csv, and also configure the local section:
local: {
# algo's data source from a local file. If data.source is csv or json, this local.read config takes effect.
read:{
filePath: "file:///tmp/algo_edge.csv"
srcId:"src"
# dstId column
dstId:"dst"
# weight column
weight: "weight"
# if csv file has header
header: true
# csv file's delimiter
delimiter:","
}
# algo result sink into a local file. If data.sink is csv or text, this local.write config takes effect.
write:{
resultPath:/tmp/count
}
}
Contents of algo_edge.csv:
src,dst,weight
1,1,5.0
1,2,1.0
1,3,5.0
1,4,1.0
2,1,5.0
2,2,1.0
2,3,5.0
2,4,1.0
3,1,1.0
3,2,5.0
3,3,1.0
3,4,5.0
4,1,1.0
4,2,5.0
4,3,1.0
4,4,5.0
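With the source switched to csv and this file in place, the job is resubmitted exactly as before (a sketch, reusing the jar and config paths from the earlier command):
/home/bonelee/spark-2.4.8-bin-without-hadoop/bin/spark-submit --master "local" \
--class com.vesoft.nebula.algorithm.Main nebula-algorithm/target/nebula-algorithm-3.0.0.jar -p /home/bonelee/Desktop/nebula-algorithm/application.conf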
This time the result is produced normally!
bonelee@bonelee-VirtualBox:~/Desktop/nebula-algorithm-2.6$ cat /tmp/count/part-00000-7ae75bc1-80e7-4469-9dda-268a8036db09-c000.csv
count
4
----------------------------------------------------
2022-06-13 solution: I finally got it running end to end, on Windows with Spark 2.4.7. Unbelievable...
Key steps:
1. Modify the config files on Linux
In etc/, change 127.0.0.1 in nebula-graphd.conf, nebula-storaged.conf, and nebula-metad.conf to the externally reachable address; mine is 8.35.24.181:
########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=8.35.24.181:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=8.35.24.181
Then remember to ADD HOSTS for the storaged address (e.g. ADD HOSTS 8.35.24.181:9779), and check that it shows up online:
(root@nebula) [algo_edge]> show hosts
+---------------+------+-----------+-----------+--------------+----------------------+-----------------------------------+---------+
| Host | Port | HTTP port | Status | Leader count | Leader distribution | Partition distribution | Version |
+---------------+------+-----------+-----------+--------------+----------------------+-----------------------------------+---------+
| "127.0.0.1" | 9779 | 19669 | "OFFLINE" | 0 | "No valid partition" | "basketballplayer:10, cadets:100" | "3.1.0" |
| "8.35.24.181" | 9779 | 19669 | "ONLINE" | 10 | "algo_edge:10" | "algo_edge:10" | "3.1.0" |
+---------------+------+-----------+-----------+--------------+----------------------+-----------------------------------+---------+
Got 2 rows (time spent 3122/3893 us)
Mon, 13 Jun 2022 18:41:25 CST
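Before moving on, it is worth confirming that the services are actually reachable from the machine that will run Spark. A quick sketch using the default Nebula ports shown above (standard nc; on Windows, Test-NetConnection does the same job):
# run from the Spark client machine against the external address
nc -zv 8.35.24.181 9669   # graphd
nc -zv 8.35.24.181 9559   # metad
nc -zv 8.35.24.181 9779   # storaged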
2. Insert data into the graph database:
drop space algo_edge;
create space algo_edge(partition_num=10,replica_factor=1,vid_type=INT64);
:sleep 20
use algo_edge;
create tag player(id int);
create edge serve(serve_year int);
:sleep 20
insert vertex player(id) values 1:(1);
insert vertex player(id) values 2:(2);
insert vertex player(id) values 3:(3);
insert vertex player(id) values 4:(4);
insert edge serve(serve_year) values 1->1:(5);
insert edge serve(serve_year) values 1->2:(1);
insert edge serve(serve_year) values 1->3:(5);
insert edge serve(serve_year) values 1->4:(1);
insert edge serve(serve_year) values 2->1:(5);
insert edge serve(serve_year) values 2->2:(1);
insert edge serve(serve_year) values 2->3:(5);
insert edge serve(serve_year) values 2->4:(1);
insert edge serve(serve_year) values 3->1:(1);
insert edge serve(serve_year) values 3->2:(5);
insert edge serve(serve_year) values 3->3:(1);
insert edge serve(serve_year) values 3->4:(5);
insert edge serve(serve_year) values 4->1:(1);
insert edge serve(serve_year) values 4->2:(5);
insert edge serve(serve_year) values 4->3:(1);
insert edge serve(serve_year) values 4->4:(5);
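These statements can be pasted into the console one by one, or, if saved to a file, fed to nebula-console in one go. A sketch assuming a hypothetical file name algo_edge.ngql and the nebula-console 3.x flags:
# execute the statement file against the remote graphd (the file name is hypothetical)
nebula-console -addr 8.35.24.181 -port 9669 -u root -p nebula -f algo_edge.ngql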
3. Modify app.conf, especially the Nebula connection part:
{
# Spark relation config
spark: {
app: {
name: LPA
# spark.app.partitionNum
partitionNum:100
}
master:local
}
data: {
# data source. one of: nebula, csv, json
source: nebula
# data sink: the algorithm result will be written into this sink. one of: nebula, csv, text
sink: csv
# whether your algorithm needs a weight
hasWeight: false
}
# Nebula Graph relation config
nebula: {
# algo's data source from Nebula. If data.source is nebula, this nebula.read config takes effect.
read: {
# Nebula metad server address; multiple addresses are separated by commas
metaAddress: "8.35.24.181:9559"
# Nebula space
space: algo_edge
# Nebula edge types; multiple labels mean that data from multiple edge types will be unioned together
labels: ["serve"]
# Nebula edge property name for each edge type; this property is used as the weight column for the algorithm.
# Make sure the weightCols correspond to the labels.
weightCols: ["serve_year"]
}
# algo result sink into Nebula. If data.sink is nebula, this nebula.write config takes effect.
write:{
# Nebula graphd server address; multiple addresses are separated by commas
graphAddress: "127.0.0.1:9669"
# Nebula metad server address; multiple addresses are separated by commas
metaAddress: "127.0.0.1:9559"
user:root
pswd:nebula
# Nebula space name
space:nb
# Nebula tag name; the algorithm result will be written into this tag
tag:pagerank
# whether the algorithm result is inserted as a new tag or updates the original tag. type: insert/update
type:insert
}
}
local: {
# algo's data source from a local file. If data.source is csv or json, this local.read config takes effect.
read:{
filePath: "file:///D:/tmp/algo_edge.csv"
srcId:"src"
# dstId column
dstId:"dst"
# weight column
weight: "weight"
# if csv file has header
header: true
# csv file's delimiter
delimiter:","
}
# algo result sink into a local file. If data.sink is csv or text, this local.write config takes effect.
write:{
resultPath: "file:///D:/tmp/result0613002"
}
}
algorithm: {
# the algorithm that you are going to execute, pick one from [pagerank, louvain, connectedcomponent,
# labelpropagation, shortestpaths, degreestatic, kcore, stronglyconnectedcomponent, trianglecount,
# betweenness, graphtriangleCount, clusteringcoefficient, bfs, hanp, closeness, jaccard, node2vec]
executeAlgo: graphtrianglecount
# PageRank parameter
pagerank: {
maxIter: 10
resetProb: 0.15 # default 0.15
}
# Louvain parameter
louvain: {
maxIter: 20
internalIter: 10
tol: 0.5
}
# connected component parameter.
connectedcomponent: {
maxIter: 20
}
# LabelPropagation parameter
labelpropagation: {
maxIter: 20
}
# ShortestPaths parameter
shortestpaths: {
# several vertices to compute the shortest path to all vertices.
landmarks: "1"
}
# Vertex degree statistics parameter
degreestatic: {}
# KCore parameter
kcore:{
maxIter:10
degree:1
}
# Trianglecount parameter
trianglecount:{}
# graphTriangleCount parameter
graphtrianglecount:{}
# Betweenness centrality parameter. maxIter is the maximum number of iterations.
betweenness:{
maxIter:5
}
# Clustering Coefficient parameter. The type parameter has two choices, local or global.
# local computes the clustering coefficient for each vertex and prints the average coefficient for the graph.
# global just computes the graph's clustering coefficient.
clusteringcoefficient:{
type: local
}
# ClosenessAlgo parameter
closeness:{}
# BFS parameter
bfs:{
maxIter:5
root:"10"
}
# HanpAlgo parameter
hanp:{
hopAttenuation:0.1
maxIter:10
preference:1.0
}
#Node2vecAlgo parameter
node2vec:{
maxIter: 10,
lr: 0.025,
dataNumPartition: 10,
modelNumPartition: 10,
dim: 10,
window: 3,
walkLength: 5,
numWalks: 3,
p: 1.0,
q: 1.0,
directed: false,
degree: 30,
embSeparate: ",",
modelPath: "hdfs://127.0.0.1:9000/model"
}
# JaccardAlgo parameter
jaccard:{
tol: 1.0
}
}
}
4. Then run Spark; it connects to the Nebula data source and runs successfully!!!
spark-submit --master "local" --class com.vesoft.nebula.algorithm.Main nebula-algorithm-3.0.0.jar -p c.conf
Note: I installed the build that bundles Hadoop; the spark-submit used is:
D:\app\spark-2.4.7-bin-hadoop2.7\bin\spark-submit