Spark SQL exception when reading a Hive table's tblproperties
Reposted
1
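Problem description
Cluster environment
- Spark SQL reports an error when reading a Hive table stored in Parquet format
- Hive and Impala read this Parquet table without problems, but reading it with spark-sql fails
Exception message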
com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input within/between Object entries
at [Source: (String)"{"type":"struct","fields":[{"name":"timestamp","type":"string","nullable":true,"metadata":{"HIVE_TYPE_STRING":"string"}},{"name":"xxx","type":"string","nullable":true,"metadata":{"HIVE_TYPE_STRING":"string"}},{"name":"xxx","type":"string","nullable":true,"; line: 1, column: 513]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipAfterComma2(ReaderBasedJsonParser.java:2323)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipComma(ReaderBasedJsonParser.java:2293)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:664)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:47)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1611)
at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1219)
at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:25)
at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:55)
at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:127)
at org.apache.spark.sql.hive.HiveExternalCatalog$.org$apache$spark$sql$hive$HiveExternalCatalog$$getSchemaFromTableProperties(HiveExternalCatalog.scala:1382)
at org.apache.spark.sql.hive.HiveExternalCatalog.restoreDataSourceTable(HiveExternalCatalog.scala:845)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$restoreTableMetadata(HiveExternalCatalog.scala:765)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:734)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:734)
2
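Root cause
Judging from the error, the problem lies in the Hive table's tblproperties: the JSON stored there is cut off and cannot be parsed, so Spark SQL fails when it loads the table. The stack trace shows the failure in HiveExternalCatalog.getSchemaFromTableProperties, where Spark rebuilds the table schema from that JSON. Hive and Impala do not parse tblproperties when they read the table, so they are unaffected.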
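The failure mode can be reproduced outside Spark. The following is a minimal Python sketch, not Spark's actual code. It assumes the property layout Spark 2.x uses for data source tables (spark.sql.sources.schema.numParts plus spark.sql.sources.schema.part.N), and the column names are hypothetical. It joins the schema parts, parses them as JSON, and shows that a value cut off at 256 characters no longer parses.

```python
import json

# Property names Spark 2.x is assumed to use when it persists a data source
# table's schema in the Hive metastore (the JSON may be split across parts).
NUM_PARTS_KEY = "spark.sql.sources.schema.numParts"
PART_KEY_PREFIX = "spark.sql.sources.schema.part."


def restore_schema(tblproperties):
    """Join the schema parts and parse them as JSON, roughly as Spark does."""
    num_parts = int(tblproperties[NUM_PARTS_KEY])
    schema_json = "".join(
        tblproperties[PART_KEY_PREFIX + str(i)] for i in range(num_parts)
    )
    return json.loads(schema_json)


def make_schema_json(column_names):
    """Build a schema string shaped like the one in the exception message."""
    fields = [
        {
            "name": name,
            "type": "string",
            "nullable": True,
            "metadata": {"HIVE_TYPE_STRING": "string"},
        }
        for name in column_names
    ]
    return json.dumps({"type": "struct", "fields": fields}, separators=(",", ":"))


# Hypothetical column names, chosen only to make the schema longer than 256 chars.
full = make_schema_json(["timestamp", "col_a", "col_b", "col_c"])
intact = {NUM_PARTS_KEY: "1", PART_KEY_PREFIX + "0": full}
# Simulate a metastore column that silently keeps only the first 256 characters.
truncated = {NUM_PARTS_KEY: "1", PART_KEY_PREFIX + "0": full[:256]}

print("fields in intact schema:", len(restore_schema(intact)["fields"]))
try:
    restore_schema(truncated)
except json.JSONDecodeError as err:
    print("truncated schema cannot be parsed:", err.msg)
```

Hive and Impala take the column definitions from the metastore's own column tables, which would explain why only the reader that parses this JSON fails.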
3
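Solution
- Since the tblproperties value is incomplete, the likely cause is that the metastore table that stores tblproperties truncates the parameter value. Checking the PARAM_VALUE column of the TABLE_PARAMS table in the metastore database showed that its length was only 256, which explains the truncation.
- Changing the length of PARAM_VALUE to 8000 resolved the problem.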
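The check and the fix above could be sketched as follows. This is an illustration under stated assumptions, not the original author's procedure: the two SQL statements assume a MySQL-backed metastore, the helper find_truncated_schema_rows and the sample rows are hypothetical, and the rows stand in for the result of selecting TBL_ID, PARAM_KEY and PARAM_VALUE from TABLE_PARAMS. The heuristic flags schema parts whose value fills the column width exactly, since those were probably cut off.

```python
SCHEMA_PART_PREFIX = "spark.sql.sources.schema.part."

# Statements one might run against a MySQL-backed metastore (assumption).
# Adjust the syntax for other databases and back up the metastore first.
INSPECT_SQL = "SHOW COLUMNS FROM TABLE_PARAMS LIKE 'PARAM_VALUE'"
WIDEN_SQL = "ALTER TABLE TABLE_PARAMS MODIFY PARAM_VALUE VARCHAR(8000)"


def find_truncated_schema_rows(rows, column_width):
    """Return (tbl_id, param_key) for schema parts that fill the column width."""
    return [
        (tbl_id, key)
        for tbl_id, key, value in rows
        if key.startswith(SCHEMA_PART_PREFIX) and len(value) >= column_width
    ]


# Hypothetical rows shaped like (TBL_ID, PARAM_KEY, PARAM_VALUE).
sample_rows = [
    (101, "spark.sql.sources.schema.numParts", "1"),
    (101, "spark.sql.sources.schema.part.0", "x" * 256),  # fills the column
    (102, "spark.sql.sources.schema.part.0", '{"type":"struct","fields":[]}'),
]

print(INSPECT_SQL)
print(find_truncated_schema_rows(sample_rows, column_width=256))
print(WIDEN_SQL)
```

Widening the column presumably does not restore values that were already cut off, so tables flagged this way may need their properties written again after the change.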