| Spark SQL data type | Value type in Scala |
| --- | --- |
| ShortType | Short |
| IntegerType | Int |
| LongType | Long |
| FloatType | Float |
| DoubleType | Double |
| DecimalType | java.math.BigDecimal |
| StringType | String |
| BinaryType | Array[Byte] |
| BooleanType | Boolean |
| TimestampType | java.sql.Timestamp |
| DateType | java.sql.Date |
| ArrayType | scala.collection.Seq |
| MapType | scala.collection.Map |
| StructType | org.apache.spark.sql.Row |
| StructField | The value type in Scala of the data type of this field (For example, Int for a StructField with the data type IntegerType) |
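To illustrate how these type objects are used, here is a minimal sketch (assuming spark-sql is on the classpath; no running session is needed) that assembles a `StructType` schema from them, using the field names of the sample user data further below:

```scala
import org.apache.spark.sql.types._

// A schema assembled from the catalog types in the table above.
// Field names mirror the sample user data (id, name, age).
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))

// Each StructField's dataType is the Spark SQL type; the value type
// seen in Scala code is the one listed in the right-hand column.
schema.fields.foreach(f => println(s"${f.name}: ${f.dataType}"))
```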
A Spark SQL data type conversion example
In one sentence: call the `cast` method of the `Column` class.
How to obtain a `Column`
As covered earlier:
import org.apache.spark.sql.functions.col
import spark.implicits._ // enables the $"columnName" syntax

df("columnName") // On a specific `df` DataFrame.
col("columnName") // A generic column not yet associated with a DataFrame.
col("columnName.field") // Extracting a struct field.
col("`a.column.with.dots`") // Escape `.` in column names.
$"columnName" // Scala short hand for a named column.
Prepare the test data (file `./data/user`):
1,tom,23
2,jack,24
3,lily,18
4,lucy,19
Spark entry-point code:
val spark = SparkSession
.builder()
.appName("test")
.master("local[*]")
.getOrCreate()
Check the default data types:
import spark.implicits._ // needed for map on Dataset[String] and for toDF

spark.read
  .textFile("./data/user")
  .map(_.split(","))
  .map(x => (x(0), x(1), x(2)))
  .toDF("id", "name", "age")
  .dtypes
  .foreach(println)
Result:
(id,StringType)
(name,StringType)
(age,StringType)
This shows that every column defaults to StringType.
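The reason is visible without Spark at all: after `split`, every field is still a `String`, so `toDF` infers `StringType` for each column. A plain-Scala sketch of the same parsing step (lines copied from the test data above):

```scala
// The same parsing logic as in the snippet above, on plain Scala collections.
val lines = Seq("1,tom,23", "2,jack,24", "3,lily,18", "4,lucy,19")
val rows = lines.map(_.split(",")).map(x => (x(0), x(1), x(2)))

// Every tuple element is a String, so toDF would infer StringType throughout.
println(rows.head) // (1,tom,23)
```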
Cast the numeric columns to IntegerType:
import spark.implicits._
spark.read
  .textFile("./data/user")
  .map(_.split(","))
  .map(x => (x(0), x(1), x(2)))
  .toDF("id", "name", "age")
  .select($"id".cast("int"), $"name", $"age".cast("int"))
  .dtypes
  .foreach(println)
Result:
(id,IntegerType)
(name,StringType)
(age,IntegerType)
The two overloads of the `Column` class's `cast` method
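Per the Spark API, `cast` accepts either a `DataType` object or the type's SQL name as a `String`. The sketch below (requires only spark-sql on the classpath, no running session) shows how the string names line up with the type objects via `DataType.simpleString`:

```scala
import org.apache.spark.sql.types._

// The two overloads (signatures from the Spark API):
//   def cast(to: DataType): Column   e.g. $"age".cast(IntegerType)
//   def cast(to: String): Column     e.g. $"age".cast("int")
// The string overload accepts the type's SQL name, which is what
// DataType.simpleString returns:
println(IntegerType.simpleString) // int
println(LongType.simpleString)    // bigint
println(DoubleType.simpleString)  // double
```

So `$"age".cast("int")` and `$"age".cast(IntegerType)` are equivalent; the string form is shorter, while the `DataType` form avoids typos being caught only at analysis time.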