
| Spark SQL DataType | Scala value type |
| --- | --- |
| ShortType | Short |
| IntegerType | Int |
| LongType | Long |
| FloatType | Float |
| DoubleType | Double |
| DecimalType | java.math.BigDecimal |
| StringType | String |
| BinaryType | Array[Byte] |
| BooleanType | Boolean |
| TimestampType | java.sql.Timestamp |
| DateType | java.sql.Date |
| ArrayType | scala.collection.Seq |
| MapType | scala.collection.Map |
| StructType | org.apache.spark.sql.Row |
| StructField | The Scala value type of the field's data type (for example, Int for a StructField with data type IntegerType) |
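The mappings in the table can be exercised with plain Scala values, no Spark required; a minimal sketch of a few rows:

```scala
// Plain-Scala values of several types from the mapping table above
val s: Short = 1                                               // ShortType
val i: Int = 42                                                // IntegerType
val dec = new java.math.BigDecimal("3.14")                     // DecimalType
val ts = java.sql.Timestamp.valueOf("2024-01-01 00:00:00")     // TimestampType
val arr: Seq[Int] = Seq(1, 2, 3)                               // ArrayType
val m: Map[String, Int] = Map("a" -> 1)                        // MapType
println(arr.sum)  // 6
```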

A Spark SQL data type conversion example

In a nutshell: call the cast method on the Column class.

How to obtain a Column

This was covered previously; the common ways are:

df("columnName")            // On a specific `df` DataFrame.
col("columnName")           // A generic column not yet associated with a DataFrame.
col("columnName.field")     // Extracting a struct field
col("`a.column.with.dots`") // Escape `.` in column names.
$"columnName"               // Scala short hand for a named column.
Test data preparation (./data/user):
1,tom,23
2,jack,24
3,lily,18
4,lucy,19
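Before Spark enters the picture, the per-line parsing that the snippets below perform can be sketched in plain Scala (the sample lines mirror ./data/user):

```scala
// Split each CSV line on "," and keep the three fields as a tuple of strings
val lines = Seq("1,tom,23", "2,jack,24", "3,lily,18", "4,lucy,19")
val rows = lines.map(_.split(",")).map(x => (x(0), x(1), x(2)))
println(rows.head)  // (1,tom,23)
println(rows.size)  // 4
```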
Spark entry-point code
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("test")
  .master("local[*]")
  .getOrCreate()
Checking the default data types
import spark.implicits._

spark.read
  .textFile("./data/user")
  .map(_.split(","))
  .map(x => (x(0), x(1), x(2)))
  .toDF("id", "name", "age")
  .dtypes
  .foreach(println)

Result:

(id,StringType)
(name,StringType)
(age,StringType)

This shows that every column defaults to StringType when the data is read as text.

Casting the numeric columns to IntegerType
import spark.implicits._

spark.read
  .textFile("./data/user")
  .map(_.split(","))
  .map(x => (x(0), x(1), x(2)))
  .toDF("id", "name", "age")
  .select($"id".cast("int"), $"name", $"age".cast("int"))
  .dtypes
  .foreach(println)

Result:

(id,IntegerType)
(name,StringType)
(age,IntegerType)
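Note that cast("int") yields null for any value that cannot be parsed as an integer. That per-value behavior can be mimicked in plain Scala with scala.util.Try (castToInt here is a hypothetical helper for illustration, not a Spark API):

```scala
import scala.util.Try

// Mimics the per-value semantics of Spark's cast("int"):
// a parseable number becomes an Int, anything else becomes None (Spark: null)
def castToInt(s: String): Option[Int] = Try(s.trim.toInt).toOption

println(castToInt("23"))   // Some(23)
println(castToInt("tom"))  // None
```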
The two overloads of the Column class's cast method
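The shape of the two overloads, cast(to: DataType) and cast(to: String), can be sketched with a toy class (FakeColumn and the DataType trait below are illustrative stand-ins, not Spark's implementation):

```scala
sealed trait DataType
case object IntegerType extends DataType

// Toy stand-in showing how the two cast overloads differ only in parameter type
class FakeColumn(name: String) {
  def cast(to: DataType): String = s"CAST($name AS $to)"              // DataType overload
  def cast(to: String): String = s"CAST($name AS ${to.toUpperCase})"  // String overload
}

val c = new FakeColumn("age")
println(c.cast(IntegerType))  // CAST(age AS IntegerType)
println(c.cast("int"))        // CAST(age AS INT)
```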