
While using pyspark, I ran into the following problem:

Could not serialize object: IndexError: tuple index out of range

The code is as follows:

import logging
logging.basicConfig(level=logging.ERROR)

from pyspark.sql import SparkSession, Row

ss = SparkSession.builder.appName("rdd").master("local[2]").getOrCreate()
# user_df = ss.createDataFrame([(1, 'Tom', 22), (2, 'Lucy', 18), (3, 'Nick', 21)], ['id', 'name', 'age'])
# user_df.show()
Person = Row("id", "name", "age", "weight")
user_row_df = ss.createDataFrame([Person(1, "tom", 21, 75.5), Person(2, "lucy", 18, 50.0)])
user_row_df.show()  # show() prints the table itself and returns None, so wrapping it in print() just adds a stray "None"

The error message (shown as a screenshot of the traceback in the original post):

Could not serialize object: IndexError: tuple index out of range

Cause of the error:

The Python interpreter is too new for this PySpark release. Switching to a lower Python version (3.7 or 3.8 recommended) resolves the problem; upgrading PySpark to a release that supports your interpreter is the other option.
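To catch this mismatch up front rather than through an opaque serialization error, one can guard on the interpreter version before building the SparkSession. This is a minimal sketch: the function name and the default cutoff of Python 3.10 are my assumptions here, not something PySpark provides; check the release notes of your PySpark version for the exact supported range.

```python
import sys

def check_python_for_pyspark(max_minor=10):
    """Return True if the running interpreter is at or below 3.<max_minor>.

    Hedged sketch: the 3.10 default cutoff is an assumption for older
    PySpark releases; adjust it to match your installed PySpark version.
    """
    major, minor = sys.version_info[:2]
    ok = (major, minor) <= (3, max_minor)
    if not ok:
        print(f"Python {major}.{minor} may be too new for this PySpark "
              f"release; consider an environment with Python 3.7 or 3.8.")
    return ok

# Call this before SparkSession.builder...getOrCreate()
check_python_for_pyspark()
```

In practice the simplest way to apply the fix is a fresh virtual environment (e.g. created with conda or venv) pinned to Python 3.8, with pyspark installed into it.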
