import pandas as pd
from sklearn.model_selection import train_test_split

# Load the FAQ data; drop the first column (the index written by a previous to_csv)
faqs = pd.read_csv('./data/FAQ.csv', sep='\t').iloc[:, 1:]
faqs
# In[3]
# Split the data into train / dev / test (60% / 20% / 20%)
faqs_len = len(faqs)
print('len(faqs):', faqs_len)
X_train, X_dev_test, y_train, y_dev_test = \
train_test_split(faqs['question'].to_list(), faqs['label'].to_list(), test_size=0.4, random_state=6, stratify=faqs['label'].to_list())
X_dev, X_test, y_dev, y_test = \
train_test_split(X_dev_test, y_dev_test, test_size=0.5, random_state=6, stratify=y_dev_test)
print('train: ', len(X_train), len(y_train))
print('dev: ', len(X_dev), len(y_dev))
print('test: ', len(X_test), len(y_test))
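The two-step split above first holds out 40% of the data, then halves that hold-out, yielding a 60/20/20 train/dev/test split with label proportions preserved by `stratify`. A minimal sketch with made-up toy data (the questions and labels below are illustrative, not from the FAQ set):

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy labelled data: 60 samples, 3 classes, 20 samples each (made up)
questions = [f'question {i}' for i in range(60)]
labels = [i % 3 for i in range(60)]

# Step 1: hold out 40% for dev+test, stratified on the labels
X_train, X_dev_test, y_train, y_dev_test = train_test_split(
    questions, labels, test_size=0.4, random_state=6, stratify=labels)

# Step 2: split the 40% hold-out in half -> 20% dev, 20% test
X_dev, X_test, y_dev, y_test = train_test_split(
    X_dev_test, y_dev_test, test_size=0.5, random_state=6, stratify=y_dev_test)

print(len(X_train), len(X_dev), len(X_test))  # 36 12 12
print(Counter(y_train))                       # 12 samples of each class
```

Because the split is stratified and the class counts divide evenly, each split keeps the same class balance as the full set.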
# In[3]
from sklearn.model_selection import train_test_split
# Save the train split
X_train_DataFrame = pd.DataFrame(X_train, columns=['question'])
y_train_DataFrame = pd.DataFrame(y_train, columns=['label'])
train_all = pd.concat([X_train_DataFrame, y_train_DataFrame], axis=1)
train_all.to_csv('./data/train.csv', sep='\t')
# In[4]
# Save the dev split
X_dev_DataFrame = pd.DataFrame(X_dev, columns=['question'])
y_dev_DataFrame = pd.DataFrame(y_dev, columns=['label'])
dev_all = pd.concat([X_dev_DataFrame, y_dev_DataFrame], axis=1)
dev_all.to_csv('./data/dev.csv', sep='\t')
# In[4]
# Save the test split
X_test_DataFrame = pd.DataFrame(X_test, columns=['question'])
y_test_DataFrame = pd.DataFrame(y_test, columns=['label'])
test_all = pd.concat([X_test_DataFrame, y_test_DataFrame], axis=1)
test_all.to_csv('./data/test.csv', sep='\t')
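Note that `to_csv` writes the DataFrame index as an extra, unnamed first column by default, which is why the load at the top strips it with `.iloc[:, 1:]`. A small round-trip sketch showing this behavior (the temp-file name is illustrative):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'question': ['q1', 'q2'], 'label': [0, 1]})

path = os.path.join(tempfile.gettempdir(), 'train_demo.csv')
df.to_csv(path, sep='\t')          # index is written as an unnamed first column

back = pd.read_csv(path, sep='\t')
print(list(back.columns))          # ['Unnamed: 0', 'question', 'label']

# Drop the stray index column, as done when loading FAQ.csv
clean = back.iloc[:, 1:]
print(list(clean.columns))         # ['question', 'label']
```

Passing `index=False` to `to_csv` (or `index_col=0` to `read_csv`) avoids the stray column entirely, at the cost of changing the file layout any downstream code expects.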