Now, suppose we have a text data file like this: build_log.txt
time: 20180417 05:15:55
version: 1.0.266.0
server: Prd
preview: PreviewLianXiang
platform: android
channel: lianxiang
======================================================
time: 20180417 04:57:15
version: 1.0.266.0
server: Prd
preview: PreviewCulture
platform: android
channel: wc
======================================================
time: 20180417 01:52:21
version: 1.0.265.0
server: Prd
preview: PreviewYouPinWei
platform: android
channel: youpinwei
======================================================
time: 20180417 02:00:47
version: 1.0.265.0
server: Prd
preview: PreviewZhaomi360
platform: android
channel: zhaomi360
======================================================
time: 20180417 01:56:12
version: 1.0.265.0
server: Prd
preview: PreviewCulture
platform: android
channel: wc
======================================================
time: 20180417 01:58:46
version: 1.0.265.0
server: Prd
preview: PreviewMaoZhua
platform: android
channel: maozhua
======================================================
time: 20180417 02:05:58
version: 1.0.265.0
server: Prd
preview: PreviewLianXiang
platform: android
channel: lianxiang
======================================================
time: 20180417 02:02:38
version: 1.0.265.0
server: Prd
preview: PreviewHaoYouKuaiBao
platform: android
channel: haoyoukuaibao
======================================================
time: 20180417 11:45:36
version: 1.0.264.0
server: Dev
preview: None
platform: android
channel: lianxiang
======================================================
time: 20180417 10:56:04
version: 1.0.263.0
server: Dev
preview: None
platform: android
channel: lianxiang
======================================================
time: 20180416 02:44:15
version: 1.0.262.0
server: Prd
preview: None
platform: android
channel: zhaomi360
======================================================
time: 20180416 02:42:40
version: 1.0.262.0
server: Prd
preview: None
platform: android
channel: youpinwei
We want to convert it into a CSV file that Excel can open for data analysis, like this: build_log.csv
time,version,server,preview,platform,channel
20180417 05:15:55,1.0.266.0,Prd,PreviewLianXiang,android,lianxiang
20180417 04:57:15,1.0.266.0,Prd,PreviewCulture,android,wc
20180417 01:52:21,1.0.265.0,Prd,PreviewYouPinWei,android,youpinwei
20180417 02:00:47,1.0.265.0,Prd,PreviewZhaomi360,android,zhaomi360
20180417 01:56:12,1.0.265.0,Prd,PreviewCulture,android,wc
20180417 01:58:46,1.0.265.0,Prd,PreviewMaoZhua,android,maozhua
20180417 02:05:58,1.0.265.0,Prd,PreviewLianXiang,android,lianxiang
20180417 02:02:38,1.0.265.0,Prd,PreviewHaoYouKuaiBao,android,haoyoukuaibao
20180417 11:45:36,1.0.264.0,Dev,None,android,lianxiang
20180417 10:56:04,1.0.263.0,Dev,None,android,lianxiang
20180416 02:44:15,1.0.262.0,Prd,None,android,zhaomi360
20180416 02:42:40,1.0.262.0,Prd,None,android,youpinwei
Opened in Excel, the file shows one column per field and one row per build record.
The Python conversion script is as follows:
import re


def read_file(f_name):
    # Read the whole log file as text.
    with open(f_name, 'r') as f:
        return f.read()


def convert_txt_to_dic(txt):
    # Records are separated by a line of two or more '=' characters.
    ls = re.split(r'==+', txt)
    res = []
    for item in ls:
        item = item.strip()
        if not item:
            # Skip empty chunks, e.g. after a trailing separator line.
            continue
        item_map = {}
        for line in item.splitlines():
            # Split on the first ':' only; the time value contains colons too.
            key, sep, value = line.partition(':')
            if not sep:
                continue  # not a "key: value" line
            item_map[key.strip()] = value.strip()
        res.append(item_map)
    return res


def format_csv_item(txt):
    # Quote a field that contains a comma, a double quote, or a line break,
    # and double any embedded double quotes.
    if any(c in txt for c in ',"\r\n'):
        return '"%s"' % txt.replace('"', '""')
    return txt


def convert_dic_to_csv(res_ls, head):
    head_keys = head.split(',')
    ls = [head]
    for item in res_ls:
        # List comprehension: one CSV row per record; a missing key becomes an empty field.
        record = [format_csv_item(item.get(k) or "") for k in head_keys]
        ls.append(",".join(record))
    return "\n".join(ls)


def save_file(f_name, txt):
    with open(f_name, 'w') as f:
        f.write(txt)


txt = read_file("./build_log.txt")
res = convert_txt_to_dic(txt)
csv_txt = convert_dic_to_csv(res, "time,version,server,preview,platform,channel")
save_file('build_log.csv', csv_txt)
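As an alternative to building the CSV text by hand, the same conversion can be sketched with Python's standard csv module, which takes care of quoting. This is a minimal sketch rather than the script above: the names FIELDS, parse_records, and records_to_csv are made up for illustration, and the sample text is just the first two records of build_log.txt.

```python
import csv
import io
import re

# Column order for the output; assumed to match the keys in the log.
FIELDS = ["time", "version", "server", "preview", "platform", "channel"]


def parse_records(txt):
    """Split the log on '==...' separator lines and parse 'key: value' pairs."""
    records = []
    for chunk in re.split(r"==+", txt):
        record = {}
        for line in chunk.strip().splitlines():
            # Split on the first ':' only, since the time value contains colons.
            key, sep, value = line.partition(":")
            if sep:
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def records_to_csv(records, fields=FIELDS):
    """Serialize the records with csv.DictWriter, which quotes fields as needed."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fields,
        restval="",             # a missing key becomes an empty field
        extrasaction="ignore",  # keys outside `fields` are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


sample = """time: 20180417 05:15:55
version: 1.0.266.0
server: Prd
preview: PreviewLianXiang
platform: android
channel: lianxiang
======================================================
time: 20180417 04:57:15
version: 1.0.266.0
server: Prd
preview: PreviewCulture
platform: android
channel: wc
"""

csv_text = records_to_csv(parse_records(sample))
print(csv_text)
```

When writing to a real file instead of a StringIO buffer, the csv documentation recommends opening it with newline="" so the writer controls the line endings.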
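Background:
CSV
Comma-separated values (CSV, sometimes also called character-separated values because the delimiter does not have to be a comma) is a file format that stores tabular data (numbers and text) as plain text. Plain text means the file is a sequence of characters and contains no data that has to be interpreted as binary numbers. A CSV file consists of any number of records separated by line breaks of some kind; each record consists of fields separated by some other character or string, most commonly a comma or a tab. Usually, all records have exactly the same sequence of fields.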
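Rules
1 The file starts with no leading blank space and is organized line by line.
2 Column names are optional; if present, they go on the first line of the file.
3 A record does not span lines (except when a quoted field itself contains a line break), and there are no blank lines.
4 Fields are separated by a half-width comma (,); an empty column must still be represented.
5 If a field contains a half-width double quote ("), escape it by doubling it ("") and enclose the whole field value in half-width double quotes.
6 Reading and writing apply the quote and comma rules in reverse of each other.
7 The character encoding is not restricted; it can be ASCII, Unicode, or another encoding.
8 No other escape syntax for special characters is supported.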
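The comma and quote rules above can be checked with a short sketch that uses Python's standard csv module. The helper name to_csv_line is made up for illustration, and the default csv.writer settings (minimal quoting) are assumed.

```python
import csv
import io


def to_csv_line(fields):
    """Return one CSV record for `fields`, using csv.writer's default quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


# No special characters: fields are simply joined with commas.
plain = to_csv_line(["1997", "Ford", "E350"])

# A field containing a comma is enclosed in double quotes.
with_comma = to_csv_line(["E350", "ac, abs, moon"])

# An embedded double quote is doubled, and the field is enclosed in quotes.
with_quote = to_csv_line(['Venture "Extended Edition"'])

# An empty column still takes up its position between the commas.
with_empty = to_csv_line(["a", "", "c"])

for line in (plain, with_comma, with_quote, with_empty):
    print(line, end="")
```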
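Example
Year | Make | Model | Description | Price |
1997 | Ford | E350 | ac, abs, moon | 3000.00 |
1999 | Chevy | Venture "Extended Edition" | | 4900.00 |
1999 | Chevy | Venture "Extended Edition, Very Large" | | 5000.00 |
1996 | Jeep | Grand Cherokee | MUST SELL! air, moon roof, loaded | 4799.00 |
Expressed in CSV format, the table above looks like this (the last Description value contains a line break after "MUST SELL!"):
Year,Make,Model,Description,Price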
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
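The CSV example above illustrates that:
- Fields containing a comma, a double quote, or a line break must be enclosed in double quotes.
- A double quote inside a field must be escaped by preceding it with another double quote.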
- Spaces before or after a delimiter comma may not be trimmed; they count as part of the field.
- Line breaks inside a field are preserved.
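To see these points in practice, the example can be read back with Python's standard csv module. This is a small sketch that uses English column names; the variable names are illustrative only.

```python
import csv
import io

# The CSV example from above, including the field that spans two lines.
data = '''Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
'''

# newline="" keeps the line breaks untouched so the reader can handle them.
rows = list(csv.reader(io.StringIO(data, newline="")))

header, records = rows[0], rows[1:]
for record in records:
    # repr() makes embedded quotes and line breaks visible.
    print([repr(field) for field in record])
```

The six physical lines after the header parse into four records, because the line break inside the quoted Description field belongs to the field rather than ending the record.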