Python: downloading a multi-sheet Excel file

Reposted

mob64ca1410eb61 2024-12-02 14:47:15

Day 18: Downloading Data and Web APIs

A summary of commonly used Python modules


Accessing and analyzing CSV data files

Using the csv module

import csv

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    # The first row of the file holds the column headers
    header_row = next(reader)

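enumerate(): the enumerate() function turns an iterable (such as a list, tuple, or string) into a sequence of (index, element) pairs, so a loop receives both the position and the value of each item. It is typically used in a for loop.

enumerate(iterable, start=0)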

Sample:

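import csv

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    # Print each column header together with its index
    for index, column_header in enumerate(header_row):
        print(index, column_header)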
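As a small sketch of the optional start argument, the snippet below numbers the columns of a hypothetical in-memory CSV header from 1 instead of 0. io.StringIO stands in for the real weather file, and the column names are made up for illustration.

```python
import csv
import io

# Hypothetical CSV content standing in for the real weather file
sample = io.StringIO("Date,High,Mean,Low\n2014-07-01,64,56,50\n")

reader = csv.reader(sample)
header_row = next(reader)

# start=1 makes the numbering begin at 1 instead of the default 0
numbered = list(enumerate(header_row, start=1))
for index, column_header in numbered:
    print(index, column_header)
```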

Iterating over the CSV file and extracting data: for + append

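import csv
from datetime import datetime

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    # Collect dates, highs and lows; the loop must stay inside the with block
    dates, highs, lows = [], [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        high = int(row[1])
        low = int(row[3])
        dates.append(current_date)
        highs.append(high)
        lows.append(low)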

Error handling

import csv
from datetime import datetime

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            # Print the raw date string: current_date may be unset if the date itself failed
            print(row[0], 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
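To check the error handling without the real file, the same loop can be wrapped in a function that accepts any iterable of lines. This is a sketch: the function name read_temperatures and the sample rows are hypothetical, and the column layout (date in column 0, high in column 1, low in column 3) is copied from the code above.

```python
import csv
import io
from datetime import datetime

def read_temperatures(lines):
    """Return (dates, highs, lows), skipping rows that fail to parse.

    Assumes the column layout used above: date in column 0,
    high in column 1, low in column 3.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            # int('') raises ValueError, so empty cells land here
            print(row[0], 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
    return dates, highs, lows

# Hypothetical sample: the second data row has an empty high temperature
sample = io.StringIO(
    "Date,High,Mean,Low\n"
    "2014-07-01,64,56,50\n"
    "2014-07-02,,55,49\n"
    "2014-07-03,71,62,53\n"
)
dates, highs, lows = read_temperatures(sample)
print(highs, lows)
```

Passing an open file object instead of the StringIO works the same way, since both yield lines.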

JSON format

pygal.i18n no longer exists, which produces the error No module named 'pygal.i18n':

Use pygal_maps_world.i18n instead:

OS X

$ pip install pygal_maps_world

Windows

> python -m pip install pygal_maps_world

Change 'from pygal.i18n import COUNTRIES' to:

from pygal_maps_world.i18n import COUNTRIES
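COUNTRIES in pygal_maps_world is a dictionary mapping two-letter lowercase country codes to English country names, and the world map is keyed by those codes. A minimal sketch of looking up a code by name follows; the helper name get_country_code is hypothetical, and a small made-up subset of the dictionary is passed in so the sketch runs without the package installed.

```python
def get_country_code(country_name, countries):
    """Return the two-letter code whose name matches country_name, else None.

    `countries` is expected to look like pygal_maps_world.i18n.COUNTRIES,
    i.e. {code: name}; it is passed in explicitly so this runs standalone.
    """
    for code, name in countries.items():
        if name == country_name:
            return code
    # No match: the caller decides how to handle unknown names
    return None

# Hypothetical subset of the real COUNTRIES dictionary
sample_countries = {'ad': 'Andorra', 'cn': 'China', 'fr': 'France'}
print(get_country_code('France', sample_countries))
print(get_country_code('Atlantis', sample_countries))
```

With the package installed, pass the imported COUNTRIES instead of sample_countries.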

The error module 'pygal' has no attribute 'Worldmap'

Use pygal_maps_world instead:

import pygal_maps_world.maps

wm = pygal_maps_world.maps.World()
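The notes above cover only the pygal errors; the JSON loading step itself uses the standard json module. The sketch below assumes a hypothetical record layout ('Country Name', 'Year', 'Value', with numbers stored as strings) and a hypothetical file name, population_data.json; adjust both to the actual data.

```python
import json

def populations_for_year(records, year):
    """Map country name -> population for one year.

    Assumes each record has the (hypothetical) keys 'Country Name',
    'Year' and 'Value', with numbers stored as strings.
    """
    result = {}
    for record in records:
        if record['Year'] == year:
            # int(float(...)) also copes with decimal strings such as "2500.7"
            result[record['Country Name']] = int(float(record['Value']))
    return result

# With a real file: with open('population_data.json') as f: records = json.load(f)
sample = ('[{"Country Name": "Country A", "Year": "2010", "Value": "1000"},'
          ' {"Country Name": "Country B", "Year": "2010", "Value": "2500.7"},'
          ' {"Country Name": "Country A", "Year": "2009", "Value": "990"}]')
records = json.loads(sample)
populations = populations_for_year(records, '2010')
print(populations)
```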

Web API

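A Web API lets a program interact with a website and request specific data, which is usually returned as JSON or CSV.
The requests package lets Python request information from a website and inspect the response it gets back.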
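A Web API request is usually just a URL with a query string. The GitHub call in this section writes q=language:python&sort=stars directly into the URL; the sketch below assembles the same string from a dictionary with the standard library. The helper name build_search_url is hypothetical. With requests you can instead pass the dictionary as requests.get(url, params=...) and let it build the query string.

```python
from urllib.parse import urlencode

# Base URL of the GitHub search endpoint used in this section
BASE_URL = "https://api.github.com/search/repositories"

def build_search_url(params):
    """Append params to BASE_URL as a query string.

    safe=':' keeps 'language:python' readable instead of 'language%3Apython'.
    """
    return BASE_URL + "?" + urlencode(params, safe=':')

url = build_search_url({'q': 'language:python', 'sort': 'stars'})
print(url)
```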

Installing requests

OS X

$ pip install --user requests

Windows

> python -m pip install --user requests

Processing the response dictionary

import requests

# Make an API call and store the response
url = "https://api.github.com/search/repositories?q=language:python&sort=stars"
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a dictionary
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories
repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

# Examine the first repository
repo_dict = repo_dicts[0]
print("\nKeys:", len(repo_dict))
for key in repo_dict.keys():
    print(key)
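Because r.json() returns ordinary dictionaries and lists, the exploration logic can be tested without a network call. A sketch with a hypothetical, heavily trimmed response; the function name summarize_search is also hypothetical. Note that total_count counts all matches, while items holds only the current page of results.

```python
def summarize_search(response_dict):
    """Return (total_count, items_returned, sorted keys of the first item).

    Assumes the shape used above: a 'total_count' number and an
    'items' list of repository dictionaries.
    """
    repo_dicts = response_dict['items']
    first_keys = sorted(repo_dicts[0].keys()) if repo_dicts else []
    return response_dict['total_count'], len(repo_dicts), first_keys

# Hypothetical response: total_count covers all matches, items only one page
fake_response = {
    'total_count': 1000,
    'items': [
        {'name': 'repo-a', 'stargazers_count': 10, 'html_url': 'https://example.com/a'},
        {'name': 'repo-b', 'stargazers_count': 5, 'html_url': 'https://example.com/b'},
    ],
}
total, returned, keys = summarize_search(fake_response)
print(total, returned, keys)
```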

Taking a closer look at the repositories

# Print selected information about every returned repository
for repo_dict in repo_dicts:
    print("\nSelected information about this repository:")
    print('Name:', repo_dict['name'])
    print('Owner:', repo_dict['owner']['login'])
    print('Stars:', repo_dict['stargazers_count'])
    print('Repository:', repo_dict['html_url'])
    print('Created:', repo_dict['created_at'])
    print('Updated:', repo_dict['updated_at'])
    print('Description:', repo_dict['description'])
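Collecting the printed fields into one dictionary makes them easier to reuse (for example as chart labels) and lets missing values be handled in one place. A sketch: the function name describe_repo, the sample entry, and the placeholder text 'No description' are all chosen for illustration.

```python
def describe_repo(repo_dict):
    """Collect selected repository fields into one dictionary."""
    return {
        'Name': repo_dict['name'],
        'Owner': repo_dict['owner']['login'],
        'Stars': repo_dict['stargazers_count'],
        'Repository': repo_dict['html_url'],
        # A repository may have no description; use a placeholder instead of None
        'Description': repo_dict.get('description') or 'No description',
    }

# Hypothetical repository entry, trimmed to the keys used here
sample_repo = {
    'name': 'repo-a',
    'owner': {'login': 'someone'},
    'stargazers_count': 10,
    'html_url': 'https://example.com/repo-a',
    'description': None,
}
info = describe_repo(sample_repo)
for key, value in info.items():
    print(key + ':', value)
```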

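The error 'NoneType' object has no attribute 'decode' is raised when running the code below. It occurs when a repository has no description: repo_dict['description'] is then None, and pygal cannot use None as a label.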

import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

# repo_dicts comes from the GitHub API call above
names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    plot_dict = {
        'value': repo_dict['stargazers_count'],
        # Triggers the error when 'description' is None
        'label': repo_dict['description'],
    }
    plot_dicts.append(plot_dict)

# Visualization
my_style = LS('#333366', base_style=LCS)

my_config = pygal.Config()
my_config.x_label_rotation = 45
my_config.show_legend = False
my_config.title_font_size = 24
my_config.label_font_size = 14
my_config.major_label_font_size = 18
my_config.truncate_label = 15
my_config.show_y_guides = False
my_config.width = 1000

chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-starred Python Projects on GitHub'
chart.x_labels = names

chart.add('', plot_dicts)
chart.render_to_file('python_repos.svg')

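Two fixes have been suggested.

The first is to change:

'label': repo_dict['description'],

to:

'label': str(repo_dict['description']),

Because str(None) is 'None', pygal always receives a string. Simple and convenient.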
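One drawback of str() is that a missing description appears in the chart as the literal text 'None'. A variant (not necessarily the second method the original refers to) substitutes an empty string instead. The function name build_plot_data and the sample repositories are hypothetical.

```python
def build_plot_data(repo_dicts):
    """Return (names, plot_dicts) for a pygal bar chart.

    repo_dict['description'] may be None; `or ''` replaces it with an
    empty string so pygal never receives None as a label.
    """
    names, plot_dicts = [], []
    for repo_dict in repo_dicts:
        names.append(repo_dict['name'])
        plot_dicts.append({
            'value': repo_dict['stargazers_count'],
            'label': repo_dict.get('description') or '',
        })
    return names, plot_dicts

# Hypothetical input: the second repository has no description
repos = [
    {'name': 'repo-a', 'stargazers_count': 10, 'description': 'A demo project'},
    {'name': 'repo-b', 'stargazers_count': 5, 'description': None},
]
names, plot_dicts = build_plot_data(repos)
print(names)
print(plot_dicts)
```

The returned lists plug into the chart code above as chart.x_labels = names and chart.add('', plot_dicts).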

Hacker News API: this example covers three points:

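- Building Web API URLs dynamically from the list returned by one API call, then calling the API again with those URLs to fetch more data;
- dict.get(): when you are not sure whether a key exists in a dictionary, use dict.get(). It returns the value associated with the key if the key exists, and the value given as the second argument otherwise;
- itemgetter() from the operator module, used together with sorted(): given the key 'comments', itemgetter() extracts the value associated with 'comments' from each dictionary in the list, and sorted() orders the list by those values.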
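All three points can be tried offline before making any network calls. In this sketch the item responses and IDs are hypothetical, and item_url is a hypothetical helper that builds the per-item URL the same way as the code that follows.

```python
from operator import itemgetter

def item_url(item_id):
    """Build a per-item Hacker News API URL from an ID."""
    return 'https://hacker-news.firebaseio.com/v0/item/' + str(item_id) + '.json'

# Hypothetical item responses; the second has no 'descendants' key
responses = [
    {'id': 1, 'title': 'Story A', 'descendants': 12},
    {'id': 2, 'title': 'Story B'},
    {'id': 3, 'title': 'Story C', 'descendants': 40},
]

submissions = []
for response_dict in responses:
    submissions.append({
        'title': response_dict['title'],
        # dict.get() falls back to 0 when the key is absent
        'comments': response_dict.get('descendants', 0),
    })

# itemgetter('comments') extracts the sort key from each dict
ranked = sorted(submissions, key=itemgetter('comments'), reverse=True)
print(item_url(responses[0]['id']))
print([s['title'] for s in ranked])
```

sorted() is stable, so submissions with equal comment counts keep their original relative order.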

import requests
from operator import itemgetter

# Make an API call and store the response
url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
r = requests.get(url)
print('Status code:', r.status_code)

# Process information about each submission
submission_ids = r.json()
# Empty list that will hold one dictionary per top story
submission_dicts = []

# Take the IDs of the top 30 stories
for submission_id in submission_ids[:30]:
    # Make a separate API call for each submission,
    # building the URL from the ID stored in submission_ids
    url = ('https://hacker-news.firebaseio.com/v0/item/' +
           str(submission_id) + '.json')
    submission_r = requests.get(url)
    print(submission_r.status_code)

    response_dict = submission_r.json()

    # Build a dictionary for the submission currently being processed
    submission_dict = {
        'title': response_dict['title'],
        'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
        # Stories without comments may lack 'descendants', so default to 0
        'comments': response_dict.get('descendants', 0),
    }
    submission_dicts.append(submission_dict)

# Sort by number of comments, most-discussed first
submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                          reverse=True)

for submission_dict in submission_dicts:
    print('\nTitle:', submission_dict['title'])
    print('Discussion link:', submission_dict['link'])
    print('Comments:', submission_dict['comments'])

The data produced by the code above (the submission_dicts list, shown as JSON):

[{"title": "Glitter bomb tricks parcel thieves", 
"link": "http://news.ycombinator.com/item?id=18706193", 
"comments": 304}, 
{"title": "Stop Learning Frameworks", 
"link": "http://news.ycombinator.com/item?id=18706785", 
"comments": 175}, 
{"title": "Reasons Python Sucks", 
"link": "http://news.ycombinator.com/item?id=18706174", 
"comments": 175}, 
{"title": "I need to copy 2000+ DVDs in 3 days. What are my options?", 
"link": "http://news.ycombinator.com/item?id=18690587", 
"comments": 167}, 
{"title": "SpaceX Is Raising $500M at a $30.5B Valuation", 
"link": "http://news.ycombinator.com/item?id=18706506", 
"comments": 139}, 
.........
]
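The listing above shows submission_dicts in JSON form. A sketch of producing such text with json.dumps and reading it back with json.loads, using hypothetical entries in the same shape:

```python
import json

# Hypothetical entries in the same shape as submission_dicts
submission_dicts = [
    {'title': 'Story C', 'link': 'http://news.ycombinator.com/item?id=3', 'comments': 40},
    {'title': 'Story A', 'link': 'http://news.ycombinator.com/item?id=1', 'comments': 12},
]

# indent=2 puts each key on its own line for readability
text = json.dumps(submission_dicts, indent=2)
print(text)

# Round trip: json.loads turns the text back into a list of dicts
restored = json.loads(text)
print(restored == submission_dicts)
```

To write the result to a file, json.dump(submission_dicts, f, indent=2) takes an open file object instead of returning a string.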

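This article is a repost; we respect the original author's copyright. If the content contains errors or infringes on any rights, the original author is welcome to contact us to have it corrected or removed.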