Day 18: Downloading Data and Web APIs


  • Accessing and analyzing CSV data files
  • Using the csv module
import csv

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    # The first row of the file holds the column headers
    header_row = next(reader)
    print(header_row)
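To see what csv.reader() and next() return without the weather file, here is a minimal sketch that feeds a small in-memory CSV string through io.StringIO. The rows and column names are made-up illustrative values, not the real Sitka data.

```python
import csv
import io

# Made-up sample data standing in for a weather CSV file (assumption)
sample = io.StringIO(
    "AKDT,Max TemperatureF,Mean TemperatureF,Min TemperatureF\n"
    "2014-7-1,64,56,50\n"
    "2014-7-2,71,62,55\n"
)

reader = csv.reader(sample)
header_row = next(reader)      # consumes only the first line
first_data_row = next(reader)  # the reader continues from the second line

print(header_row)
print(first_data_row)
```

Every field comes back as a string, which is why the later steps convert values with int().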
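  • enumerate(): takes an iterable (such as a list, tuple, or string) and yields (index, element) pairs, so you get both the position and the value of each item. It is usually used in a for loop.
enumerate(iterable, start=0)
  • Example: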
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    # Print each column header together with its index
    for index, column_header in enumerate(header_row):
        print(index, column_header)
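A short sketch of two common uses of enumerate(): numbering from 1 with the start parameter, and building a name-to-index lookup so the column numbers used later (row[1], row[3]) do not have to be hard-coded. The header list is an assumed example, not read from the real file.

```python
# Assumed header row, as csv.reader would return it
header_row = ['AKDT', 'Max TemperatureF', 'Mean TemperatureF', 'Min TemperatureF']

# start=1 numbers the columns from 1 instead of 0
numbered = list(enumerate(header_row, start=1))
for index, column_header in numbered:
    print(index, column_header)

# Map each header name to its 0-based column index
column_index = {name: i for i, name in enumerate(header_row)}
print(column_index['Min TemperatureF'])
```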
  • Iterating over the CSV file and extracting data: a for loop plus append()
import csv
from datetime import datetime

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    # Collect the date, high and low temperature from every data row
    dates, highs, lows = [], [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        high = int(row[1])
        low = int(row[3])
        dates.append(current_date)
        highs.append(high)
        lows.append(low)
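Wrapping the loop in a function makes it testable without the data file: csv.reader() accepts any iterable of strings, so a list of lines works as well as an open file. The function name extract_temperatures and the sample rows are hypothetical, and the column layout (date in column 0, high in 1, low in 3) is assumed to match the code above.

```python
import csv
from datetime import datetime


def extract_temperatures(lines):
    """Return (dates, highs, lows) from CSV lines whose first row is a header.

    Assumes the date is in column 0, the high in column 1 and the low in column 3.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    dates, highs, lows = [], [], []
    for row in reader:
        dates.append(datetime.strptime(row[0], "%Y-%m-%d"))
        highs.append(int(row[1]))
        lows.append(int(row[3]))
    return dates, highs, lows


# Hypothetical sample lines
sample_lines = [
    "Date,High,Mean,Low",
    "2014-07-01,64,56,50",
    "2014-07-02,71,62,55",
]
dates, highs, lows = extract_temperatures(sample_lines)
print(dates[0].date(), highs, lows)
```

With the real file you would call extract_temperatures(f) inside the with block.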
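  • Error handling: a row with a missing or malformed value raises ValueError during conversion, so wrap the conversions in try/except, report the row, and skip it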
import csv
from datetime import datetime

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            # Report the row's raw date string; current_date may be unset or stale here
            print(row[0], 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
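A hedged variant of the error-handling loop that collects the skipped rows instead of only printing them, so the caller can check how much data was lost. The function name extract_with_skips and the sample rows are hypothetical; the column layout is assumed as before.

```python
import csv
from datetime import datetime


def extract_with_skips(lines):
    """Return (dates, highs, lows, skipped) from CSV lines with a header row.

    skipped holds the raw date strings of rows whose values failed to convert.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    dates, highs, lows, skipped = [], [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            # int('') raises ValueError for an empty field
            skipped.append(row[0])
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
    return dates, highs, lows, skipped


sample_lines = [
    "Date,High,Mean,Low",
    "2014-02-15,70,60,50",
    "2014-02-16,,,",  # hypothetical row with missing readings
    "2014-02-17,72,61,49",
]
dates, highs, lows, skipped = extract_with_skips(sample_lines)
print(highs, lows, skipped)
```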
  • JSON format
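Before the plotting issues that follow, a minimal sketch of reading JSON with the standard json module. The record layout and values are made up for illustration; json.load(f) reads the same structure from an open file.

```python
import json

# Hypothetical JSON text: a list of records with made-up values
json_text = """
[
    {"name": "Country A", "year": "2010", "value": "1500"},
    {"name": "Country B", "year": "2010", "value": "2750.5"}
]
"""

records = json.loads(json_text)  # JSON array -> list, JSON object -> dict

for record in records:
    # A numeric string such as "2750.5" needs float() before int()
    value = int(float(record['value']))
    print(record['name'], value)

totals = {r['name']: int(float(r['value'])) for r in records}
```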
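  • Error: No module named 'pygal.i18n'. In newer pygal versions the world map and its country list were moved out of pygal into the separate pygal_maps_world package.
  • Install pygal_maps_world and import from pygal_maps_world.i18n instead: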
  • OS X
$ pip install pygal_maps_world
  • Windows
> python -m pip install pygal_maps_world
  • Change 'from pygal.i18n import COUNTRIES' to:
from pygal_maps_world.i18n import COUNTRIES
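A typical use of COUNTRIES is turning a country name into the two-letter code the world map expects. This sketch assumes COUNTRIES maps lowercase two-letter codes to country names; the mapping is passed in as a parameter so the example runs without pygal installed, and the function name get_country_code and the sample dict are illustrative.

```python
def get_country_code(country_name, countries):
    """Return the code whose name equals country_name, or None if not found.

    countries is assumed to be a code -> name dict such as COUNTRIES
    from pygal_maps_world.i18n.
    """
    for code, name in countries.items():
        if name == country_name:
            return code
    return None


# Tiny stand-in mapping (assumed sample; the real one covers every country)
sample_countries = {'ad': 'Andorra', 'af': 'Afghanistan', 'al': 'Albania'}
print(get_country_code('Albania', sample_countries))
print(get_country_code('Atlantis', sample_countries))
```

With the package installed you would call get_country_code(name, COUNTRIES). Names that differ from the spelling in COUNTRIES return None, so it is worth printing the misses.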
  • Error: module 'pygal' has no attribute 'Worldmap'
  • pygal.Worldmap() is no longer available; create the map from pygal_maps_world instead:
import pygal_maps_world.maps

wm = pygal_maps_world.maps.World()
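A world map can shade several series in different colors, so one common approach is to split countries into population groups before adding them. This is a sketch with illustrative thresholds (10 million and 1 billion) and made-up data; the function name group_by_population is hypothetical and the dict values are {country_code: population}.

```python
def group_by_population(cc_populations, small=10_000_000, large=1_000_000_000):
    """Split a {country_code: population} dict into three dicts by size.

    The default thresholds are illustrative choices, not fixed rules.
    """
    low, mid, high = {}, {}, {}
    for code, pop in cc_populations.items():
        if pop < small:
            low[code] = pop
        elif pop < large:
            mid[code] = pop
        else:
            high[code] = pop
    return low, mid, high


# Made-up populations keyed by placeholder codes
sample = {'aa': 5_000, 'bb': 50_000_000, 'cc': 2_000_000_000}
low, mid, high = group_by_population(sample)
print(len(low), len(mid), len(high))
```

Each group can then be added as its own series, for example wm.add('0-10m', low), followed by wm.render_to_file('world_population.svg').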
  • Web API
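  • A Web API lets a program interact with a website: the program requests specific data from a URL and receives it back, usually as JSON or CSV.
  • The requests package lets Python request information from a website and inspect the response it returns.
  • Installing requests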
  • OS X
$ pip install --user requests
  • Windows
> python -m pip install --user requests
  • Processing the response dictionary
import requests

# Make the API call and store the response
url = "https://api.github.com/search/repositories?q=language:python&sort=stars"
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a dictionary
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories
repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

# Examine the first repository
repo_dict = repo_dicts[0]
print("\nKeys:", len(repo_dict))
for key in repo_dict.keys():
    print(key)
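Separating the processing from the network call makes it possible to try the logic offline. This sketch takes an already-parsed response dict and assumes only the 'total_count' and 'items' keys used in the code above; the function name summarize_search and the fake response are hypothetical.

```python
def summarize_search(response_dict):
    """Return (total_count, number of items, sorted keys of the first item).

    response_dict is assumed to be the parsed JSON of a GitHub repository
    search, containing 'total_count' and an 'items' list.
    """
    repo_dicts = response_dict['items']
    first_keys = sorted(repo_dicts[0].keys()) if repo_dicts else []
    return response_dict['total_count'], len(repo_dicts), first_keys


# Hand-written stand-in for r.json(); real responses have many more keys
fake_response = {
    'total_count': 2,
    'items': [
        {'name': 'repo-one', 'stargazers_count': 10},
        {'name': 'repo-two', 'stargazers_count': 5},
    ],
}
print(summarize_search(fake_response))
```

In real use you would pass r.json() to it. Giving requests.get() a timeout, e.g. requests.get(url, timeout=10), also keeps the call from hanging indefinitely.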
  • Examining the repositories in more detail
# Print selected information about each repository
for repo_dict in repo_dicts:
    print("\nSelected information about repository:")
    print('Name:', repo_dict['name'])
    print('Owner:', repo_dict['owner']['login'])
    print('Stars:', repo_dict['stargazers_count'])
    print('Repository:', repo_dict['html_url'])
    print('Created:', repo_dict['created_at'])
    print('Updated:', repo_dict['updated_at'])
    print('Description:', repo_dict['description'])
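  • Error "'NoneType' object has no attribute 'decode'": the code below raises this error. Some repositories have no description, so repo_dict['description'] is None, and pygal fails when it tries to use None as a label: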
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    plot_dict = {
        'value': repo_dict['stargazers_count'],
        'label': repo_dict['description'],  # None for repos without a description
    }
    plot_dicts.append(plot_dict)

# Visualization
my_style = LS('#333366', base_style=LCS)

my_config = pygal.Config()
my_config.x_label_rotation = 45
my_config.show_legend = False
my_config.title_font_size = 24
my_config.label_font_size = 14
my_config.major_label_font_size = 18
my_config.truncate_label = 15
my_config.show_y_guides = False
my_config.width = 1000

chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-starred Python Projects on GitHub'
chart.x_labels = names

chart.add('', plot_dicts)
chart.render_to_file('python_repos.svg')

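A simple fix is to convert the description to a string. Change:

'label': repo_dict['description'],

to:

'label': str(repo_dict['description']),

This is simple and convenient; a missing description then shows up as the string 'None'.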
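An alternative to str(), sketched here under the same assumptions as the chart code: using `or` replaces None (and an empty string) with a readable placeholder instead of the text 'None'. The function name build_plot_dicts, the placeholder wording, and the sample repositories are hypothetical.

```python
def build_plot_dicts(repo_dicts):
    """Build pygal bar data, substituting a placeholder for missing descriptions."""
    names, plot_dicts = [], []
    for repo_dict in repo_dicts:
        names.append(repo_dict['name'])
        plot_dicts.append({
            'value': repo_dict['stargazers_count'],
            # `or` falls back to the placeholder when the description is None or ''
            'label': repo_dict['description'] or 'No description provided.',
        })
    return names, plot_dicts


repos = [
    {'name': 'alpha', 'stargazers_count': 30, 'description': 'A sample project'},
    {'name': 'beta', 'stargazers_count': 12, 'description': None},
]
names, plot_dicts = build_plot_dicts(repos)
print(plot_dicts)
```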

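  • Hacker News API: this example covers three points:
  • Using the list returned by one Web API call to build further API URLs dynamically, then calling the API again for each URL to fetch its data;
  • dict.get(): when you are not sure whether a key is in a dictionary, use dict.get(). It returns the value for the key if the key exists, and otherwise returns the value passed as the second argument (or None if no second argument is given);
  • operator.itemgetter() combined with sorted(): passing the key 'comments' to itemgetter() makes it extract the value for 'comments' from each dictionary in the list, and sorted() then sorts the list by those values.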
import requests
from operator import itemgetter

# Make the API call and store the response
url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
r = requests.get(url)
print('Status code:', r.status_code)

# Process information about each submission
submission_ids = r.json()
# Empty list for the dictionaries describing the top submissions
submission_dicts = []

# Take the IDs of the first 30 top submissions
for submission_id in submission_ids[:30]:
    # Make a separate API call for each submission,
    # building its URL from the ID stored in submission_ids
    url = ('https://hacker-news.firebaseio.com/v0/item/' +
           str(submission_id) + '.json')
    submission_r = requests.get(url)
    print(submission_r.status_code)

    response_dict = submission_r.json()

    # Build a dictionary for the current submission
    submission_dict = {
        'title': response_dict['title'],
        'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
        # The 'descendants' key may be missing, so default to 0 comments
        'comments': response_dict.get('descendants', 0),
    }
    submission_dicts.append(submission_dict)

# Sort by number of comments, most-discussed first
submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                          reverse=True)

for submission_dict in submission_dicts:
    print('\nTitle:', submission_dict['title'])
    print('Discussion link:', submission_dict['link'])
    print('Comments:', submission_dict['comments'])

The data collected by the code above (the contents of submission_dicts after sorting):

[{"title": "Glitter bomb tricks parcel thieves", 
"link": "http://news.ycombinator.com/item?id=18706193", 
"comments": 304}, 
{"title": "Stop Learning Frameworks", 
"link": "http://news.ycombinator.com/item?id=18706785", 
"comments": 175}, 
{"title": "Reasons Python Sucks", 
"link": "http://news.ycombinator.com/item?id=18706174", 
"comments": 175}, 
{"title": "I need to copy 2000+ DVDs in 3 days. What are my options?", 
"link": "http://news.ycombinator.com/item?id=18690587", 
"comments": 167}, 
{"title": "SpaceX Is Raising $500M at a $30.5B Valuation", 
"link": "http://news.ycombinator.com/item?id=18706506", 
"comments": 139}, 
.........
]
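The three Hacker News points (building item URLs from IDs, dict.get() with a default, and sorting with itemgetter()) can be tried offline with hand-written stand-ins for the item responses. The helper item_url and the sample records are hypothetical; the URL pattern is the one used in the code above.

```python
from operator import itemgetter


def item_url(submission_id):
    # Build the per-item API URL from an ID, as the loop above does
    return f'https://hacker-news.firebaseio.com/v0/item/{submission_id}.json'


# Hand-written stand-ins for item responses; the second lacks 'descendants'
responses = [
    {'id': 1, 'title': 'First story', 'descendants': 12},
    {'id': 2, 'title': 'Second story'},
    {'id': 3, 'title': 'Third story', 'descendants': 40},
]

submission_dicts = []
for response_dict in responses:
    submission_dicts.append({
        'title': response_dict['title'],
        # dict.get() returns the fallback 0 when the key is missing
        'comments': response_dict.get('descendants', 0),
    })

# itemgetter('comments') extracts that value from each dict for sorting
ranked = sorted(submission_dicts, key=itemgetter('comments'), reverse=True)
print([d['title'] for d in ranked])
```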