Day 18: Downloading Data and Web APIs
- Summary of commonly used Python modules
- Accessing and analyzing CSV data files
- Using CSV
    import csv

    filename = 'sitka_weather_07-2014.csv'
    with open(filename) as f:
        reader = csv.reader(f)
        # The first row holds the column headers.
        header_row = next(reader)
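- The enumerate() function: enumerate() turns an iterable (such as a list, tuple, or string) into a sequence of (index, item) pairs, so each item comes with its index. It is typically used in a for loop.
    enumerate(iterable, start=0)
- Example: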
    with open(filename) as f:
        reader = csv.reader(f)
        header_row = next(reader)
        for index, column_header in enumerate(header_row):
            print(index, column_header)
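The effect of the start argument can be sketched with a small stand-in list. The header names below are made up for illustration and are not taken from the Sitka file.

```python
# Hypothetical header row, for illustration only.
header_row = ['date', 'max_temp', 'mean_temp', 'min_temp']

# Default: the counter starts at 0, matching list positions such as row[0].
zero_based = list(enumerate(header_row))

# start=1 changes only the counter, not which items are visited.
one_based = list(enumerate(header_row, start=1))

print(zero_based)
print(one_based)
```

When the index is used to pick a column later (as with row[1] and row[3]), the default start of 0 is the one to use.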
- Iterating over the CSV file and extracting data: for + append
    from datetime import datetime  # needed for strptime()

    with open(filename) as f:
        reader = csv.reader(f)
        header_row = next(reader)
        dates, highs, lows = [], [], []
        for row in reader:
            # Column 0 is the date, column 1 the high, column 3 the low.
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
- Error handling
    with open(filename) as f:
        reader = csv.reader(f)
        header_row = next(reader)
        dates, highs, lows = [], [], []
        for row in reader:
            try:
                current_date = datetime.strptime(row[0], "%Y-%m-%d")
                high = int(row[1])
                low = int(row[3])
            except ValueError:
                # Report the raw date string: current_date would be stale
                # (or undefined) if the date itself failed to parse.
                print(row[0], 'missing data')
            else:
                # Only rows that converted cleanly are kept.
                dates.append(current_date)
                highs.append(high)
                lows.append(low)
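To try the try/except/else pattern without the weather file, here is a minimal sketch that reads from an in-memory string instead. The sample rows and the skipped list are assumptions added for illustration; they are not part of the original notes.

```python
import csv
from datetime import datetime
from io import StringIO

# Made-up sample data: the second data row has an empty high temperature.
sample = StringIO(
    "date,high,mean,low\n"
    "2014-07-01,64,56,50\n"
    "2014-07-02,,57,51\n"
    "2014-07-03,69,60,52\n"
)

reader = csv.reader(sample)
header_row = next(reader)

dates, highs, lows = [], [], []
skipped = []  # hypothetical helper list: dates of rows that were dropped
for row in reader:
    try:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        high = int(row[1])
        low = int(row[3])
    except ValueError:
        # int('') raises ValueError, so the incomplete row lands here.
        skipped.append(row[0])
    else:
        dates.append(current_date)
        highs.append(high)
        lows.append(low)

print(highs, lows, skipped)
```

Because the appends sit in the else branch, a row is either stored completely or not at all, so the three lists stay the same length.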
- JSON format
- pygal.i18n does not exist; the error is: No module named 'pygal.i18n'
- Use pygal_maps_world.i18n instead:
- OS X
    $ pip install pygal_maps_world
- Windows
    > python -m pip install pygal_maps_world
- Change 'from pygal.i18n import COUNTRIES' to
    from pygal_maps_world.i18n import COUNTRIES
- Error: module 'pygal' has no attribute 'Worldmap'
- Use pygal_maps_world instead:
    import pygal_maps_world.maps
    wm = pygal_maps_world.maps.World()
- Web API
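- A Web API lets a program interact with a website and request data, which comes back as JSON or CSV.
- The requests package lets Python request information from a website and inspect the response.
- Installing the requests package
- OS X
    $ pip install --user requests
- Windows
    > python -m pip install --user requests
- Processing the response dictionary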
    import requests

    # Make the API call and store the response.
    url = "https://api.github.com/search/repositories?q=language:python&sort=stars"
    r = requests.get(url)
    print("Status code:", r.status_code)

    # Store the API response in a dictionary.
    response_dict = r.json()
    print("Total repositories:", response_dict['total_count'])

    # Explore information about the repositories.
    repo_dicts = response_dict['items']
    print("Repositories returned:", len(repo_dicts))

    # Examine the first repository.
    repo_dict = repo_dicts[0]
    print("\nKeys:", len(repo_dict))
    for key in repo_dict.keys():
        print(key)
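The same navigation can be practiced offline. The sketch below parses a hand-written JSON string with json.loads; the payload is a hypothetical, heavily trimmed stand-in shaped like the search response, not real GitHub data.

```python
import json

# Hypothetical payload, trimmed to the keys used in these notes.
payload = '''
{
  "total_count": 2,
  "items": [
    {"name": "alpha", "owner": {"login": "alice"}, "stargazers_count": 120},
    {"name": "beta", "owner": {"login": "bob"}, "stargazers_count": 45}
  ]
}
'''

# json.loads() turns the JSON text into nested dicts and lists.
response_dict = json.loads(payload)
repo_dicts = response_dict['items']

# Nested values are reached by chaining keys, e.g. ['owner']['login'].
first = repo_dicts[0]
summary = (first['name'], first['owner']['login'], first['stargazers_count'])

print(response_dict['total_count'], len(repo_dicts), sorted(first.keys()))
print(summary)
```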
- Looking further at the repositories
    # Examine every repository returned.
    for repo_dict in repo_dicts:
        print("\nSelected information about each repository:")
        print('Name:', repo_dict['name'])
        print('Owner:', repo_dict['owner']['login'])
        print('Stars:', repo_dict['stargazers_count'])
        print('Repository:', repo_dict['html_url'])
        print('Created:', repo_dict['created_at'])
        print('Updated:', repo_dict['updated_at'])
        print('Description:', repo_dict['description'])
- Error: 'NoneType' object has no attribute 'decode'. It appears when running the following code:
    import pygal
    from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

    names, plot_dicts = [], []
    for repo_dict in repo_dicts:
        names.append(repo_dict['name'])
        plot_dict = {
            'value': repo_dict['stargazers_count'],
            # description is None for repositories without one,
            # which is what triggers the error.
            'label': repo_dict['description'],
        }
        plot_dicts.append(plot_dict)

    # Visualization
    my_style = LS('#333366', base_style=LCS)
    my_config = pygal.Config()
    my_config.x_label_rotation = 45
    my_config.show_legend = False
    my_config.title_font_size = 24
    my_config.label_font_size = 14
    my_config.major_label_font_size = 18
    my_config.truncate_label = 15
    my_config.show_y_guides = False
    my_config.width = 1000
    chart = pygal.Bar(my_config, style=my_style)
    chart.title = 'Most-starred Python Projects on GitHub'
    chart.x_labels = names
    chart.add('', plot_dicts)
    chart.render_to_file('python_repos.svg')
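Two fixes are referenced for this. The first one is to change:
    'label': repo_dict['description'],
to:
    'label': str(repo_dict['description']),
which is both simple and convenient. A repository with no description has None as its description value in Python, and str() turns that into the string 'None'.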
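A minimal sketch of the fix on made-up records follows. The `or 'No description'` fallback is a variation added here for illustration, not something the notes prescribe; plain str() alone would produce the label 'None'.

```python
# Made-up repository records; the second has no description.
repo_dicts = [
    {'name': 'alpha', 'stargazers_count': 120, 'description': 'A demo project'},
    {'name': 'beta', 'stargazers_count': 45, 'description': None},
]

plot_dicts = []
for repo_dict in repo_dicts:
    plot_dicts.append({
        'value': repo_dict['stargazers_count'],
        # `or` swaps None (or an empty string) for a readable placeholder,
        # and str() guarantees the label is always a string.
        'label': str(repo_dict['description'] or 'No description'),
    })

print(plot_dicts)
```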
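- Hacker News API: three things to learn here:
- Using the list returned by one Web API call to build further API URLs dynamically, then calling the API again to fetch the data;
- The dict.get() method: when you are not sure whether a key is in a dictionary, dict.get() returns the associated value if the key exists and the value given as its second argument if it does not;
- The itemgetter() function from the operator module, used together with sorted(). Passing it the key 'comments' makes it pull the value for 'comments' out of each dictionary in the list, and sorted() orders the list by that value.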
    import requests
    from operator import itemgetter

    # Make the API call and store the response.
    url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
    r = requests.get(url)
    print('Status code:', r.status_code)

    # Process information about each submission.
    submission_ids = r.json()
    # Empty list that will hold one dictionary per top story.
    submission_dicts = []
    # Take the IDs of the first 30 top stories.
    for submission_id in submission_ids[:30]:
        # Make a separate API call for each submission,
        # building the URL from the ID stored in submission_ids.
        url = ('https://hacker-news.firebaseio.com/v0/item/' +
               str(submission_id) + '.json')
        submission_r = requests.get(url)
        print(submission_r.status_code)
        response_dict = submission_r.json()

        # Build a dictionary for the submission being processed.
        submission_dict = {
            'title': response_dict['title'],
            'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
            'comments': response_dict.get('descendants', 0)
        }
        submission_dicts.append(submission_dict)

    # Sort by comment count, most discussed first.
    submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                              reverse=True)
    for submission_dict in submission_dicts:
        print('\nTitle:', submission_dict['title'])
        print('Discussion link:', submission_dict['link'])
        print('Comments:', submission_dict['comments'])
Data produced by the code above (the sorted submission_dicts list):
[{"title": "Glitter bomb tricks parcel thieves",
"link": "http://news.ycombinator.com/item?id=18706193",
"comments": 304},
{"title": "Stop Learning Frameworks",
"link": "http://news.ycombinator.com/item?id=18706785",
"comments": 175},
{"title": "Reasons Python Sucks",
"link": "http://news.ycombinator.com/item?id=18706174",
"comments": 175},
{"title": "I need to copy 2000+ DVDs in 3 days. What are my options?",
"link": "http://news.ycombinator.com/item?id=18690587",
"comments": 167},
{"title": "SpaceX Is Raising $500M at a $30.5B Valuation",
"link": "http://news.ycombinator.com/item?id=18706506",
"comments": 139},
.........
]
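The three points from the Hacker News notes (building URLs from IDs, dict.get() with a default, and itemgetter() with sorted()) can be tried without network access. The sketch below uses a made-up items mapping in place of the per-item API responses; the IDs and titles are hypothetical.

```python
from operator import itemgetter

# Made-up item records keyed by ID; the second has no 'descendants' key.
items = {
    101: {'title': 'First story', 'descendants': 12},
    102: {'title': 'Second story'},
    103: {'title': 'Third story', 'descendants': 40},
}

base = 'https://hacker-news.firebaseio.com/v0/item/'

submission_dicts = []
for submission_id, response_dict in items.items():
    submission_dicts.append({
        'title': response_dict['title'],
        # The per-item URL is generated from the ID.
        'url': base + str(submission_id) + '.json',
        # get() falls back to 0 instead of raising KeyError.
        'comments': response_dict.get('descendants', 0),
    })

# itemgetter('comments') extracts the sort key from each dictionary.
submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                          reverse=True)
for d in submission_dicts:
    print(d['comments'], d['title'], d['url'])
```

Indexing with response_dict['descendants'] would raise KeyError on the second record, which is why get() with a default of 0 is used for the comment count.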