Scraping GitHub with Python

mob64ca12d97dad 2024-04-07 03:27:37

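© Copyright belongs to the author. This is an original work by 51CTO blog author mob64ca12d97dad; contact the author for permission to reprint, otherwise legal liability will be pursued.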

How to Scrape GitHub Repository Information

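During development and study, we often need information about repositories on GitHub. Looking it up and recording it by hand, one repository at a time, is tedious. With Python we can write a small scraper that collects this information automatically. This article shows how to scrape GitHub repository information with Python.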

Preparation

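Before starting, make sure Python and the required third-party libraries are installed. We will use two libraries, requests and BeautifulSoup, to scrape the repository information. If they are not installed yet, install them with the following commands: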

pip install requests
pip install beautifulsoup4
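
After installing, it can help to confirm that both libraries are actually importable in the current environment. The following is a minimal sketch using only the standard library; the helper name missing_packages is our own and is not part of either library. Note that the beautifulsoup4 package is imported under the name bs4.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level import names in `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # beautifulsoup4 is installed via pip but imported as "bs4".
    missing = missing_packages(["requests", "bs4"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If the script reports a missing package, rerun the matching pip command in the same Python environment that will run the scraper.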

Writing the Scraper

First, we import the required libraries:

import requests
from bs4 import BeautifulSoup

Next, we define a function that fetches all repositories of a given GitHub user:

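def get_repos(username):
    # The URL is cut off in the source; the repositories tab used here is an assumption.
    url = f'https://github.com/{username}?tab=repositories'
    response = requests.get(url, timeout=10)

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        # Each repository title is expected in an <h3 class="wb-break-all"> element.
        repos = soup.find_all('h3', class_='wb-break-all')
        for repo in repos:
            # strip=True drops the whitespace surrounding the repository name.
            print(repo.get_text(strip=True))
    else:
        print('Failed to get repositories.')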

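In this code, we first build the URL of a GitHub user's repository page, then use the requests library to send an HTTP request and fetch the page content. Next, we parse the page with BeautifulSoup, find the titles of all repositories, and print them. If the response status is anything other than 200, the function prints a failure message instead.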
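
To see the parsing step in isolation, without sending any request, the extraction can be tried on a small HTML string. This is only a sketch: SAMPLE_HTML is made-up markup that mimics the structure the selector expects (it is not copied from GitHub), and extract_repo_names is a hypothetical helper that returns the names as a list instead of printing them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the headings that the selector targets.
SAMPLE_HTML = """
<div>
  <h3 class="wb-break-all"><a href="/octocat/Hello-World"> Hello-World </a></h3>
  <h3 class="wb-break-all"><a href="/octocat/Spoon-Knife">Spoon-Knife</a></h3>
  <h3 class="other">Not a repository</h3>
</div>
"""


def extract_repo_names(html):
    """Return the text of every <h3 class="wb-break-all"> element in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    headings = soup.find_all("h3", class_="wb-break-all")
    # strip=True trims the whitespace around each name.
    return [heading.get_text(strip=True) for heading in headings]


print(extract_repo_names(SAMPLE_HTML))
```

Returning a list keeps the parsing logic easy to check by hand. If GitHub changes its markup, the selector will simply match nothing and the list will be empty, so an empty result is worth treating as a warning sign.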

Calling the Scraper

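Now we can call the get_repos function to fetch the repository information of a given GitHub user. For example, to fetch the repositories of the GitHub user octocat, call it like this: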

get_repos('octocat')
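
Because the username is interpolated directly into the URL, it may be worth checking it before calling get_repos. The sketch below is one possible check using the standard library. The rules it encodes (letters, digits, and single hyphens, no leading or trailing hyphen, at most 39 characters) are our understanding of GitHub's username constraints and should be treated as an assumption; is_valid_username is a hypothetical helper.

```python
import re

# Assumed GitHub username rules: alphanumeric characters and single hyphens,
# no hyphen at the start or end, and a maximum length of 39 characters.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}")


def is_valid_username(username):
    """Return True if `username` matches the assumed GitHub username rules."""
    return USERNAME_PATTERN.fullmatch(username) is not None


for candidate in ["octocat", "-octocat", "octo--cat", "octo/cat"]:
    print(candidate, is_valid_username(candidate))
```

A caller could skip the request entirely whenever is_valid_username returns False, which avoids sending a malformed URL.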

Complete Code Example

import requests
from bs4 import BeautifulSoup

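def get_repos(username):
    # The URL is cut off in the source; the repositories tab used here is an assumption.
    url = f'https://github.com/{username}?tab=repositories'
    response = requests.get(url, timeout=10)

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        # Each repository title is expected in an <h3 class="wb-break-all"> element.
        repos = soup.find_all('h3', class_='wb-break-all')
        for repo in repos:
            # strip=True drops the whitespace surrounding the repository name.
            print(repo.get_text(strip=True))
    else:
        print('Failed to get repositories.')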

get_repos('octocat')