Getting Started with BeautifulSoup

 

Basic usage:


Link to the Python demo page:

https://python123.io/ws/demo.html

Code:

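import requests
from bs4 import BeautifulSoup

# Fetch the demo page
r = requests.get("https://python123.io/ws/demo.html")
print(r.text)  # raw HTML as returned by the server
demo = r.text
# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(demo, "html.parser")
print(soup)  # the document re-serialized from the parse tree
print(soup.prettify())  # one tag or string per line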

Output:

D:\python_install\python.exe D:/pycharmworkspace/temp1/crawler_1.py
<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>
<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
</body></html>
<html>
<head>
<title>
This is a python demo page
</title>
</head>
<body>
<p class="title">
<b>
The demo python introduces several python courses.
</b>
</p>
<p class="course">
Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">
Basic Python
</a>
and
<a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">
Advanced Python
</a>
.
</p>
</body>
</html>

Process finished with exit code 0
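
The script above assumes the request always succeeds. As a minimal sketch, the fetch-and-parse step could be wrapped in a function that sets a timeout and checks the HTTP status before parsing. The names fetch_soup, main, and DEMO_URL and the 10-second timeout are illustrative assumptions, not part of the original script.

```python
import requests
from bs4 import BeautifulSoup

DEMO_URL = "https://python123.io/ws/demo.html"


def fetch_soup(url, timeout=10):
    """Download url and return the parsed page.

    Raises requests.RequestException on network or HTTP errors.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return BeautifulSoup(response.text, "html.parser")


def main():
    try:
        soup = fetch_soup(DEMO_URL)
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return
    print(soup.prettify())
```

Calling main() prints the same prettified page as above when the request succeeds, and a short error message otherwise.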

Getting the tag names of a tag's parent and grandparent:

import requests
from bs4 import BeautifulSoup
r = requests.get("https://python123.io/ws/demo.html")
print("\n")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
tag_a = soup.a  # the first <a> tag in the document
print(tag_a.parent.name)  # tag name of its parent
print("\n")
print(tag_a.parent.parent.name)  # tag name of its grandparent

Output:

D:\python_install\python.exe D:/pycharmworkspace/temp1/crawler_1.py


p


body

Process finished with exit code 0
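
Chaining .parent works for one or two levels. To walk every ancestor, a tag's .parents generator can be used instead. The sketch below is a rough illustration that runs offline on a shortened copy of the demo page's HTML; the names DEMO_HTML and ancestor_names are assumptions made for this example.

```python
from bs4 import BeautifulSoup

# Shortened copy of the demo page, inlined so the sketch needs no network.
DEMO_HTML = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language.
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""

soup = BeautifulSoup(DEMO_HTML, "html.parser")

# .parents yields each ancestor in turn, from the nearest one outward.
ancestor_names = [ancestor.name for ancestor in soup.a.parents]
print(ancestor_names)  # ['p', 'body', 'html', '[document]']
```

The last entry, [document], is the name of the BeautifulSoup object itself, which sits at the top of the tree.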

Reading attribute values through the attrs dictionary:

Code:

import requests
from bs4 import BeautifulSoup
r = requests.get("https://python123.io/ws/demo.html")
print("\n")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.a)  # soup.<tag> returns the first tag of that type; here, the first <a> tag
tag_a = soup.a
print("\n")
print(tag_a.attrs)  # attrs: the tag's attributes as a dictionary
print("\n")
print(tag_a.attrs['id'])  # value of the id attribute
print("\n")
print(tag_a.attrs['href'])  # value of the href attribute
print("\n")

Output:

D:\python_install\python.exe D:/pycharmworkspace/temp1/crawler_1.py


<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>


{'href': 'http://www.icourse163.org/course/BIT-268001', 'class': ['py1'], 'id': 'link1'}


link1


http://www.icourse163.org/course/BIT-268001



Process finished with exit code 0
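
soup.a only ever returns the first <a> tag. As a hedged sketch of reading the same attributes from every link, find_all can be combined with subscript access and .get(). It runs offline on a shortened copy of the demo HTML; DEMO_HTML and hrefs are names assumed for this example.

```python
from bs4 import BeautifulSoup

# Shortened copy of the demo page, inlined so the sketch needs no network.
DEMO_HTML = """<html><body>
<p class="course">Python is a wonderful general-purpose programming language.
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""

soup = BeautifulSoup(DEMO_HTML, "html.parser")

# find_all("a") returns every <a> tag, not just the first one.
hrefs = []
for link in soup.find_all("a"):
    # link["href"] is shorthand for link.attrs["href"]
    print(link["id"], link["href"])
    hrefs.append(link["href"])

# .get() returns None for a missing attribute instead of raising KeyError.
print(soup.a.get("title"))
```

Subscript access suits attributes that must be present; .get() is the safer choice when an attribute may be absent.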

Getting the string inside a tag (the text without the HTML markup):

Code:

import requests
from bs4 import BeautifulSoup
r = requests.get("https://python123.io/ws/demo.html")
print("\n")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
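print(soup.a)  # soup.<tag> returns the first tag of that type; here, the first <a> tag
tag_a = soup.a
print("\n")
print(tag_a.string)  # the text inside the <a> tag
print("\n")
print(soup.p)  # the first <p> tag
print("\n")
print(soup.p.string)  # the text inside <p>, reached through its only child <b>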

Output:

D:\python_install\python.exe D:/pycharmworkspace/temp1/crawler_1.py


<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>


Basic Python


<p class="title"><b>The demo python introduces several python courses.</b></p>


The demo python introduces several python courses.

Process finished with exit code 0
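
.string works above because each of those tags contains a single string, directly or through a single child. For a tag with several children it returns None, and get_text() can be used to collect all the text instead. The sketch below illustrates this offline on a shortened copy of the demo HTML; DEMO_HTML, title_p, and course_p are names assumed for this example.

```python
from bs4 import BeautifulSoup

# Shortened copy of the demo page, inlined so the sketch needs no network.
DEMO_HTML = """<html><body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language.
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""

soup = BeautifulSoup(DEMO_HTML, "html.parser")

title_p = soup.find("p", class_="title")
course_p = soup.find("p", class_="course")

# One child (<b>) holding one string: .string finds it.
print(title_p.string)

# Plain text mixed with two <a> tags: .string cannot pick one, so it is None.
print(course_p.string)

# get_text() joins every string inside the tag, nested tags included.
print(course_p.get_text())
```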