Building a C/S (Client/Server) Architecture System in Python

Reposted

码海探险家 2024-10-14 18:31:34


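The process of a complete HTTP transaction
Domain name resolution -> TCP three-way handshake -> HTTP request sent over the established TCP connection -> the server responds and the browser receives the HTML -> the browser parses the HTML and requests the resources it references (JS, CSS, images) -> the browser renders the page for the user
DNS: the Domain Name System, a distributed database that maps domain names and IP addresses to each other. DNS uses port 53 over both TCP and UDP.
TCP three-way handshake: once it has the IP address for the domain, the user agent (usually a browser) opens a TCP connection from a random port (1024 < port < 65535) to port 80 of the web server. Two computers communicate by following a shared protocol (today, TCP/IP). The three-way handshake (SYN, SYN-ACK, ACK) confirms that both sides can send and receive, and lets them agree on initial sequence numbers before any data is exchanged.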
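The transaction steps can be tried end to end without leaving the machine. The following is a minimal Python 3 sketch, not a production client: it starts a throwaway HTTP server on 127.0.0.1 (a stand-in for a real web server on port 80), resolves the address, lets connect() perform the TCP handshake, then sends a raw HTTP request and reads the response. The names HelloHandler, fetch_raw and run_demo are made up for this illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>hello</body></html>"


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in web server that always returns the same HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_raw(host, port, path="/"):
    # Step 1: name resolution (for a real domain this is where DNS is used).
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, socket.AF_INET, socket.SOCK_STREAM)[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(5)
        # Step 2: connect() performs the TCP three-way handshake.
        sock.connect(sockaddr)
        # Step 3: send the HTTP request over the established connection.
        request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
        sock.sendall(request.encode("ascii"))
        # Step 4: read the response until the server closes the connection.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        raw = fetch_raw("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0].decode("ascii")
    return status_line, body


if __name__ == "__main__":
    print(run_demo())
```

A browser would continue from here by parsing the returned HTML and repeating the same steps for each referenced resource.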
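The concept and components of a URL
URL: Uniform Resource Locator, made up of three parts:

Protocol (http, https, ftp)
Host address (domain name or IP, optionally with a port number)
Resource path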
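As a quick illustration of these three parts, the standard library can split a URL into its components. This sketch assumes Python 3 (urllib.parse); the URL and the describe_url helper are made up for the example.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the three parts described above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,      # None when the URL does not name a port
        "resource": parts.path,
    }


info = describe_url("http://example.com:8080/articles/index.html")
print(info)
```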

urllib, urllib2

import urllib
import urllib2

# response = urllib2.urlopen("http://www.baidu.com")
# response = urllib2.urlopen(url, data, timeout)

values = {"username": "1016903103@qq.com", "password": "xxxxxxx"}

# Convert the key-value pairs into a URL-encoded string
data = urllib.urlencode(values)
url = ""

# Build the request

# POST: passing data makes urllib2 send a POST request
request = urllib2.Request(url, data)

# GET: append the encoded data to the URL as a query string
# geturl = url + "?" + data
# request = urllib2.Request(geturl)

# Get the response
response = urllib2.urlopen(request)
print response.read()
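The code above targets Python 2. For readers on Python 3, where urllib2 was folded into urllib.request and urlencode moved to urllib.parse, a rough equivalent might look like the sketch below. It only builds the request objects and does not send them; the URL and the helper names build_post and build_get are placeholders for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

LOGIN_URL = "http://example.com/login"  # hypothetical endpoint


def build_post(url, values):
    # In Python 3 the request body must be bytes, so encode the query string.
    data = urlencode(values).encode("utf-8")
    return Request(url, data=data)


def build_get(url, values):
    # For GET, the encoded values travel in the URL itself.
    return Request(url + "?" + urlencode(values))


values = {"username": "user@example.com", "password": "secret"}
post_request = build_post(LOGIN_URL, values)
get_request = build_get(LOGIN_URL, values)

print(post_request.get_method(), post_request.data)
print(get_request.get_method(), get_request.full_url)
```

Passing either object to urllib.request.urlopen() would then send it, just as urllib2.urlopen(request) does above.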

Headers

import urllib
import urllib2

values = {"username": "1016903103@qq.com", "password": "xxxxxxx"}

# Convert the key-value pairs into a URL-encoded string
data = urllib.urlencode(values)
url = ""

# User-Agent: identifies the client making the request (here, a browser)
# Referer: some sites check it to block hotlinking
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Referer': 'http://www.zhihu.com/articles'
}

# POST request
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
print response.read()
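To check which headers a request will carry before sending it, the request object can be inspected directly. This is a small Python 3 sketch (urllib.request instead of urllib2) with a placeholder URL. One detail worth knowing is that Request stores header names via str.capitalize(), so 'User-Agent' is looked up as 'User-agent'.

```python
from urllib.request import Request

headers = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)",
    "Referer": "http://www.zhihu.com/articles",
}

# The URL is a placeholder; nothing is sent over the network here.
request = Request("http://example.com/articles", headers=headers)

# Headers can also be added after the request has been created.
request.add_header("Accept-Language", "en")

for name, value in request.header_items():
    print(name, "=", value)
```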

Proxy

import urllib
import urllib2

# Proxy settings
enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": 'http://some-proxy.com:8080'})
null_proxy_handler = urllib2.ProxyHandler({})
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
# Make this opener the default one used by urllib2.urlopen()
urllib2.install_opener(opener)

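URLError exceptions

URLError

No network connection
Cannot connect to the specified server
The server does not exist

HTTPError

HTTPError is a subclass of URLError. When urlopen() sends a request, the server returns a response that includes a status code, and urllib2 raises HTTPError when that status code indicates an error it cannot handle (for example 404 or 500).

# Handling URLError

request = urllib2.Request('')
try:
    urllib2.urlopen(request)
except urllib2.URLError as e:
    # Only HTTPError carries a status code
    if hasattr(e, "code"):
        print e.code
    if hasattr(e, "reason"):
        print e.reason
else:
    print "OK"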
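Because HTTPError is a subclass of URLError, an alternative to the hasattr checks is to test for the more specific class first. The sketch below assumes Python 3 (urllib.error); the describe helper is made up for illustration, and the two exceptions are constructed by hand so the example runs without network access.

```python
import io
from email.message import Message
from urllib.error import HTTPError, URLError


def describe(exc):
    """Return a short description, checking the more specific class first."""
    if isinstance(exc, HTTPError):
        # The server answered, but with an error status code.
        return "HTTP error {}: {}".format(exc.code, exc.reason)
    if isinstance(exc, URLError):
        # No usable response at all (no network, unknown host, refused connection).
        return "URL error: {}".format(exc.reason)
    raise TypeError("not a urllib error: {!r}".format(exc))


# Hand-built exceptions standing in for what urlopen() would raise.
http_error = HTTPError("http://example.com/missing", 404, "Not Found",
                       Message(), io.BytesIO())
url_error = URLError("connection refused")

print(describe(http_error))
print(describe(url_error))
```

In real code the same function could be called from an `except URLError as exc:` clause around urlopen().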

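Cookie
Cookie: data that some websites store on the user's local machine (usually encrypted) to identify the user and track the session.

opener
cookielib: provides objects that store cookies, for use together with the urllib2 module when accessing Internet resources. A CookieJar object from this module captures cookies and sends them back on subsequent requests, for example to simulate a login. The main classes in this module are:
CookieJar -> (subclass) FileCookieJar -> (subclasses) MozillaCookieJar and LWPCookieJar

First, use a CookieJar object to capture cookies into a variable
Then use MozillaCookieJar, a subclass of FileCookieJar, to save the cookies to a file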

#coding=utf-8

import urllib2
import cookielib

# File to save the cookies to: cookie.txt in the current directory
filename = 'cookie.txt'

# Create a MozillaCookieJar instance to hold the cookies and later write them to the file
cookie = cookielib.MozillaCookieJar(filename)

# Alternatively, create a plain CookieJar instance that keeps cookies in memory only
# cookie = cookielib.CookieJar()

# Create a cookie handler with urllib2's HTTPCookieProcessor
handler = urllib2.HTTPCookieProcessor(cookie)

# Build an opener from the handler
opener = urllib2.build_opener(handler)
response = opener.open('http://www.baidu.com')

# Save the cookies to the file
# ignore_discard: save cookies even if they are marked to be discarded (session cookies)
# ignore_expires: save cookies even if they have already expired
cookie.save(ignore_discard=True, ignore_expires=True)

Loading cookies from a file and using them

#coding=utf-8

import urllib2
import cookielib

# Create a MozillaCookieJar instance
cookie = cookielib.MozillaCookieJar()

# Load the cookies from the file into the variable
cookie.load('cookie.txt', ignore_expires=True, ignore_discard=True)

# Create the request
req = urllib2.Request('http://www.baidu.com')

# Use urllib2's build_opener to create an opener that sends the loaded cookies
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
rsp = opener.open(req)
print rsp.read()
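The effect of ignore_discard is easiest to see with a save/load round trip that needs no network. The sketch below assumes Python 3, where cookielib is named http.cookiejar. The cookie is built by hand with made-up values, and make_session_cookie and round_trip are helper names invented for this example.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar


def make_session_cookie(name, value, domain="example.com"):
    """Build a session cookie (no expiry, discard=True) with illustrative values."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def round_trip(ignore_discard):
    """Save one session cookie to a temp file, load it back, return what survived."""
    path = os.path.join(tempfile.mkdtemp(), "cookie.txt")
    jar = MozillaCookieJar(path)
    jar.set_cookie(make_session_cookie("sessionid", "abc123"))
    jar.save(ignore_discard=ignore_discard, ignore_expires=True)

    loaded = MozillaCookieJar()
    loaded.load(path, ignore_discard=True, ignore_expires=True)
    return [(c.name, c.value) for c in loaded]


kept = round_trip(ignore_discard=True)      # session cookie is written to the file
dropped = round_trip(ignore_discard=False)  # session cookie is skipped on save
print(kept, dropped)
```

This is why the examples in this section pass ignore_discard=True: login cookies are often session cookies, which would otherwise never reach the file.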

Simulating a website login with cookies

#coding=utf-8

import urllib
import urllib2
import cookielib

filename = 'cookie.txt'

# Create a MozillaCookieJar instance to hold the cookies and later write them to the file
cook = cookielib.MozillaCookieJar(filename)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cook))
postdata = urllib.urlencode({
    'stuid': '2011',
    'pwd': '1234'
})
url = 'http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bks_login2.login'

# Simulate the login; the opener stores the returned cookies in cook
result = opener.open(url, postdata)
cook.save(ignore_discard=True, ignore_expires=True)

# Reuse the stored cookies to access another URL
url2 = 'http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bkscjcx.curscopre'
result = opener.open(url2)
print result.read()

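Regular expressions
A quick introduction to regular expressions

re.match(pattern, string[, flags])
This function tries to match pattern starting at the beginning of string. It returns None as soon as it meets a character that cannot be matched, and also returns None if the end of string is reached before the pattern has been fully matched.
Commonly used methods of the returned match object:

group([group1, …]): returns the string captured by one or more groups; group 0 is the whole match.
start([group]): returns the start index of the substring captured by the given group
end([group]): returns the end index of the substring captured by the given group
span([group]): returns (start(group), end(group))

re.search(pattern, string[, flags])
search is very similar to match. The difference is that match only checks whether the pattern matches at the beginning of string, while search scans the whole string for a match.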

#coding=utf-8

import re

# Compile the regular expression into a Pattern object
pattern = re.compile(r'world')

# search() scans the whole string, so this finds a match
match = re.search(pattern, 'hello world!')

# match() only tries the start of the string, so this would return None
# match = re.match(pattern, 'hello world!')

if match:
    print match.group()
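To make the match/search difference and the match-object methods concrete, here is a short Python 3 sketch. The sample text and the variable names are chosen only for illustration.

```python
import re

text = "hello world!"

# match() anchors at the start of the string, so 'world' is not found.
at_start = re.match(r"world", text)

# search() scans the whole string and finds 'world' at index 6.
anywhere = re.search(r"world", text)

# Two capture groups: group 1 is the first word, group 2 the second.
grouped = re.match(r"(\w+) (\w+)", text)

print(at_start)                            # None
print(anywhere.group(), anywhere.span())   # world (6, 11)
print(grouped.group(0))                    # hello world
print(grouped.group(1, 2))                 # ('hello', 'world')
print(grouped.start(2), grouped.end(2))    # 6 11
```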

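re.split(pattern, string[, maxsplit, flags]): splits string wherever pattern matches and returns the pieces as a list
re.findall(pattern, string[, flags]): returns all matching substrings as a list
re.sub(pattern, repl, string[, count, flags]): replaces every match in string with repl and returns the resulting string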
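A small Python 3 sketch of these three functions applied to the same input; the sample string is made up for illustration.

```python
import re

text = "one1two22three"

pieces = re.split(r"\d+", text)      # cut the string at every run of digits
numbers = re.findall(r"\d+", text)   # collect every run of digits
dashed = re.sub(r"\d+", "-", text)   # replace every run of digits with '-'

print(pieces)    # ['one', 'two', 'three']
print(numbers)   # ['1', '22']
print(dashed)    # one-two-three
```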

Beautiful Soup

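Installation
pip install beautifulsoup4
pip install virtualenv
Note: Beautiful Soup supports the HTML parser in the Python standard library as well as third-party parsers. The lxml parser is more powerful and faster.
Creating the object

#coding=utf-8

from bs4 import BeautifulSoup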
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
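# Name the parser explicitly; html.parser is the one built into the standard library
soup = BeautifulSoup(html, "html.parser")

# prettify() returns the parsed document as a formatted, indented string
print soup.prettify()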

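The four main object types

Tag
A Tag corresponds to an HTML tag in the document, for example <title>The Dormouse's story</title>. Its two most important properties are name and attrs.
print soup.name
print soup.p.attrs
NavigableString
print soup.title.string # get the text inside the tag
BeautifulSoup
The BeautifulSoup object represents the whole document.
Comment
A special type of NavigableString. When printed, its content does not include the comment delimiters (<!-- and -->).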
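The four object types can be told apart with isinstance. The sketch below assumes Python 3 with beautifulsoup4 installed and uses a shortened version of the sample HTML; the variable names are chosen for illustration.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

html = ('<p class="title" name="dromouse"><b>The Dormouse\'s story</b></p>'
        '<a id="link1"><!-- Elsie --></a>')
soup = BeautifulSoup(html, "html.parser")

tag = soup.p              # Tag: has a name and a dict of attributes
text = soup.b.string      # NavigableString: the text inside a tag
comment = soup.a.string   # Comment: a NavigableString subclass

print(tag.name, tag.attrs)
print(type(text).__name__, text)
print(soup.name)                         # the BeautifulSoup object itself
print(type(comment).__name__, comment)   # printed without <!-- and -->
```

Because a Comment prints like ordinary text, checking isinstance(value, Comment) before using .string is a reasonable way to avoid treating commented-out markup as page content.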

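Traversing the document tree

contents
A tag's .contents attribute returns the tag's child nodes as a list.
print soup.head.contents[0]
children
.children is an iterator over the same child nodes:

for child in soup.body.children:
    print child

.parent(s)
.next_sibling(s).previous_sibling(s)
.next_element(s).previous_element(s)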
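The navigation attributes listed above can be tried on a tiny document. This sketch assumes Python 3 with beautifulsoup4 installed; the HTML is made up and written without whitespace between tags so that siblings are tags rather than newline strings.

```python
from bs4 import BeautifulSoup

html = ('<html><head><title>Demo</title></head>'
        '<body><p id="first">one</p><p id="second">two</p></body></html>')
soup = BeautifulSoup(html, "html.parser")

first = soup.find(id="first")
second = soup.find(id="second")

print(soup.head.contents)                      # children as a list
print([c.name for c in soup.body.children])    # children as an iterator
print(first.parent.name)                       # direct parent
print([p.name for p in first.parents])         # all ancestors, innermost first
print(first.next_sibling["id"])                # next node at the same level
print(second.previous_sibling["id"])           # previous node at the same level
print(repr(first.next_element))                # next node in parse order: the text 'one'
```

The sibling attributes move across one level of the tree, while next_element follows parse order and therefore steps into the tag's own text first.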

Searching the document tree

find_all(name, attrs, recursive, text, **kwargs)

CSS selectors

Find by tag name
Find by class name
Find by ID
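A short sketch of both search styles, find_all() and CSS selectors via select(), assuming Python 3 with beautifulsoup4 installed. The HTML is a cut-down, made-up version of the earlier sample.

```python
from bs4 import BeautifulSoup

html = ('<p class="title"><b>Story</b></p>'
        '<a class="sister" id="link1" href="http://example.com/elsie">Elsie</a>'
        '<a class="sister" id="link2" href="http://example.com/lacie">Lacie</a>')
soup = BeautifulSoup(html, "html.parser")

# find_all: filter by tag name, by a list of names, by class, or by attribute
links = soup.find_all("a")
mixed = soup.find_all(["a", "b"])
sisters = soup.find_all("a", class_="sister")   # class_ because class is a keyword
by_id = soup.find_all(id="link2")

# select: the same lookups written as CSS selectors
css_by_tag = soup.select("a")         # by tag name
css_by_class = soup.select(".sister") # by class name
css_by_id = soup.select("#link1")     # by ID

print([a["href"] for a in links])
print([t.name for t in mixed])
print(by_id[0].get_text(), css_by_id[0].get_text())
```

Both styles return lists of Tag objects, so the results can be mixed freely with the navigation attributes from the previous section.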

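Note: Beautiful Soup covers far more than can be listed here; the rest is best learned and reinforced through hands-on practice later.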
