Python regular expressions: finding multiple match results

Reposted

云端创新者 2024-11-20 11:39:38


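The os module

Provides functions for working with files, directories, and other operating-system services.

Some commonly used functions: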

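Run a system command
os.system('command')
Get the current working directory
os.getcwd()
Change the current working directory
os.chdir(path)
Change permissions
os.chmod(path, mode)
Create a directory
os.mkdir(path)
Return the contents of the specified directory as a list
os.listdir(path)
Check whether the specified path is a directory
os.path.isdir(path)
Check whether the specified path is a file
os.path.isfile(path)
Check whether the specified path (file or directory) exists
os.path.exists(path)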

>>> import os
>>> os.system('systeminfo')
Host Name:        MS-NPHLBLBIQCEA
OS Name:          Microsoft Windows 10 Pro
OS Version:       10.0.18363 N/A Build 18363
...

>>> os.getcwd()
'C:\\Users\\Administrator'

>>> os.listdir(r'e:\python37')
['curl.exe', 'DLLs', 'Doc', 'get-pip.py', 'include', 'Lib', 'libs', 'LICENSE.txt', 'NEWS.txt', 'phantomjs.exe', 'python.exe', 'python3.dll', 'python37.dll', 'pythonw.exe', 'Scripts', 'tcl', 'Tools', 'vcruntime140.dll']

>>> os.path.isdir('e:')
True

>>> os.path.exists(r'e:\a.txt')
False
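The same functions can be exercised without relying on a particular drive such as e:. The sketch below assumes nothing about your machine: it creates a throwaway temporary directory, and the names "data", "a.txt", and "missing.txt" are hypothetical. It also uses os.path.join, which avoids the backslash-escape pitfall of writing Windows paths as plain string literals.

```python
import os
import tempfile

# Work inside a throwaway temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as base:
    # Build paths with os.path.join instead of hard-coding "\" separators.
    sub = os.path.join(base, "data")          # hypothetical directory name
    file_path = os.path.join(sub, "a.txt")    # hypothetical file name

    os.mkdir(sub)                             # create the directory
    with open(file_path, "w"):                # create an empty file
        pass

    entries = os.listdir(base)                # ['data']
    sub_is_dir = os.path.isdir(sub)           # True
    file_is_file = os.path.isfile(file_path)  # True
    missing_exists = os.path.exists(os.path.join(sub, "missing.txt"))  # False

print(entries, sub_is_dir, file_is_file, missing_exists)
```

When the with block ends, the temporary directory and everything in it are deleted automatically.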

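Exception handling

Python has two kinds of errors that are easy to tell apart: syntax errors and exceptions.
Errors detected while the program is running are called exceptions.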

>>> 10 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>> name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'name' is not defined
>>> "" + 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "int") to str

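Exception-handling syntax:

try:
	... (code that may raise an exception)
except ExceptionType:
	... (code that handles the exception)


try:
	... (code that may raise an exception)
except ExceptionType:
	... (code that handles the exception)
finally:
	... (code that always runs, whether or not an exception occurred)

A try statement can have several except clauses, and a single except clause can catch several exception types by listing them in parentheses as a tuple, e.g. except (TypeError, ValueError):
raise Exception    raises an exception explicitly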
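Putting these pieces together, here is a minimal runnable sketch. The helpers safe_divide and check_age are hypothetical examples, not part of any library. It also uses an else clause, which standard Python allows after the except clauses and which runs only when no exception was raised.

```python
def safe_divide(a, b):
    """Hypothetical helper: divide a by b, returning None on bad input."""
    try:
        result = a / b                                  # code that may raise
    except (ZeroDivisionError, TypeError) as exc:       # several types as a tuple
        print(f"caught {type(exc).__name__}: {exc}")
        return None
    else:
        return result                                   # runs only if no exception
    finally:
        print("finally always runs")                    # runs in every case


def check_age(age):
    """Hypothetical validator showing how to raise an exception yourself."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return age


print(safe_divide(10, 2))  # "finally always runs" is printed first, then 5.0
print(safe_divide(10, 0))  # prints the caught error, "finally always runs", then None

try:
    check_age(-1)
except ValueError as exc:
    print("invalid:", exc)  # invalid: age must be non-negative
```

Note that finally runs even when the try or except block executes return, which is why its message appears before the returned value is printed.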

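Regular expressions

A regular expression is a special sequence of characters used to check whether a string matches a given pattern.

The re module provides regular-expression support in Python.

re.match(pattern, string, flags=0) tries to match the pattern at the start of the string; returns a match object, or None if there is no match
pattern: the regular expression to match
string: the string to match against
flags: flags that control how the pattern is matched
re.search(pattern, string, flags=0) scans the whole string and returns the first successful match
Difference between match and search:
re.match only matches at the beginning of the string; if the beginning does not fit the pattern, the match fails and the function returns None
re.search scans the whole string until it finds a match, and returns None if there is none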

>>> import re
>>>
>>> string = 'www.baidu.com'
>>> print(re.match('www', string))
<re.Match object; span=(0, 3), match='www'>
>>> print(re.match('baidu', string))
None

>>> print(re.search('baidu', string))
<re.Match object; span=(4, 9), match='baidu'>
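A match object can hold several results at once when the pattern contains parentheses (capturing groups). The sketch below reuses the www.baidu.com string; the grouping pattern itself is just an illustration. It uses the standard match-object methods group(), groups(), and span().

```python
import re

# Parentheses create capturing groups; the match object exposes each one.
m = re.match(r'(www)\.(\w+)\.(com)', 'www.baidu.com')
if m:                     # always check for None before using the match
    print(m.group())      # www.baidu.com  (the whole match, same as group(0))
    print(m.group(2))     # baidu
    print(m.groups())     # ('www', 'baidu', 'com')
    print(m.span(2))      # (4, 9)
```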

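Search and replace:
re.sub(pattern, repl, string, count=0, flags=0) replaces matches of the pattern in the string
repl: the replacement string (it can also be a function)
count: the maximum number of replacements; the default 0 means replace all matches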

>>> import re
>>>
>>> phone = "1234-5678-910 #phone number"
>>> num = re.sub(r'#.*$', '', phone)	# '#.*$' matches a '#' followed by zero or more characters up to the end of the string
>>> num
'1234-5678-910 '
>>> num = re.sub(r'\D', '', num)	# '\D' matches any non-digit character
>>> num
'12345678910'
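A short sketch of the other ways to use re.sub: limiting replacements with count, passing a function as repl, and referring back to captured groups with \1, \2. The sample strings are made up for illustration. count is passed by keyword here, since recent Python versions deprecate passing it positionally.

```python
import re

text = "cat bat rat"

# count limits how many matches are replaced (0, the default, means all).
first_only = re.sub(r'at', 'og', text, count=1)
print(first_only)   # cog bat rat

# repl may also be a function: it receives each match object and returns
# the replacement string. Here \b\w matches the first letter of each word.
capitalized = re.sub(r'\b\w', lambda m: m.group().upper(), text)
print(capitalized)  # Cat Bat Rat

# Backreferences such as \1 and \2 in repl insert the text of captured groups.
swapped = re.sub(r'(\d+)-(\d+)', r'\2-\1', "1234-5678")
print(swapped)      # 5678-1234
```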

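re.compile(pattern[, flags]) compiles a regular expression into a regular-expression (Pattern) object that can be reused
re.findall(pattern, string, flags=0) finds all substrings matched by the regular expression and returns them as a list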

>>> import re
>>>
>>> str = '12sd123r4354fg5'
>>> re.findall('[a-zA-Z]', str)
['s', 'd', 'r', 'f', 'g']
>>> re.findall('[0-9]', str)
['1', '2', '1', '2', '3', '4', '3', '5', '4', '5']
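When the goal is to pull out several pieces of every match (multiple groups), the result of findall depends on how many capturing groups the pattern has. The sketch below compiles a pattern once and reuses the same test string; the variable name s is used instead of str to avoid shadowing the built-in type.

```python
import re

# Compile once, reuse many times: a Pattern object offers the same methods
# (match, search, findall, finditer, sub, split) as the re module functions.
pair = re.compile(r'([a-z]+)(\d+)')

s = '12sd123r4354fg5'

# With ONE capturing group, findall returns a list of strings (the group text).
digits_after_letters = re.findall(r'[a-z]+(\d+)', s)
print(digits_after_letters)   # ['123', '4354', '5']

# With SEVERAL groups, it returns a list of tuples, one tuple per match.
pairs = pair.findall(s)
print(pairs)                  # [('sd', '123'), ('r', '4354'), ('fg', '5')]
```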

re.finditer(pattern, string, flags=0) finds all substrings matched by the regular expression and returns them as an iterator of match objects

>>> import re
>>>
>>> iter = re.finditer('[a-z]', str)
>>> iter
<callable_iterator object at 0x000001C5DD3CF908>
>>> list(iter)
[<re.Match object; span=(2, 3), match='s'>, <re.Match object; span=(3, 4), match='d'>, <re.Match object; span=(7, 8), match='r'>, <re.Match object; span=(12, 13), match='f'>, <re.Match object; span=(13, 14), match='g'>]
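Because finditer yields full match objects, it is handy when each match has several groups and you also need positions. This sketch uses named groups ((?P<name>...)); the group names letters and digits are hypothetical choices for illustration.

```python
import re

s = '12sd123r4354fg5'

# finditer yields match objects lazily, so each result keeps its groups
# and positions instead of just the matched text.
results = []
for m in re.finditer(r'(?P<letters>[a-z]+)(?P<digits>\d+)', s):
    # groupdict() maps each named group to the text it captured.
    results.append((m.groupdict(), m.span()))

for item in results:
    print(item)
# ({'letters': 'sd', 'digits': '123'}, (2, 7))
# ({'letters': 'r', 'digits': '4354'}, (7, 12))
# ({'letters': 'fg', 'digits': '5'}, (12, 15))
```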

re.split(pattern, string[, maxsplit=0, flags=0]) splits the string at every match of the pattern and returns the pieces as a list

>>> import re
>>>
>>> string = "my name is joker"
>>> re.split(r'\s', string)
['my', 'name', 'is', 'joker']
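A few details of re.split worth knowing, shown on made-up strings: a single \s yields empty strings when separators repeat, maxsplit caps the number of splits, and a capturing group in the pattern keeps the separators in the result. maxsplit is passed by keyword here.

```python
import re

text = "my  name,is   joker"

# \s splits on every single whitespace character, so consecutive spaces
# produce an empty string; a + makes each run count as one separator.
single = re.split(r'\s', "my  name")
print(single)    # ['my', '', 'name']
words = re.split(r'[\s,]+', text)
print(words)     # ['my', 'name', 'is', 'joker']

# maxsplit limits the number of splits; the remainder stays in the last item.
limited = re.split(r'[\s,]+', text, maxsplit=1)
print(limited)   # ['my', 'name,is   joker']

# A capturing group in the pattern keeps the separators in the result.
kept = re.split(r'(,)', "a,b,c")
print(kept)      # ['a', ',', 'b', ',', 'c']
```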

Modifier flags:

Flag	Description
re.I	makes matching case-insensitive
re.M	multiline mode: ^ and $ also match at the start and end of each line
…	…

>>> import re
>>>
>>> string = "my name is Joker"
>>> re.search('joker', string)
>>> print( re.search('joker', string))
None
>>> print( re.search('joker', string, re.I))
<re.Match object; span=(11, 16), match='Joker'>
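The effect of re.M on ^ is easiest to see on a multi-line string. This sketch uses a made-up three-line text and also shows that flags can be combined with the | operator.

```python
import re

text = "first line\nSecond line\nthird line"

# Without re.M, ^ only matches at the very start of the string.
no_flag = re.findall(r'^\w+', text)
print(no_flag)     # ['first']

# With re.M, ^ also matches right after every newline.
multiline = re.findall(r'^\w+', text, re.M)
print(multiline)   # ['first', 'Second', 'third']

# Flags combine with |: lines starting with "s" or "S" (case-insensitive).
combined = re.findall(r'^s\w+', text, re.M | re.I)
print(combined)    # ['Second']
```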

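Regular-expression pattern syntax:

Pattern	Description
.	matches any single character except a newline (with re.S it matches newlines too)
^	matches the start of the string
$	matches the end of the string
[…]	a set of characters; matches any one character inside the brackets
[^…]	matches any character not inside the brackets
*	matches 0 or more of the preceding element
+	matches 1 or more of the preceding element
?	matches 0 or 1 of the preceding element
\w	matches a letter, digit, or underscore
\W	matches any character that is not a letter, digit, or underscore
\d	matches a digit
\D	matches a non-digit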
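To see several of these patterns side by side, here is a small sketch run against one hypothetical line of text (the order string is invented purely for illustration).

```python
import re

line = "Order #A12, qty 3, price 45.50 USD"  # hypothetical sample text

numbers = re.findall(r'\d+', line)        # + : one or more digits
decimal = re.findall(r'\d+\.\d+', line)   # \. matches a literal dot
capitals = re.findall(r'[A-Z]+', line)    # [...] : a set of characters
words = re.findall(r'\w+', line)          # \w : letters, digits, underscore
any_char = re.findall(r'q.y', line)       # . : any single character
optional = re.findall(r'colou?r', "color colour")  # ? : zero or one "u"
starts = re.search(r'^Order', line) is not None    # ^ : start of string
ends = re.search(r'USD$', line) is not None        # $ : end of string

print(numbers)    # ['12', '3', '45', '50']
print(decimal)    # ['45.50']
print(capitals)   # ['O', 'A', 'USD']
print(words)      # ['Order', 'A12', 'qty', '3', 'price', '45', '50', 'USD']
print(any_char)   # ['qty']
print(optional)   # ['color', 'colour']
print(starts, ends)  # True True
```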