Python re module: finding all match positions

Reposted

小鱼儿 2024-07-21 13:29:03


1. search() vs. match()

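Python offers two different basic operations: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (which is what Perl does by default).

For example: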

>>> re.match("c", "abcdef")    # No match
>>> re.search("c", "abcdef")   # Match
<re.Match object; span=(2, 3), match='c'>
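Because both functions return None when nothing matches, calling a method on the result without checking it first raises AttributeError. The sketch below wraps that check in a small helper; the name describe is hypothetical and is only used for illustration.

```python
import re

def describe(result):
    """Summarise a match result (hypothetical helper, not part of re)."""
    if result is None:
        return "no match"
    return f"matched {result.group()!r} at {result.span()}"

# match() only looks at the start of the string; search() scans all of it.
at_start = describe(re.match("c", "abcdef"))
anywhere = describe(re.search("c", "abcdef"))

print(at_start)   # no match
print(anywhere)   # matched 'c' at (2, 3)
```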

2. search(pattern, s)

import re

pattern = 'this'
text = 'Does this text match the pattern?'

match = re.search(pattern, text)

s = match.start()   # 5
e = match.end()     # 9

print('Found "%s"\nin "%s"\nfrom %d to %d {"%s"}' %
      (match.re.pattern, match.string, s, e, text[s:e]))


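Analysis:

The output shows that the word we were looking for was found in the text, and that we can obtain the positions where it starts and ends in the original text.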

3. re.compile() + re.search()

import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-4242')
print('Phone number is '+mo.group())


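Analysis of results:

The results show that search() finds text matching a regular expression anywhere in the input. Parentheses can also be used to split the pattern into groups, and groups() returns all of the group values at once. When writing a regular expression, take care to escape characters that have a special meaning.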
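The grouping and escaping points can be illustrated with a short sketch. The two patterns below are illustrative variations on the phone number regex above, not taken from the original article: the first splits the number into two groups, and the second escapes literal parentheses with a backslash.

```python
import re

# Two groups: the area code and the rest of the number.
phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242')

whole = mo.group(0)        # the entire match
area_code = mo.group(1)    # first parenthesised group
main_number = mo.group(2)  # second parenthesised group
all_groups = mo.groups()   # every group as a tuple

# \( and \) match literal parentheses; unescaped ones create a group.
paren_regex = re.compile(r'\((\d\d\d)\) (\d\d\d-\d\d\d\d)')
mo2 = paren_regex.search('My number is (415) 555-4242')

print(whole, area_code, main_number)
print(all_groups)
print(mo2.group(0), mo2.group(1))
```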

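4. Matching one of several alternatives with the pipe: the character "|" is called a pipe. Use it when you want to match any one of several expressions. If more than one of the alternatives occurs in the searched text, the first occurrence of matching text is returned.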

import re
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost Batman')
print(mo.group())
print(mo.group(1))


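5. Matching zero or more times with the star, one or more times with the plus, and a specific number of times with curly braces: "*" means "match zero or more times", so the group before the star may occur any number of times in the text. It can be absent altogether, or repeated over and over.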

import re
batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
print(mo1.group())
mo2 = batRegex.search('The Adventures of Batwowoman')
print(mo2.group())


"+" differs from "*" in that the group must occur at least once for the match to succeed.

import re
batRegex = re.compile(r'Bat(wo)+man')
# 'Batman' contains no 'wo', so search() returns None here;
# calling mo1.group() on it would raise AttributeError.
mo1 = batRegex.search('The Adventures of Batman')
print(mo1 is None)
mo2 = batRegex.search('The Adventures of Batwowoman')
print(mo2.group())

# With at least one 'wo' in each string, both searches succeed.
mo1 = batRegex.search('The Adventures of Batwoman')
print(mo1.group())
mo2 = batRegex.search('The Adventures of Batwowoman')
print(mo2.group())


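To make a group repeat a specific number of times, follow the group with a number in curly braces. For example, (Ha){3} matches 'HaHaHa' but not 'HaHa', and (Ha){3,5} matches 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'. Python's regular expressions are greedy by default: when several lengths are possible, they match the longest string they can. Putting a "?" after the closing curly brace makes the match non-greedy. Note that the question mark has two meanings in a regular expression: it either declares a non-greedy match or marks a group as optional.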
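A minimal sketch of the behaviour described above, using an illustrative input of five repetitions of 'Ha'. The variable names are chosen here for illustration; the last pattern reuses the Bat(wo) group from earlier to show the other meaning of "?", an optional group.

```python
import re

text = 'HaHaHaHaHa'  # five repetitions of 'Ha'

exact = re.search(r'(Ha){3}', text).group()     # exactly three repetitions
greedy = re.search(r'(Ha){3,5}', text).group()  # greedy: as many as allowed
lazy = re.search(r'(Ha){3,5}?', text).group()   # non-greedy: as few as allowed
too_short = re.search(r'(Ha){3}', 'HaHa')       # only two repetitions: None

# '?' after a group (not after a quantifier) makes the group optional.
optional = re.compile(r'Bat(wo)?man')
with_group = optional.search('Batwoman').group()
without_group = optional.search('Batman').group()
twice = optional.search('Batwowoman')           # 'wo' twice is too many: None

print(exact, greedy, lazy, too_short)
print(with_group, without_group, twice)
```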


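6. The findall() method: findall() returns every match found in the text as a list of strings. If the regular expression contains more than one group, it returns a list of tuples of strings instead, one tuple per match (with exactly one group, it returns a list of that group's strings).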

import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('Cell:415-555-9999 Work:215-555-9999')
print(mo.group())
print(phoneNumRegex.findall('Cell:415-555-9999 Work:215-555-9999'))
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
print(phoneNumRegex.findall('Cell:415-555-9999 Work:215-555-9999'))
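findall() returns only the matched text, not where it was found. When the position of every match is needed, one option is re's finditer(), which yields a Match object per match. The sketch below reuses the phone number pattern and input from above; the variable names are illustrative.

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
text = 'Cell:415-555-9999 Work:215-555-9999'

# finditer() yields one Match object per non-overlapping match,
# so start(), end(), and group() are available for each of them.
positions = [(m.start(), m.end(), m.group()) for m in phone_regex.finditer(text)]

for start, end, number in positions:
    print('%s found from %d to %d' % (number, start, end))
```

Slicing the original string with each pair, as in text[start:end], gives back the matched text, just like the single-match example in section 2.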


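7. Character classes

\d: any digit from 0 to 9
\D: any character that is not a digit from 0 to 9
\w: any letter, digit, or underscore character (useful for matching words)
\W: any character that is not a letter, digit, or underscore
\s: any space, tab, or newline character
\S: any character that is not a space, tab, or newline

For example, "\d+\s\w+" means: one or more digits (\d+), followed by one whitespace character (\s), followed by one or more letter/digit/underscore characters (\w+).

8. The wildcard character: in a regular expression, "." is called the wildcard. It matches any character except a newline. ".*" matches any run of zero or more such characters.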

import re
atRegex = re.compile(r'.at')
print(atRegex.findall('The cat in the hat sat on the flat mat.'))
nameRegex = re.compile(r'First name: (.*) Last name:(.*)')
mo = nameRegex.search('First name: AI  Last name:Sweigart')
print(mo.groups())
print(mo.groups(0))
print(mo.group(1))


".*" is greedy by default: it always matches as much text as it can. A "?" can again be used to make it non-greedy.

import re
nongreedyRegex = re.compile(r'<.*?>')
mo = nongreedyRegex.search('<To serve man> for dinner.')
print(mo.group())
greedyRegex = re.compile(r'<.*>')
mo = greedyRegex.search('<To serve man> for dinner.>')
print(mo.group())
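The difference becomes more visible when the text contains several bracketed pieces and all matches are collected with findall(). The following is a small sketch on an illustrative input string that is not from the original article.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Non-greedy: each match stops at the first '>' it can reach.
lazy_tags = re.findall(r'<.*?>', html)

# Greedy: a single match runs from the first '<' to the last '>'.
greedy_tags = re.findall(r'<.*>', html)

print(lazy_tags)
print(greedy_tags)
```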
