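Python offers two different primitive operations for regular-expression matching: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).
For example: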
>>> import re
>>> re.match("c", "abcdef")  # No match
>>> re.search("c", "abcdef") # Match
<re.Match object; span=(2, 3), match='c'>
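A small sketch comparing the two functions on the same string. It also shows that anchoring the pattern with ^ makes re.search() fail in the same place re.match() does (this assumes the default flags, i.e. no re.MULTILINE).

```python
import re

text = "abcdef"

# re.match() only tries to match at position 0, so "c" is not found.
print(re.match("c", text))             # None

# re.search() scans the whole string and finds "c" at index 2.
print(re.search("c", text).group())    # c
print(re.search("c", text).span())     # (2, 3)

# Both succeed when the pattern occurs at the very beginning.
print(re.match("a", text).span())      # (0, 1)
print(re.search("a", text).span())     # (0, 1)

# A leading ^ anchors re.search() to the start of the string,
# so it now fails just like re.match() did.
print(re.search("^c", text))           # None
```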
import re

pattern = 'this'
text = 'Does this text match the pattern?'

match = re.search(pattern, text)  # scan the whole string for the first occurrence
s = match.start()  # 5: index where the match begins
e = match.end()    # 9: index just past the end of the match
print('Found "%s"\nin "%s"\nfrom %d to %d {"%s"}' %
      (match.re.pattern, match.string, s, e, text[s:e]))
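The output shows that re.search() found the word we were looking for in the original text, and that the match object also tells us where that word starts and ends in the text.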
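A match object also offers span(), which returns (start, end) in one call, and slicing the original text with those positions gives back the matched substring. Because re.search() returns None when nothing matches, the sketch below checks for None first. The helper name find_word is hypothetical, introduced only for illustration.

```python
import re

def find_word(pattern, text):
    """Hypothetical helper: return (start, end, matched_text), or None if no match."""
    match = re.search(pattern, text)
    if match is None:              # no match: re.search() returned None
        return None
    start, end = match.span()      # same as (match.start(), match.end())
    return start, end, text[start:end]

text = 'Does this text match the pattern?'
print(find_word('this', text))     # (5, 9, 'this')
print(find_word('that', text))     # None
```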
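import re
# \d matches any digit; the raw string r'...' keeps the backslashes intact
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-4242')
print('Phone number is ' + mo.group())  # Phone number is 415-555-4242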
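The results show that search() finds text matching the regular expression anywhere in the string. We can also use parentheses to split a regular expression into groups, and use groups() to retrieve the values of all groups at once. When writing regular expressions, be careful with escape characters, which is why the patterns here are written as raw strings (r'...').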
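The phone-number example above does not use groups yet. Here is a sketch in the same style where parentheses split the number into an area code and the rest: group(n) returns one group, group(0) the whole match, and groups() all groups as a tuple. The second pattern escapes the parentheses as \( and \) so that they match literal parenthesis characters instead of forming a group, which is the kind of escaping the paragraph warns about.

```python
import re

# Parentheses create capturing groups: group 1 = area code, group 2 = the rest.
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print(mo.group(0))   # 415-555-4242 (the whole match)
print(mo.group(1))   # 415
print(mo.group(2))   # 555-4242
print(mo.groups())   # ('415', '555-4242')

# groups() returns a tuple, so it can be unpacked directly.
areaCode, mainNumber = mo.groups()

# Literal parentheses must be escaped as \( and \);
# the raw string keeps Python from interpreting the backslashes itself.
parenRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo2 = parenRegex.search('My phone number is (415) 555-4242.')
print(mo2.group(1))  # (415)
print(mo2.group(2))  # 555-4242
```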
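import re
# The pipe | inside the group matches any one of the alternatives
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost Batman')
print(mo.group())   # Batmobile (the first match in the string)
print(mo.group(1))  # mobile (the text captured by group 1)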
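import re
# * means "zero or more": the group (wo) may be absent or repeated
batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
print(mo1.group())  # Batman (zero occurrences of "wo")
mo2 = batRegex.search('The Adventures of Batwowoman')
print(mo2.group())  # Batwowoman (two occurrences of "wo")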
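import re
# + means "one or more": the group (wo) must appear at least once
batRegex = re.compile(r'Bat(wo)+man')
mo1 = batRegex.search('The Adventures of Batman')
print(mo1)  # None: "Batman" contains no "wo", so calling mo1.group() would raise AttributeError
mo2 = batRegex.search('The Adventures of Batwowoman')
print(mo2.group())  # Batwowoman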
import re
batRegex = re.compile(r'Bat(wo)+man')
mo1 = batRegex.search('The Adventures of Batwoman')
print(mo1.group())  # Batwoman (one "wo" is enough for +)
mo2 = batRegex.search('The Adventures of Batwowoman')
print(mo2.group())  # Batwowoman
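import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('Cell:415-555-9999 Work:215-555-9999')
print(mo.group())  # 415-555-9999: search() returns only the first match
# findall() returns every match as a list of strings
print(phoneNumRegex.findall('Cell:415-555-9999 Work:215-555-9999'))  # ['415-555-9999', '215-555-9999']
# If the pattern has groups, findall() returns a list of tuples, one string per group
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
print(phoneNumRegex.findall('Cell:415-555-9999 Work:215-555-9999'))  # [('415', '555', '9999'), ('215', '555', '9999')]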
import re
# . matches any single character except a newline
atRegex = re.compile(r'.at')
print(atRegex.findall('The cat in the hat sat on the flat mat.'))  # ['cat', 'hat', 'sat', 'lat', 'mat']
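# (.*) captures any run of characters between the two labels
nameRegex = re.compile(r'First name: (.*) Last name:(.*)')
mo = nameRegex.search('First name: Al Last name:Sweigart')
print(mo.groups())  # ('Al', 'Sweigart')
print(mo.group(0))  # First name: Al Last name:Sweigart (the entire match)
print(mo.group(1))  # Al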
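".*" is greedy by default: it always matches as much text as possible. Appending "?" (as in ".*?") switches it to non-greedy mode, so it matches as little text as possible.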
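import re
# Non-greedy: .*? stops at the first '>'
nongreedyRegex = re.compile(r'<.*?>')
mo = nongreedyRegex.search('<To serve man> for dinner.')
print(mo.group())  # <To serve man>
# Greedy: .* runs on to the last '>' in the string
greedyRegex = re.compile(r'<.*>')
mo = greedyRegex.search('<To serve man> for dinner.>')
print(mo.group())  # <To serve man> for dinner.>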
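The "?" suffix is not limited to ".*": it also turns "+", "*" and "{m,n}" into their non-greedy forms ("+?", "*?", "{m,n}?"). A short sketch comparing greedy and non-greedy repetition, plus the same tag pattern used with findall():

```python
import re

text = 'HaHaHaHaHa'

# Greedy {3,5}: take as many repetitions as allowed (5).
greedy = re.search(r'(Ha){3,5}', text)
print(greedy.group())      # HaHaHaHaHa

# Non-greedy {3,5}?: stop at the minimum number of repetitions (3).
nongreedy = re.search(r'(Ha){3,5}?', text)
print(nongreedy.group())   # HaHaHa

# The same idea with + and +? on a string of digits.
print(re.search(r'\d+', '2024-01-01').group())    # 2024
print(re.search(r'\d+?', '2024-01-01').group())   # 2

# With findall(), the non-greedy pattern returns each tag separately.
print(re.findall(r'<.*?>', '<a> and <b>'))   # ['<a>', '<b>']
print(re.findall(r'<.*>', '<a> and <b>'))    # ['<a> and <b>']
```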