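Purpose: Format paragraphs of text by adjusting where line breaks occur.
The textwrap module can be used to format text for output in situations where pretty-printing is desired. It offers programmatic functionality similar to the paragraph wrapping and filling features found in many text editors and word processors.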
Example Data
The examples in this section use the module textwrap_example.py, which contains a multi-line string, sample_text.
# textwrap_example.py
sample_text = '''
    The textwrap module can be used to format text for output in
    situations where pretty-printing is desired.  It offers
    programmatic functionality similar to the paragraph wrapping
    or filling features found in many text editors.
    '''
Filling Paragraphs
The fill() function takes text as input and returns the formatted text.
# textwrap_fill.py
import textwrap
from textwrap_example import sample_text
print(textwrap.fill(sample_text, width=50))
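The results are less than desirable. The text is now left-justified, but the first line retains its indent, and the whitespace from the front of each subsequent line is embedded in the middle of the paragraph.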
$ python3 textwrap_fill.py
     The textwrap module can be used to format
text for output in     situations where pretty-
printing is desired.  It offers     programmatic
functionality similar to the paragraph wrapping
or filling features found in many text editors.
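Removing Existing Indentation
The previous example has extra whitespace mixed into the middle of the output, so it is not formatted very cleanly. The dedent() function removes the whitespace prefix common to all lines of the sample text, which produces better results. The sample string has an artificial level of indentation introduced to illustrate this feature.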
# textwrap_dedent.py
import textwrap
from textwrap_example import sample_text
dedented_text = textwrap.dedent(sample_text)
print('Dedented:')
print(dedented_text)
The results are starting to look better:
$ python3 textwrap_dedent.py
Dedented:

The textwrap module can be used to format text for output in
situations where pretty-printing is desired.  It offers
programmatic functionality similar to the paragraph wrapping
or filling features found in many text editors.
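Since "dedent" is the opposite of "indent", the result of dedent() is a block of text with the common leading whitespace removed from each line. If one line is already indented more than the others, the extra whitespace on that line is not removed.
For example, with underscores (_) standing in for spaces, the input: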
_Line one.
__Line two.
_Line Three.
becomes:
Line one.
_Line two.
Line Three.
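The underscore illustration can be checked directly. This is a minimal sketch that uses real spaces in place of the underscores; the names text and result are chosen here for illustration only.

```python
import textwrap

# Real spaces stand in for the underscores used in the illustration.
text = ' Line one.\n  Line two.\n Line Three.'

result = textwrap.dedent(text)

# Only the one-space prefix shared by every line is removed, so the
# second line keeps its single extra space. Leading spaces are shown
# as underscores to make them visible.
for line in result.splitlines():
    stripped = line.lstrip(' ')
    print('_' * (len(line) - len(stripped)) + stripped)
```

Because the common prefix is computed across all lines, one lightly indented line limits how much is removed from every other line.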
Combining Dedent and Fill
Next, the dedented text can be passed to fill() with a few different width values:
# textwrap_fill_width.py
import textwrap
from textwrap_example import sample_text
dedented_text = textwrap.dedent(sample_text).strip()
for width in [45, 60]:
    print('{} Columns: \n'.format(width))
    print(textwrap.fill(dedented_text, width=width))
    print()
This produces output in the specified widths:
$ python3 textwrap_fill_width.py
45 Columns:

The textwrap module can be used to format
text for output in situations where pretty-
printing is desired.  It offers programmatic
functionality similar to the paragraph
wrapping or filling features found in many
text editors.

60 Columns:

The textwrap module can be used to format text for output in
situations where pretty-printing is desired.  It offers
programmatic functionality similar to the paragraph wrapping
or filling features found in many text editors.
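Indenting Blocks
Use the indent() function to add consistent prefix text to the start of every line in a multi-line string. The following example adds the prefix > to each line of the sample text, formatting it as though it were quoted in an email reply.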
# textwrap_indent.py
import textwrap
from textwrap_example import sample_text
dedented_text = textwrap.dedent(sample_text)
wrapped = textwrap.fill(dedented_text, width=50)
wrapped += '\n\nSecond paragraph after a blank line.'
final = textwrap.indent(wrapped, '> ')
print('Quoted block:\n')
print(final)
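The block of text is split on newlines, the prefix > is added to each line that contains text, and the lines are then combined into a new string and returned.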
$ python3 textwrap_indent.py
Quoted block:

>  The textwrap module can be used to format text
> for output in situations where pretty-printing is
> desired.  It offers programmatic functionality
> similar to the paragraph wrapping or filling
> features found in many text editors.

> Second paragraph after a blank line.
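To control which lines receive the prefix, pass a callable as the predicate argument to indent(). The callable is invoked for each line of text in turn, and the prefix is added only to the lines for which it returns a true value.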
# textwrap_indent_predicate.py
import textwrap
from textwrap_example import sample_text
def should_indent(line):
    print('Indent {!r}?'.format(line))
    return len(line.strip()) % 2 == 0
dedented_text = textwrap.dedent(sample_text)
wrapped = textwrap.fill(dedented_text, width=50)
final = textwrap.indent(wrapped, 'EVEN ', predicate=should_indent)
print('\nQuoted block:\n')
print(final)
The result is that the prefix is added only to lines whose length, ignoring leading and trailing whitespace, is even:
$ python3 textwrap_indent_predicate.py
Indent ' The textwrap module can be used to format text\n'?
Indent 'for output in situations where pretty-printing is\n'?
Indent 'desired.  It offers programmatic functionality\n'?
Indent 'similar to the paragraph wrapping or filling\n'?
Indent 'features found in many text editors.'?

Quoted block:

EVEN  The textwrap module can be used to format text
for output in situations where pretty-printing is
EVEN desired.  It offers programmatic functionality
EVEN similar to the paragraph wrapping or filling
EVEN features found in many text editors.
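When no predicate is given, indent() does not add the prefix to lines that consist only of whitespace, such as the blank line separating two paragraphs. The following is a minimal sketch, assuming the goal is to prefix every line including blank ones; the sample string and the names default_quoted and all_quoted are made up for illustration.

```python
import textwrap

text = 'First paragraph.\n\nSecond paragraph after a blank line.'

# Default behavior: the whitespace-only line is left without a prefix.
default_quoted = textwrap.indent(text, '> ')

# A predicate that always returns True prefixes every line,
# including the blank one.
all_quoted = textwrap.indent(text, '> ', predicate=lambda line: True)

print(default_quoted)
print('---')
print(all_quoted)
```

Whether the blank line should carry the prefix depends on the target format, so the always-true predicate is only one possible choice.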
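Hanging Indents
The fill() function can also add prefixes. Just as the width of the output can be set, the indent of the first line can be controlled independently of the subsequent lines.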
# textwrap_hanging_indent.py
import textwrap
from textwrap_example import sample_text
dedented_text = textwrap.dedent(sample_text).strip()
print(textwrap.fill(dedented_text, initial_indent='', subsequent_indent=' ' * 4, width=50))
This makes it easy to produce a hanging indent, where the first line is indented less than the other lines.
$ python3 textwrap_hanging_indent.py
The textwrap module can be used to format text for
    output in situations where pretty-printing is
    desired.  It offers programmatic functionality
    similar to the paragraph wrapping or filling
    features found in many text editors.
The indent values can also include non-whitespace characters. Using an asterisk (*) as the prefix, for example, produces bullet points.
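As a rough sketch of that bullet-point idea, the snippet below uses "* " for initial_indent and two spaces for subsequent_indent, so that continuation lines align under the text of the bullet. The items list, the bullets name, and the width of 30 are made up for illustration.

```python
import textwrap

# Hypothetical list items, used only for this illustration.
items = [
    'Wrap long lines at a fixed width.',
    'Remove common leading whitespace with dedent().',
]

# The first line of each item starts with the bullet marker; wrapped
# lines are indented by two spaces to line up with the item text.
bullets = [
    textwrap.fill(item, width=30, initial_indent='* ', subsequent_indent='  ')
    for item in items
]

for bullet in bullets:
    print(bullet)
```

The width passed to fill() includes the indent strings, so every output line, marker included, stays within 30 columns.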
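Truncating Long Text
Use the shorten() function to truncate longer text and create a summary or preview. All whitespace, such as tabs, newlines, and runs of multiple spaces, is collapsed to a single space. The text is then truncated to a length less than or equal to the one requested, always at a word boundary so that no partial words are included.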
# textwrap_shorten.py
import textwrap
from textwrap_example import sample_text
dedented_text = textwrap.dedent(sample_text)
original = textwrap.fill(dedented_text, width=50)
print('Original:\n')
print(original)
shortened = textwrap.shorten(original, 100)
shortened_wrapped = textwrap.fill(shortened, width=50)
print('\nShortened:\n')
print(shortened_wrapped)
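If non-whitespace text is removed from the original string as part of the truncation, it is replaced with a placeholder. The default placeholder is [...], and it can be changed by passing a placeholder argument to shorten().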
$ python3 textwrap_shorten.py
Original:

 The textwrap module can be used to format text
for output in situations where pretty-printing is
desired.  It offers programmatic functionality
similar to the paragraph wrapping or filling
features found in many text editors.

Shortened:

The textwrap module can be used to format text for
output in situations where pretty-printing [...]
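The placeholder argument can be sketched as follows, assuming a plain ... marker is wanted instead of the default. The text string and the width of 30 are made up for illustration.

```python
import textwrap

# The irregular whitespace (spaces, newline, tab) is collapsed
# to single spaces before truncation.
text = 'The   textwrap module\ncan be used to\tformat text for output.'

# The result, including the placeholder, never exceeds the width.
default_short = textwrap.shorten(text, width=30)
custom_short = textwrap.shorten(text, width=30, placeholder='...')

print(default_short)
print(custom_short)
```

A shorter placeholder leaves more room for the original words within the same width.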
See also:
1. The official documentation for the textwrap module