工作任务:要把字幕srt文档进行拆分,把数字1和16之间的提取出来,然后转成纯文本文档;
你是一个Python编程专家,要完成一个Python脚本编写任务,具体步骤如下:
读取srt文档里面的每一行:"D:\My.Neighbor.Totoro.1988.720p.BluRay.X264-AMIABLE [PublicHD]\My.Neighbor.Totoro.1988.720p.BluRay.X264-AMIABLE.srt"
定位到内容为数字的那些行;
然后提取数字“{hangnumber1}”和数字“{hangnumber1}+15”之间的文本内容({hangnumber1}从1开始,以15递增,直到608结束),删除掉所有时间轴和数字行,然后把处理后的文本内容保存为txt文档,保存在文件夹”D:\My.Neighbor.Totoro.1988.720p.BluRay.X264-AMIABLE [PublicHD]\Subs”,txt文档的格式为txt,文件名为:数字“{hangnumber1}”和数字“{hangnumber1}+15”之间的中间数;
注意:每一步都要输出信息到屏幕
注意:每一步都要输出信息到屏幕
ChatGPT生成了代码,但是运行时有一点小问题,多提取了一行内容,让ChatGPT修改:
下面是srt文档内容:
###
1
00:02:32,174 --> 00:02:34,343
Dad, do you want some candy?
2
00:02:34,426 --> 00:02:36,887
Thanks. Aren't you tired?
3
00:02:36,970 --> 00:02:38,889
- No.
- We're almost there.
4
00:02:50,609 --> 00:02:51,735
Mei, hide!
5
00:02:57,658 --> 00:02:59,076
It wasn't a policeman.
6
00:02:59,326 --> 00:03:00,869
Hello!
7
00:03:31,984 --> 00:03:34,611
Hello. Are your parents around?
8
00:03:36,863 --> 00:03:37,948
Thanks.
9
00:03:40,284 --> 00:03:43,954
I'm Kusakabe! We've just arrived!
10
00:03:45,038 --> 00:03:46,873
A pleasure to meet you!
11
00:03:47,541 --> 00:03:49,793
Welcome!
12
00:03:54,298 --> 00:03:55,382
Thank you.
13
00:04:17,696 --> 00:04:18,822
Here we are!
14
00:04:20,407 --> 00:04:21,658
Wait!
15
00:04:24,578 --> 00:04:27,122
- Mei, there's a bridge!
- A bridge?
16
00:04:32,461 --> 00:04:34,504
See that flash? Fish too!
17
00:04:35,672 --> 00:04:37,090
So, you like it here?
###
你把数字“16”和数字“17”之间的内容“See that flash? Fish too!”也提取到了数字“1”和数字“16”之间,多提取了内容,请修正这个错误。
修改后的源代码:
import os
def extract_and_save_text(srt_file, output_folder, start_line):
"""
提取 SRT 文件中指定范围的行,并保存为 TXT 文件。
:param srt_file: SRT 文件路径
:param output_folder: 输出文件夹路径
:param start_line: 起始行号
"""
end_line = start_line + 15
middle_number = (start_line + end_line) // 2
output_file = os.path.join(output_folder, f"{middle_number}.txt")
with open(srt_file, 'r', encoding='utf-8') as file:
lines = file.readlines()
text_lines = []
current_number = None
extract = False
for line in lines:
line = line.strip()
if line.isdigit():
current_number = int(line)
if start_line <= current_number < end_line:
extract = True
else:
extract = False
elif extract and '-->' not in line:
text_lines.append(line)
with open(output_file, 'w', encoding='utf-8') as file:
file.write("\n".join(text_lines))
print(f"Extracted lines {start_line}-{end_line - 1} and saved to {output_file}")
def main():
srt_file = r"D:\My.Neighbor.Totoro.1988.720p.BluRay.X264-AMIABLE [PublicHD]\My.Neighbor.Totoro.1988.720p.BluRay.X264-AMIABLE.srt"
output_folder = r"D:\My.Neighbor.Totoro.1988.720p.BluRay.X264-AMIABLE [PublicHD]\Subs"
# 确保输出文件夹存在
os.makedirs(output_folder, exist_ok=True)
# 行号范围设定
for start_line in range(1, 609, 15): # 从 1 到 608,步长为 15
extract_and_save_text(srt_file, output_folder, start_line)
if __name__ == "__main__":
main()