Table of Contents

Preface
I. Installing faster-whisper
    1. Installing docker and nvidia-docker
    2. Pulling the image
    3. Starting the container
    4. Creating a user in the container and installing Anaconda
II. Speech recognition with faster-whisper
    1. Adding cuda and nvidia to dl's environment variables
    2. Installing faster-whisper
    3. Downloading the models
    4. Starting jupyter notebook to verify the installation
III. Converting to SRT subtitle files

Preface

The previous post, on batch-downloading video and audio collections from a certain site, covered how to download its audio and video files. This post is a practical walkthrough of speech recognition with faster-whisper (no theory), in three parts: 1) installing faster-whisper; 2) speech recognition with faster-whisper; 3) converting the output to SRT subtitle files.

I. Installing faster-whisper

1. Installing docker and nvidia-docker

See the earlier post on installing the nvidia driver and docker/nvidia-docker on Ubuntu 20.04.

2. Pulling the image

docker pull nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04

3. Starting the container

docker run -itd --name=faster-whispter-demo --net=host --gpus all --shm-size=16g nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 bash

docker exec -it faster-whispter-demo bash

4. Creating a user in the container and installing Anaconda

The container runs as root by default, and with mapped folders you often have to fix permissions before the host can access the files, which is a hassle. Adding a user is optional, but install the basic packages as needed.

# inside the container

# add the dl user

useradd -ms /bin/bash dl

# set the dl user's password

passwd dl

New password:

Retype new password:

passwd: password updated successfully

# fix permissions on /home/dl

chmod -R o+wrx /home/dl

# install some basic packages

apt-get update

apt-get install vim

apt-get install sudo

# grant dl sudo rights

chmod +wrx /etc/sudoers

vi /etc/sudoers

# below the root entry, add an identical line for dl (copy root's)

# switch to the dl user

su dl

sudo apt-get install wget

sudo apt-get install git

# download anaconda (it can also be downloaded in advance and copied into the container)

cd /home/dl/

wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh

chmod +x ./Anaconda3-2023.09-0-Linux-x86_64.sh

./Anaconda3-2023.09-0-Linux-x86_64.sh

### prompts during the anaconda install: ENTER -> yes -> ENTER -> yes

Do you accept the license terms? [yes|no]

[no] >>> yes

Anaconda3 will now be installed into this location:

/home/dl/anaconda3

- Press ENTER to confirm the location

- Press CTRL-C to abort the installation

- Or specify a different location below

[/home/dl/anaconda3] >>>

installation finished.

Do you wish to update your shell profile to automatically initialize conda?

This will activate conda on startup and change the command prompt when activated.

If you'd prefer that conda's base environment not be activated on startup,

run the following command when conda is activated:

conda config --set auto_activate_base false

You can undo this by running `conda init --reverse $SHELL`? [yes|no]

[no] >>> yes

### anaconda installed successfully

# make anaconda take effect in the current shell

source ~/.bashrc

# test jupyter notebook

jupyter notebook

# copy the printed link into a browser to check

II. Speech recognition with faster-whisper

1. Adding cuda and nvidia to dl's environment variables

# still inside the container; switch back to the root user

exit

# fix permissions

chmod -R o+wrx /usr/local/

echo $PATH

#/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

echo $LD_LIBRARY_PATH

#/usr/local/nvidia/lib:/usr/local/nvidia/lib64

# switch to the dl user and add root's PATH and LD_LIBRARY_PATH values above to dl's environment

su dl

vi ~/.bashrc

export PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:$LD_LIBRARY_PATH

# apply the changes

source ~/.bashrc

# install nvidia-cublas-cu11 and nvidia-cudnn-cu11

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple nvidia-cublas-cu11 nvidia-cudnn-cu11

2. Installing faster-whisper

# https://github.com/SYSTRAN/faster-whisper; GitHub is hard to reach without a proxy, so clone a gitee mirror instead

git clone https://gitee.com/loocen/faster-whisper

cd faster-whisper/

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

python setup.py install

3. Downloading the models

The official models require a proxy to download, so a mirror is used here; the files have to be fetched manually.

# taking the large-v3 model as an example, create a directory for it under the faster-whisper folder

mkdir -p /home/dl/faster-whisper/model/faster-whisper-large-v3

cd /home/dl/faster-whisper/model/faster-whisper-large-v3

wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/model.bin?download=true -O model.bin

wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/README.md?download=true -O README.md

wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/config.json?download=true -O config.json

wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/preprocessor_config.json?download=true -O preprocessor_config.json

wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/tokenizer.json?download=true -O tokenizer.json

wget https://hf-mirror.com/Systran/faster-whisper-large-v3/resolve/main/vocabulary.json?download=true -O vocabulary.json
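When fetching several files by hand like this, it is easy to miss one or end up with a truncated download. A minimal sketch to check the model directory before loading it (the helper name and file list are my own; README.md is left out as optional):

```python
import os

# files the wget commands above fetch, minus the optional README.md
REQUIRED_FILES = [
    "model.bin",
    "config.json",
    "preprocessor_config.json",
    "tokenizer.json",
    "vocabulary.json",
]

def missing_model_files(model_dir):
    """Return the required model files that are absent from model_dir."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(model_dir, name))]

print(missing_model_files("/home/dl/faster-whisper/model/faster-whisper-large-v3"))
```

An empty list means every file is in place; otherwise re-run the corresponding wget command.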

### model URLs

large-v3 model: https://hf-mirror.com/Systran/faster-whisper-large-v3/tree/main

large-v2 model: https://hf-mirror.com/guillaumekln/faster-whisper-large-v2/tree/main

large-v1 model: https://hf-mirror.com/guillaumekln/faster-whisper-large-v1/tree/main

medium model: https://hf-mirror.com/guillaumekln/faster-whisper-medium/tree/main

small model: https://hf-mirror.com/guillaumekln/faster-whisper-small/tree/main

base model: https://hf-mirror.com/guillaumekln/faster-whisper-base/tree/main

tiny model: https://hf-mirror.com/guillaumekln/faster-whisper-tiny/tree/main

4. Starting jupyter notebook to verify the installation

cd /home/dl

jupyter notebook

# copy the link into a browser

# create a new notebook with the python3 kernel

Paste the code below into the notebook to test:

from faster_whisper import WhisperModel, decode_audio

# model_size = "large-v3"
model_path = '/home/dl/faster-whisper/model/faster-whisper-large-v3'

# run on GPU with FP32
model = WhisperModel(model_path, device="cuda", compute_type="float32")

# test 1: stereo file, transcribed one channel at a time
audio_path = '/home/dl/faster-whisper/tests/data/stereo_diarization.wav'
left, right = decode_audio(audio_path, split_stereo=True)

segments, _ = model.transcribe(left)
transcription = "".join(segment.text for segment in segments).strip()
assert transcription == (
    "He began a confused complaint against the wizard, "
    "who had vanished behind the curtain on the left."
)
print(transcription)

segments, _ = model.transcribe(right)
transcription = "".join(segment.text for segment in segments).strip()
assert transcription == "The horizon seems extremely distant."

# test 2: language detection and segment timestamps
audio = '/home/dl/faster-whisper/tests/data/jfk.flac'
segments, info = model.transcribe(audio, beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

III. Converting to SRT subtitle files

# open a new terminal and enter the container

docker exec -it faster-whispter-demo bash

su dl

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-docx

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pysubs2

mkdir /home/dl/faster-whisper/mp3

In another terminal on the host, run a command like the following (replace containerid with the container id of faster-whispter-demo) to copy the mp3 files to be transcribed into the container:

docker cp mp3 containerid:/home/dl/faster-whisper/mp3

Paste the code below into the notebook from step II.4 and run it; it generates srt, docx, and txt files. One issue remains unsolved: punctuation correction and paragraph segmentation. That is left for later.

import math
import os
from dataclasses import dataclass

from docx import Document
import pysubs2

@dataclass
class DownloadInfo:
    base_url: str
    max_episod: int
    save_dir: str

# zhuan_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1ct4y1871B', 99, '专/2022-专')
# shiwu_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1ET411375o', 93, '专/2022-实')
# xiangguan_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1pZ4y1e77d', 84, '专/2022-相')
# fachongci_2022 = DownloadInfo('https://www.xxxxxxx.com/video/BV1E8411W7JD', 50, '专/2022-法冲刺')
shiwu_2023 = DownloadInfo('https://www.xxxxxxx.com/video/BV1Kc411L7dQ', 1, 'mp3')

current_down = shiwu_2023
base_dir = '/home/dl/faster-whisper/'

for audio_i in range(1, current_down.max_episod + 1):
    audio = os.path.join(base_dir, current_down.save_dir, "{}.mp3".format(audio_i))
    print(audio)
    segments, info = model.transcribe(audio, beam_size=5)

    srt_path = os.path.join(base_dir, current_down.save_dir, "{}.srt".format(audio_i))
    doc_path = os.path.join(base_dir, current_down.save_dir, "{}.docx".format(audio_i))
    txt_path = os.path.join(base_dir, current_down.save_dir, "{}.txt".format(audio_i))
    print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

    # write the SRT file
    with open(srt_path, 'w') as f:
        for segment in segments:
            sentence_timestamp = []
            for p_time in [segment.start, segment.end]:
                m, s = divmod(p_time, 60)
                h, m = divmod(m, 60)
                ms, s = math.modf(s)
                sentence_timestamp.append((int(h), int(m), int(s), int(ms * 1000)))
            line = '{}\n{:0>2d}:{:0>2d}:{:0>2d},{:0>3d} --> {:0>2d}:{:0>2d}:{:0>2d},{:0>3d}\n{}\n\n'.format(
                segment.id,
                sentence_timestamp[0][0], sentence_timestamp[0][1], sentence_timestamp[0][2], sentence_timestamp[0][3],
                sentence_timestamp[1][0], sentence_timestamp[1][1], sentence_timestamp[1][2], sentence_timestamp[1][3],
                segment.text)
            # print(line)
            f.write(line)
            # print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

    # join the subtitle lines into one paragraph, appending a comma to lines without one
    doc_zhuan = Document()
    subtitles = pysubs2.load(srt_path)
    save_text = ''
    for sub in subtitles:
        sub_text = sub.text
        if sub_text.find(',') == -1:
            sub_text = sub.text + ','
        save_text = save_text + sub_text
    # print(save_text)
    doc_zhuan.add_paragraph(save_text)
    doc_zhuan.save(doc_path)

    with open(txt_path, 'w') as f_txt:
        f_txt.write(save_text)
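The divmod/modf timestamp arithmetic in the loop above can also be pulled into a small standalone helper (the function name is my own), which makes it easy to test on its own:

```python
import math

def srt_timestamp(seconds):
    """Format a float number of seconds as an SRT 'HH:MM:SS,mmm' timestamp."""
    m, s = divmod(seconds, 60)   # minutes, remaining seconds
    h, m = divmod(int(m), 60)    # hours, remaining minutes
    ms, s = math.modf(s)         # fractional and whole seconds
    return "{:02d}:{:02d}:{:02d},{:03d}".format(int(h), int(m), int(s), int(ms * 1000))

print(srt_timestamp(3661.5))  # → 01:01:01,500
```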
