前言

        Whisper 是一种通用语音识别模型。它是在大量不同音频数据集上进行训练的,也是一个多任务模型,可以执行多语言语音识别、语音翻译和语言识别。

        这里呢,我将给出我的一些代码,来帮助你尽快实现【语音转文字】的服务部署。

        以下是该AI模块的具体使用方式:

        https://github.com/openai/whisper

心得

        这是一个不错的语言模型,它支持自动识别语音语种,类似中文、英文、日语等它都能胜任,并且可以实现其他语种转英语翻译的功能,支持附加时间戳的字幕导出功能......

        总体来说,它甚至可以与市面上领头的语言识别功能相媲美,并且主要它是开源的。

        这是它的一些模型大小、需要的GPU显存、相对执行速度的对应表

         这是它在命令行模式下的使用方式,这对想要尝尝鲜的小伙伴们来说,已经够了

        tips:

        1、首次安装完毕whisper后,执行指令时会给你安装你所选的模型,small、medium等,我的显卡已经不支持我使用medium了 

        2、关于GPU版本的pytorch,可以参考如下教程(使用CPU版本会比较慢)

        https://blog.csdn.net/G541788_/article/details/135437236

python调用 

        作为一名python从业者,我十分幸运能够读懂一些模块的相关使用,这里我通过修改了一些模块源码调用,实现了在python代码中一键导出语音字幕的功能(这些功能在命令行中已拥有,但是我希望在使用python脚本model方法后再实现该功能,可能这些你并不需要,但随意吧)。

        这个模块的cli()方法或许能更好实现这一功能(因为命令行模式,其实就是运行了这个方法,但我根据经验和实际代码来看,这会重复加载model,导致不必要的资源损耗)。

         1、__init__.py中加入get_writer,让你能通过whisper模块去使用这个方法

from .transcribe import get_writer

        2、相关功能代码

import os.path

import whisper

import time

# 这是语种langue参数的解释,或许对你的选择有帮助

LANGUAGES = {

"en": "english",

"zh": "chinese",

"de": "german",

"es": "spanish",

"ru": "russian",

"ko": "korean",

"fr": "french",

"ja": "japanese",

"pt": "portuguese",

"tr": "turkish",

"pl": "polish",

"ca": "catalan",

"nl": "dutch",

"ar": "arabic",

"sv": "swedish",

"it": "italian",

"id": "indonesian",

"hi": "hindi",

"fi": "finnish",

"vi": "vietnamese",

"he": "hebrew",

"uk": "ukrainian",

"el": "greek",

"ms": "malay",

"cs": "czech",

"ro": "romanian",

"da": "danish",

"hu": "hungarian",

"ta": "tamil",

"no": "norwegian",

"th": "thai",

"ur": "urdu",

"hr": "croatian",

"bg": "bulgarian",

"lt": "lithuanian",

"la": "latin",

"mi": "maori",

"ml": "malayalam",

"cy": "welsh",

"sk": "slovak",

"te": "telugu",

"fa": "persian",

"lv": "latvian",

"bn": "bengali",

"sr": "serbian",

"az": "azerbaijani",

"sl": "slovenian",

"kn": "kannada",

"et": "estonian",

"mk": "macedonian",

"br": "breton",

"eu": "basque",

"is": "icelandic",

"hy": "armenian",

"ne": "nepali",

"mn": "mongolian",

"bs": "bosnian",

"kk": "kazakh",

"sq": "albanian",

"sw": "swahili",

"gl": "galician",

"mr": "marathi",

"pa": "punjabi",

"si": "sinhala",

"km": "khmer",

"sn": "shona",

"yo": "yoruba",

"so": "somali",

"af": "afrikaans",

"oc": "occitan",

"ka": "georgian",

"be": "belarusian",

"tg": "tajik",

"sd": "sindhi",

"gu": "gujarati",

"am": "amharic",

"yi": "yiddish",

"lo": "lao",

"uz": "uzbek",

"fo": "faroese",

"ht": "haitian creole",

"ps": "pashto",

"tk": "turkmen",

"nn": "nynorsk",

"mt": "maltese",

"sa": "sanskrit",

"lb": "luxembourgish",

"my": "myanmar",

"bo": "tibetan",

"tl": "tagalog",

"mg": "malagasy",

"as": "assamese",

"tt": "tatar",

"haw": "hawaiian",

"ln": "lingala",

"ha": "hausa",

"ba": "bashkir",

"jw": "javanese",

"su": "sundanese",

"yue": "cantonese",

}

# 以下命令将使用medium模型转录音频文件中的语音:

#

# whisper audio.flac audio.mp3 audio.wav --model medium

# 默认设置(选择模型small)非常适合转录英语。要转录包含非英语语音的音频文件,您可以使用以下选项指定语言--language:

#

# whisper japanese.wav --language Japanese

# 添加--task translate会将演讲翻译成英语:

#

# whisper japanese.wav --language Japanese --task translate

# 其他语言转录为英语

# whisper "E:\voice\恋愛サーキュレーション_(Vocals)_(Vocals).wav" --language ja --task translate

# 这个任务是将audio_files内的声音文件进行字幕导出,以时间戳为单位存储到captions/目录里

audio_files = [r"E:\voice\恋愛サーキュレーション_(Vocals)_(Vocals).wav"]

model = whisper.load_model("small")

output_format = 'all'

writer_args = {

"highlight_words": False,

"max_line_count": None,

"max_line_width": None,

"max_words_per_line": None,

}

for audio_file in audio_files:

now_timestamp = str(int(time.time()))

save_path = f'captions/{now_timestamp}'

if not os.path.exists(save_path):

os.mkdir(save_path)

# language可选

# 中文zh,日语ja,英语en

result = model.transcribe(audio_file, language='ja')

writer = whisper.get_writer(output_format, save_path)

writer(result, audio_file, **writer_args)

print('done: ', audio_file )

精彩链接

评论可见,请评论后查看内容,谢谢!!!
 您阅读本篇文章共花了: