AI [Speech Recognition] in Practice: An Offline Intelligent Voice Assistant

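Contents

I. Sleep and voice wake-up
1. Environment setup
2. How to train your own custom wake word
3. Deployment: running on an ARM-Linux board (cross-compilation)

II. Learning to listen
1. Basic environment setup (pip install whatever turns out to be missing later)
2. Capturing speech and saving it as a WAV file
3. Reading the recorded Chinese WAV file and converting it to text

III. Learning to chat
1. Environment setup
2. Training and using the official Chinese dataset
3. A dataset I provide
4. Arithmetic and time queries

IV. Learning to control smart home appliances
1. Environment setup
2. Controlling an Arduino from Python

V. Learning to speak
1. Environment setup
2. Code implementation

Finally: all the source code
Other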

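References:

A local offline intelligent voice control terminal based on Python and deep learning (speech recognition, NLP), with chat and home-control functions

A V3S-based voice assistant (3): porting pocketsphinx keyword wake-up

A V3S-based voice assistant (2): porting pyaudio to the development board

A V3S-based voice assistant (1): building and installing python3 (this version resolves the zlib and readline issues so that pip works)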

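The overall development approach is to get each feature working on your own computer first and then port it to the embedded Linux system. Porting mainly means cross-compiling every piece of software involved.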

I. Sleep and voice wake-up

Here I use pocketsphinx as the example for implementing voice wake-up.

1. Environment setup

pip install pocketsphinx

pip install pyaudio

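If installation fails this way, download the matching whl file from the web and install it offline.

https://www.lfd.uci.edu/~gohlke/pythonlibs/

Find the whl files for pocketsphinx and pyaudio on the site above, taking care to pick the ones that match your operating system and your Python version. Download them into your project folder and run the following in the PyCharm terminal:

pip install full_package_file_name.whl

This installs the package locally and offline.

For example, this is the offline install of pocketsphinx on Windows with Python 3.7. Mind the file path, otherwise pip reports that it cannot find the specified file: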

pip install pocketsphinx-0.1.15-cp37-cp37m-win_amd64.whl
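Picking the wrong wheel is the usual reason the offline install fails, so it can help to compare the tags in the file name with your interpreter before installing. The following is a minimal sketch of that check; the helper names (parse_wheel_name, current_python_tag, wheel_matches_interpreter) are my own and not part of pip, and it assumes a CPython interpreter and the standard dash-separated wheel file name layout.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into its standard dash-separated fields."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def current_python_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp37'."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


def wheel_matches_interpreter(filename):
    """True when the wheel's Python tag equals the running interpreter's tag."""
    return parse_wheel_name(filename)["python_tag"] == current_python_tag()


wheel = "pocketsphinx-0.1.15-cp37-cp37m-win_amd64.whl"
info = parse_wheel_name(wheel)
print(info["python_tag"], info["platform_tag"])  # cp37 win_amd64
print("matches this interpreter:", wheel_matches_interpreter(wheel))
```

If the last line prints False, look for a wheel whose cpXY tag matches the Python version you are actually running.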

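2. How to train your own custom wake word

Create a file named keyword.txt and enter the wake word you want, plus words that sound similar (the more similar-sounding words you list, the more readily the assistant wakes up). For example, I wanted to name my terminal COCO, so my keyword.txt contains: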

gogo

coco

yoyo

bobo

lolo

momo

nono

hoho

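Open the website: Sphinx Knowledge Base Tool VERSION 3

Select keyword.txt and upload it. The site returns an archive; download it into your project folder and extract it. The archive contains the generated model files, including the .lm and .dic files used in the test code below.

The numbers in the file names are randomly generated by the site, so it is normal for yours to differ.

Test code: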

import os

from pocketsphinx import LiveSpeech, get_model_path


def wakeup_co():
    """Block until one of the wake words is heard on the microphone."""
    model_path = get_model_path()
    speech = LiveSpeech(
        verbose=False,
        sampling_rate=16000,
        buffer_size=2048,
        no_search=False,
        full_utt=False,
        hmm=os.path.join(model_path, 'en-us'),  # English acoustic model shipped with pocketsphinx
        lm=os.path.join('.\\Sphinx_keyword\\keyword_COCO\\', '5995.lm'),
        dic=os.path.join('.\\Sphinx_keyword\\keyword_COCO\\', '5995.dic')
    )
    for phrase in speech:
        # print("phrase:", phrase)
        # print(phrase.segments(detailed=True))
        if str(phrase) in ["GOGO", "COCO", "YOYO",
                           "BOBO", "LOLO", "MOMO",
                           "NONO", "HOHO"]:
            print('我是COCO')  # "I am COCO"
            return True  # stop listening once woken so the caller can carry on


wakeup_co()

Here '.\Sphinx_keyword\keyword_COCO\' is the folder where I keep the files. Change both the folder and the 5995 prefix to match your own path and file names.
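Because the numeric prefix changes every time the site generates an archive, one option is to look the two files up instead of hard-coding the number. This is only a sketch under the assumption that the extracted folder holds exactly one .lm and one .dic file; find_model_files is a hypothetical helper of mine, not a pocketsphinx function.

```python
import glob
import os


def find_model_files(folder):
    """Return (lm_path, dic_path) for the single .lm/.dic pair in folder."""
    lm_files = sorted(glob.glob(os.path.join(folder, "*.lm")))
    dic_files = sorted(glob.glob(os.path.join(folder, "*.dic")))
    if len(lm_files) != 1 or len(dic_files) != 1:
        raise FileNotFoundError(
            "expected exactly one .lm and one .dic file in {!r}".format(folder))
    return lm_files[0], dic_files[0]
```

The returned paths can then be passed as the lm and dic arguments of LiveSpeech, for example lm_path, dic_path = find_model_files('.\\Sphinx_keyword\\keyword_COCO').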

Test result:

Allocating 32 buffers of 2500 samples each
我是COCO

Process finished with exit code 0

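To use a Chinese word as the wake word, first download the Chinese model files:

Download CMU Sphinx from SourceForge.net

Extract the download inside your project folder to get the cmusphinx-zh-cn-5.2 folder.

As with the English wake word, create a keyword.txt file. I trained a wake word named “佩佩” (Peipei):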

贝贝

佩佩

蕾蕾

内内

嘿嘿

忒忒

得得

贼贼

尅尅

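Open the website: Sphinx Knowledge Base Tool VERSION 3

Upload keyword.txt, download the resulting archive into your project path, and extract it.

This time the file with the .dic extension has to be edited: after each Chinese word, add its pinyin with tone numbers in the format shown, with exactly one space between every field. After editing it looks like this, for example: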

佩佩 p ei4 p ei3

内内 n ei4 n ei3

嘿嘿 h ei4 h ei3

尅尅 k ei4 k ei3

得得 d ei4 d ei3

忒忒 t ei4 t ei3

蕾蕾 l ei4 l ei3

贝贝 b ei4 b ei3

贼贼 z ei4 z ei3
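Since every separator has to be exactly one space, generating the lines is less error-prone than typing them. Below is a minimal sketch that builds and checks entries in the same layout as the list above; format_dic_entry and check_dic_line are hypothetical helpers of mine, and the syllable split (initial, final with tone number) simply mirrors the entries shown.

```python
def format_dic_entry(word, syllables):
    """Build one dictionary line: the word, then its phones, single-space separated."""
    phones = []
    for initial, final in syllables:
        phones.extend([initial, final])
    return " ".join([word] + phones)


def check_dic_line(line):
    """True when the line has a word plus at least one phone and only single spaces."""
    fields = line.split(" ")
    # An empty field means a doubled, leading or trailing space.
    return len(fields) >= 2 and all(fields)


entry = format_dic_entry("佩佩", [("p", "ei4"), ("p", "ei3")])
print(entry)                          # 佩佩 p ei4 p ei3
print(check_dic_line(entry))          # True
print(check_dic_line("佩佩  p ei4"))   # False (two spaces after the word)
```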

Test code:

import os

from pocketsphinx import LiveSpeech

# Folder obtained by extracting the Chinese model download
model_path = '.\\Sphinx_keyword\\cmusphinx-zh-cn-5.2\\'

speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'zh_cn.cd_cont_5000'),  # Chinese acoustic model
    lm=os.path.join('.\\Sphinx_keyword\\keyword_PeiPei\\', '0738.lm'),
    dic=os.path.join('.\\Sphinx_keyword\\keyword_PeiPei\\', '0738.dic')
)

for phrase in speech:
    print("phrase:", phrase)
    print(phrase.segments(detailed=True))
    if str(phrase) in ["贝贝", "佩佩", "蕾蕾",
                       "内内", "嘿嘿", "忒忒",
                       "得得", "贼贼", "尅尅"]:
        print("我是佩佩")  # "I am Peipei"

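Here the hmm argument must end up pointing to the zh_cn.cd_cont_5000 folder inside the extracted cmusphinx-zh-cn-5.2 folder, so set model_path to wherever you extracted cmusphinx-zh-cn-5.2.

Change '.\Sphinx_keyword\keyword_PeiPei\' and 0738 to match the path and file names of the archive you downloaded and extracted.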

Test result:

Allocating 32 buffers of 2500 samples each
phrase: 尅尅
[('<s>', 0, 4359242, 4359325), ('', -1331, 4359326, 4359479), ('', -1331, 4359480, 4359801), ('', -1331, 4359802, 4359940), ('尅尅', 0, 4359941, 4360080)]
我是佩佩
phrase: 得得
[('<s>', 0, 8516377, 8516528), ('', -5375, 8516529, 8516683), ('', -2035, 8516684, 8516764), ('得得', 0, 8516765, 8516968), ('</s>', 0, 8516969, 8516980)]
我是佩佩
phrase: 嘿嘿
[('<s>', 0, 10674834, 10675304), ('嘿嘿', -3628, 10675305, 10675382), ('</s>', 0, 10675383, 10675385)]
我是佩佩

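You only need one of the two wake words, Chinese or English. The English one is more responsive while the Chinese one can be sluggish, so I still recommend training and using an English wake word: it is quick and responsive.

3. Deployment: running on an ARM-Linux board (cross-compilation)

To run on an ARM board, the pocketsphinx and pyaudio packages installed with pip above have to be cross-compiled with the cross-compiler that matches the board.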

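II. Learning to listen

Here I start from a pretrained SpeechBrain model (SpeechBrain is built on PyTorch) and train a Chinese speech recognition system that converts spoken Chinese into text.

1. Basic environment setup (pip install whatever turns out to be missing later)

pip install speechbrain

pip install SoundFile

pip install sox

pip install SpeechRecognition

2. Capturing speech and saving it as a WAV file

This step is built on the speech_recognition package (installed on pip as SpeechRecognition). It records until no more speech is detected, then stops automatically and saves the audio.

Test code: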

import speech_recognition as sr  # SpeechRecognition package; microphone input needs pyaudio

from myself_word_to_voice import speakout  # local text-to-speech helper from section V (not used in this snippet)


def rec(rate=16000):
    """Record from the system microphone at 16000 Hz and save to recording.wav."""
    r = sr.Recognizer()
    with sr.Microphone(sample_rate=rate) as source:
        sayword = 'coco在听呢'  # "coco is listening"
        print(sayword)  # prompt the user to start speaking
        audio = r.listen(source)  # returns once the speaker falls silent

    # Save the captured audio as recording.wav in the current directory
    with open("recording.wav", "wb") as f:
        f.write(audio.get_wav_data())
    print('您说的我已经收到了')  # "I got what you said"
    return 1


rec()

Test result: open the project folder and double-click recording.wav to hear what you said.
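Besides listening to the file, you can confirm that it really was saved at the 16000 Hz sampling rate requested above. Here is a small sketch using only the standard-library wave module; wav_info is a hypothetical helper name of mine.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) of a WAV file."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        duration = wav_file.getnframes() / float(rate)
    return rate, channels, duration
```

Calling wav_info("recording.wav") after rec() should report a rate of 16000 if the recording used the default argument.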

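3. Reading the recorded Chinese WAV file and converting it to text

Here are the official website and the GitHub repository, so you can try it yourself:

SpeechBrain: A PyTorch Speech Toolkit

GitHub - speechbrain/speechbrain: A PyTorch-based Speech Toolkit

I chose Mandarin. After downloading the pretrained model you can follow the tutorials on the site to train and use it yourself.

Alternatively, I provide the source code and model directly; once the environment is set up they can be used as they are.

SpeechBrain(中文语音识别).zip (CSDN download)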

Test result:

The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
start...
有奶会有的面包括有的一切都会有的调解里数和语音可以达到更好的效果

Process finished with exit code 0
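The article does not print the recognition code itself, so here is a rough sketch of what a listen() function could look like with SpeechBrain's pretrained interface. Everything specific in it is my assumption rather than the author's setup: the checkpoint name speechbrain/asr-transformer-aishell is the public Mandarin model on Hugging Face, the savedir path is arbitrary, and in newer SpeechBrain releases the class is, I believe, imported from speechbrain.inference.ASR instead of speechbrain.pretrained.

```python
def load_asr_model(source="speechbrain/asr-transformer-aishell",
                   savedir="pretrained_models/asr-transformer-aishell"):
    """Download (on first use) and load a pretrained Mandarin ASR model."""
    # Imported lazily so this file can be loaded without SpeechBrain installed.
    from speechbrain.pretrained import EncoderDecoderASR
    return EncoderDecoderASR.from_hparams(source=source, savedir=savedir)


def listen(wav_path="recording.wav", model=None):
    """Transcribe a WAV file and return the recognized text without spaces."""
    if model is None:
        model = load_asr_model()  # assumed checkpoint, see the note above
    text = model.transcribe_file(wav_path)
    return text.replace(" ", "")
```

Loading the model is slow, so in a long-running assistant it is better to call load_asr_model() once and pass the result in as model on every call.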

III. Learning to chat

Here I use chatterbot together with a third-party corpus to train a highly customized chatbot dialogue system of my own.

1. Environment setup

pip install chatterbot

pip install chatterbot_corpus

A possible error:

OSError: [E053] Could not read config.cfg from C:\Users\pc\AppData\Local\Programs\Python\Python38\Lib\site-packages\en_core_web_md\en_core_web_md-2.2.5\config.cfg

Fix:

pip uninstall spacy

pip install -i https://pypi.douban.com/simple spacy==2.2.2
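After reinstalling, it is worth confirming which versions actually ended up in the environment. A minimal sketch, assuming Python 3.8 or later (the version in the error path above), where importlib.metadata is part of the standard library; installed_version is a hypothetical helper of mine.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("spacy", "chatterbot", "chatterbot-corpus"):
    print(name, installed_version(name))
```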

2. Training and using the official Chinese dataset

from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer
import logging

'''
This is an example showing how to train a chat bot using the
ChatterBot Corpus of conversation dialog.
'''

# Enable info level logging
# logging.basicConfig(level=logging.INFO)

chatbot = ChatBot('Example Bot')

# Start by training our bot with the ChatterBot corpus data
trainer = ChatterBotCorpusTrainer(chatbot)


def train():
    # Train on the Chinese corpus shipped with chatterbot_corpus
    trainer.train(
        'chatterbot.corpus.chinese'
    )


def chat(word=''):
    # Return the bot's reply to a single sentence
    word = chatbot.get_response(word)
    return word


def test1():
    train()
    while True:
        print(chatbot.get_response(input(">")))


test1()

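If this runs without errors, you can move on to the next step: training on your own dataset for a highly customized bot.

3. A dataset I provide

corpus.txt for training your own chatbot (CSDN download)

It looks roughly like this (the screenshot is not reproduced here).

After downloading, create a folder named corpus in your project folder and put corpus.txt inside it.

Training code: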

from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
from chatterbot.trainers import ChatterBotCorpusTrainer

# Build the ChatBot and specify its adapters
my_bot = ChatBot(
    'COCO',
    storage_adapter='chatterbot.storage.SQLStorageAdapter',
    logic_adapters=[
        {
            'import_path': 'chatterbot.logic.BestMatch',
            'threshold': 0.65,  # below this confidence, fall back to the default response
            'default_response': 'coco没听懂'  # "coco did not understand"
        }
    ]
)


def train_myword():
    corpus = []
    print('开始加载语料！')  # "Loading corpus!"
    # Load the corpus, skipping the '===' separator lines
    with open("./corpus/corpus.txt", 'r', encoding='utf-8') as file:
        for line in file:
            if line == '===\n':
                continue
            corpus.append(line.strip('\n'))
    print('语料加载完毕')  # "Corpus loaded"
    print('》' * 30)
    trainer = ListTrainer(my_bot)
    print('开始训练！')  # "Training started!"
    trainer.train(corpus[:10000])
    print('训练完毕！')  # "Training finished!"


def chat1():
    # Interactive chat loop in the terminal
    while True:
        print(my_bot.get_response(input("user:")))


def chat_my(word=''):
    # Return the bot's reply to a single sentence
    word = my_bot.get_response(word)
    return word


def test1():
    train_myword()
    chat1()


test1()

Training output:

开始加载语料！
语料加载完毕
》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》
开始训练！
List Trainer: [####################] 100%
训练完毕！

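Here I train on only the first 10,000 entries of the corpus. I recommend not training on too many: with a very large training set the bot replies with high latency, or cannot reply at all, which badly hurts the user experience.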
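One thing to be aware of: ListTrainer treats the list it receives as a single conversation in which each line answers the previous one, so feeding all lines in one list also links the last line of one dialogue to the first line of the next. The sketch below groups the corpus into separate dialogues instead. It assumes that a line containing only === marks the boundary between dialogues, which is what the loading code above suggests; load_conversations is a hypothetical helper of mine.

```python
def load_conversations(path, separator="===", limit=None):
    """Read a corpus file and return a list of dialogues (each a list of lines)."""
    conversations = []
    current = []
    with open(path, "r", encoding="utf-8") as corpus_file:
        for raw_line in corpus_file:
            line = raw_line.strip()
            if line == separator:
                # A separator closes the dialogue collected so far.
                if current:
                    conversations.append(current)
                current = []
            elif line:
                current.append(line)
    if current:
        conversations.append(current)
    return conversations if limit is None else conversations[:limit]
```

Each dialogue can then be passed to the trainer on its own, for example by looping over load_conversations("./corpus/corpus.txt", limit=2000) and calling trainer.train(dialogue) for each one. The limit of 2000 is only an illustration.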

4. Arithmetic and time queries

# -*- coding: utf-8 -*-
from chatterbot import ChatBot

bot = ChatBot(
    "Math & Time Bot",
    logic_adapters=[
        "chatterbot.logic.MathematicalEvaluation",
        "chatterbot.logic.TimeLogicAdapter"
    ],
    input_adapter="chatterbot.input.VariableInputTypeAdapter",
    output_adapter="chatterbot.output.OutputAdapter"
)


def chot_math_time(text=''):
    # Return the bot's answer to a math or time question
    response = bot.get_response(text)
    return response


print(chot_math_time('what is 1 + 1'))
print(chot_math_time('现在几点了'))  # "what time is it now"

Result:

1 + 1 = 2
The current time is 05:32 PM

Process finished with exit code 0

Because these adapters only understand English, when the input is Chinese speech we need to pull the numbers out of the recognized text. The code is:

def Split_num_letters(astr):
    """Return the runs of consecutive digits in astr, in order."""
    nums = []
    astr = astr + '无'  # sentinel so a number at the very end is still flushed
    num1 = ''
    for i in range(len(astr) - 1):
        if astr[i].isdigit() and not astr[i + 1].isdigit():
            # a number just ended: store it
            nums.append(num1)
            num1 = ''
        elif astr[i + 1].isdigit():
            # the next character is a digit: extend the current number
            num1 = num1 + astr[i + 1]
    if astr[0].isdigit():
        # the loop only collects astr[i + 1], so the first digit is added here
        nums[0] = astr[0] + nums[0]
    print(nums)
    return nums


Split_num_letters('你知道120乘20等于多少吗')
Split_num_letters('120乘20等于多少吗')

Result (combined with other functions, this is enough to do simple arithmetic by voice):

['120', '20']
['120', '20']

Process finished with exit code 0
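The same extraction can be written with a regular expression, and one more small step turns the Chinese sentence into the English form the math adapter accepts. This is only a sketch: the operator table and the names extract_numbers and build_math_query are my own assumptions, and it handles a single operation between two whole numbers.

```python
import re

# Hypothetical mapping from Chinese operator words to arithmetic symbols.
OPERATORS = {"加": "+", "减": "-", "乘": "*", "除": "/"}


def extract_numbers(text):
    """Return every run of consecutive digits in text, in order."""
    return re.findall(r"\d+", text)


def build_math_query(text):
    """Turn e.g. '120乘20等于多少' into 'what is 120 * 20', or None if no match."""
    match = re.search(r"(\d+)\D*?([加减乘除])\D*?(\d+)", text)
    if match is None:
        return None
    left, operator, right = match.groups()
    return "what is {} {} {}".format(left, OPERATORS[operator], right)


print(extract_numbers('你知道120乘20等于多少吗'))   # ['120', '20']
print(build_math_query('你知道120乘20等于多少吗'))  # what is 120 * 20
```

The string returned by build_math_query can be passed to chot_math_time() from the previous step.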

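IV. Learning to control smart home appliances

Basic principle: load the code onto the Arduino and wire up the circuit. Under certain conditions Python sends a signal to the Arduino, and when the Arduino receives a given signal it performs the corresponding action.

1. Environment setup

pip install pyserial

2. Controlling an Arduino from Python

You can refer to another article of mine:

Communication between Python and Arduino (Windows and Linux), Leonard2021's blog on CSDN

With Python and the Arduino talking to each other this way, the Arduino can control many appliances, such as lights, servos and fans. A rotating servo can open and close a door, flip the breaker of a large appliance, and so on.

The Arduino's many sensor modules can also feed data such as air humidity and temperature to the voice system, so that it can control the appliances more intelligently and provide integrated smart-home control.

There is a lot of room for imagination and growth here. I only implement switching the Arduino's on-board LED on and off by voice; other devices can be controlled by setting things up on the same principle.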

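a. Python code:

import serial  # serial communication library (pyserial)
import time

# wakeup_co(), rec() and listen() are the functions from the earlier sections;
# save them under module names of your choice and import them here.


def try2():
    ser = serial.Serial("COM3", 9600, timeout=1)
    time.sleep(2)  # give the board time to reset after the port opens
    while True:
        c = ''  # reset so an unrelated phrase does not resend the last command
        wakeup_co()  # wait for the wake word
        rec()  # record the speech to a WAV file
        listenword = listen()  # transcribe the WAV file into Chinese text
        if '灯' in listenword and '开' in listenword:  # "light" and "on"
            c = '1'
        elif '灯' in listenword and '关' in listenword:  # "light" and "off"
            c = '0'
        if c == '0':
            ser.write('0'.encode('utf-8'))
        if c == '1':
            ser.write('1'.encode('utf-8'))


try2()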
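The keyword matching in the loop above is easier to test when it is separated from the hardware. The sketch below does that; command_for and send_command are hypothetical names of mine, and port stands for any object with a write method, such as the serial.Serial instance opened in try2().

```python
def command_for(text):
    """Map recognized text to the byte the Arduino sketch expects, or None."""
    if "灯" in text and "开" in text:  # "light" and "on"
        return b"1"
    if "灯" in text and "关" in text:  # "light" and "off"
        return b"0"
    return None


def send_command(port, text):
    """Write the command for text to port; return what was sent (or None)."""
    command = command_for(text)
    if command is not None:
        port.write(command)
    return command
```

With this split you can try phrases at the Python prompt, for example command_for("把灯打开"), without an Arduino attached, and more devices can be added by extending command_for alone.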

b. Arduino code:

void setup(){
  Serial.begin(9600);
  pinMode(13,OUTPUT); // use pin 13 (the on-board LED) as an output
  //digitalWrite(13,HIGH);
}

char var;

void loop(){
  while(Serial.available()>0)
  {
    var=Serial.read();
    if(var == '0'){
      digitalWrite(13,LOW);  // '0' turns the LED off
    }
    if(var== '1'){
      digitalWrite(13,HIGH); // '1' turns the LED on
    }
  }
}

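V. Learning to speak

pyttsx3 provides the text-to-speech that lets the terminal “speak”.

1. Environment setup

pip install pyttsx3

2. Code implementation

import pyttsx3


def speakout(workText):
    engine = pyttsx3.init()  # initialize the speech engine
    # Slow the speaking rate down a little
    rate = engine.getProperty('rate')
    engine.setProperty('rate', rate - 50)
    # Speak the text
    engine.say(workText)  # queue the text for synthesis
    engine.runAndWait()  # block until the speech has finished


speakout('你好')  # "hello"

Test result: you hear a slightly awkward-sounding female voice say “你好” (hello). You can tune the parameters yourself to make the voice sound more natural.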
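One parameter that often helps is the voice itself. The sketch below picks the first installed voice that looks Chinese and falls back to the default otherwise. The keyword list is a guess of mine ("huihui" is the name of a Chinese voice on some Windows installations), and pick_voice and speak_chinese are hypothetical helpers; which voices exist depends entirely on your system.

```python
def pick_voice(voices, keywords=("zh", "chinese", "huihui")):
    """Return the id of the first voice whose id or name contains a keyword."""
    for voice in voices:
        label = "{} {}".format(voice.id, voice.name).lower()
        if any(keyword in label for keyword in keywords):
            return voice.id
    return None


def speak_chinese(text):
    """Speak text with a Chinese voice when one is installed."""
    import pyttsx3  # imported lazily; requires pip install pyttsx3
    engine = pyttsx3.init()
    voice_id = pick_voice(engine.getProperty("voices"))
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    engine.say(text)
    engine.runAndWait()
```

If pick_voice returns None, printing the id and name of every entry in engine.getProperty("voices") shows what is available to choose from.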

Finally: all the source code

All the source code: https://pan.baidu.com/s/14667bZyc9xovUyUH_zf-CQ?pwd=lief (extraction code: lief)

Other

To run python wakeup_COCO.py I also installed:
