
Speech Recognition: Trying Baidu Speech and Using OpenAI's Open-Source Whisper

Huawei HarmonyOS · Artificial Intelligence · 2024-04-08

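0. Preface: I personally tried Baidu Cloud speech recognition, Tencent Cloud, a Java SpeechRecognition package, and Whisper, the speech recognition model OpenAI recently open-sourced for free (spoiler: it is excellent). Along the way, this post introduces how common speech recognition approaches work.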

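1. NLP: natural language processing (human language processing). The same "你好" (hello) spoken by different people is represented by different signals.

Sampling rates are given in k: 16k means 16,000 numbers per second, i.e. one second of sound is represented by a vector of 16,000 numbers (samples).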

[Figures: a, a1]
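To make the "16k" figure concrete, here is a small sketch that computes how many numbers and bytes a given duration takes and synthesizes a test tone. It assumes signed 16-bit, mono PCM, the same layout the Baidu examples below send with rate 16000; the class and method names are made up for illustration.

```java
// Sketch: how much data "16k" audio is. Assumes 16 kHz, signed 16-bit, mono PCM.
class PcmMath {
    static final int SAMPLE_RATE = 16000;   // samples per second ("16k")
    static final int BYTES_PER_SAMPLE = 2;  // 16-bit samples
    static final int CHANNELS = 1;          // mono

    // Number of samples (numbers) needed to represent the given duration.
    static long samplesFor(double seconds, int sampleRate) {
        return Math.round(seconds * sampleRate);
    }

    // Raw PCM size in bytes for the given duration.
    static long bytesFor(double seconds, int sampleRate, int bytesPerSample, int channels) {
        return samplesFor(seconds, sampleRate) * bytesPerSample * channels;
    }

    // Generate a pure tone as 16-bit little-endian PCM (low byte first).
    static byte[] sineTone(double freqHz, double seconds, int sampleRate) {
        int n = (int) samplesFor(seconds, sampleRate);
        byte[] pcm = new byte[n * 2];
        for (int i = 0; i < n; i++) {
            double v = Math.sin(2 * Math.PI * freqHz * i / sampleRate) * 0.5 * Short.MAX_VALUE;
            short s = (short) Math.round(v);
            pcm[2 * i] = (byte) (s & 0xFF);
            pcm[2 * i + 1] = (byte) ((s >> 8) & 0xFF);
        }
        return pcm;
    }

    public static void main(String[] args) {
        System.out.println(samplesFor(1.0, SAMPLE_RATE));                            // 16000
        System.out.println(bytesFor(1.0, SAMPLE_RATE, BYTES_PER_SAMPLE, CHANNELS));  // 32000
        System.out.println(bytesFor(60.0, SAMPLE_RATE, BYTES_PER_SAMPLE, CHANNELS)); // 1920000
        System.out.println(sineTone(440, 1.0, SAMPLE_RATE).length);                  // 32000
    }
}
```

So one second of 16k mono audio is 16,000 numbers, which at 2 bytes each is 32,000 bytes of raw PCM.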

2. Task types

speech --> text

speech --> speech

speech --> class (e.g. the "Hey Siri" wake word)

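3. Problems deep learning brings to speech: synthesis goes wrong with some probability. For example, when a TTS system reads:

发财发财发财

发财发财 //the intonation changes again

发财 //only "发" is pronounced

Speech separation (two people talking at the same time); imitation of voice and intonation (used in telecom fraud)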

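4. Units used for recognition (tokens)

word: English words are separated by spaces ("personal computer" is two words), but Chinese has none, so segmenting "一拳超人一拳超人一拳超人" into words is ambiguous

morpheme: a root, e.g. "break" in "unbreakable"

bytes: every language is ultimately encoded as 0/1 bytes, so this unit is language independent

grapheme: the smallest unit of a writing system (letters in English, characters in Chinese)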
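A rough sketch of how the same text splits at different granularities. Graphemes are approximated by Unicode code points here (a simplification; proper grapheme clusters need `java.text.BreakIterator`), and the whitespace word splitter is deliberately naive to show why Chinese needs real word segmentation. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: one string viewed as words, graphemes (approximated) and bytes.
class TokenUnits {
    // Words: naive whitespace split. Fine for English, but Chinese has no spaces,
    // so a whole Chinese phrase comes back as a single "word".
    static List<String> words(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Graphemes, approximated by Unicode code points:
    // letters for English, characters for Chinese.
    static List<String> graphemes(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    }

    // Bytes: UTF-8 encoding, the same 0-255 alphabet for every language.
    static byte[] bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String zh = "一拳超人";
        String en = "personal computer";
        System.out.println(words(en));        // [personal, computer]
        System.out.println(words(zh));        // [一拳超人]
        System.out.println(graphemes(zh));    // [一, 拳, 超, 人]
        System.out.println(bytes(zh).length); // 12 (3 bytes per CJK character)
        System.out.println(bytes(en).length); // 17
    }
}
```

The byte view needs no language-specific vocabulary, which is what makes it language independent, at the cost of longer token sequences.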

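5. Common models

LAS (Listen, Attend and Spell): the encoder extracts features over a range of the input and the decoder attends over them. Neighboring frames carry similar information. Because the decoder attends over the whole utterance, LAS cannot transcribe in real time.

CTC: also sequence to sequence, but it can output in real time. [Figure: ctc] For each input frame it emits either a token or null; repeated tokens are then merged and the nulls removed, so "好 好 null 棒 棒" becomes "好棒" (a null between two identical tokens keeps both, so "好 null 好 棒" gives "好好棒"). The training data only contains "好棒", so the frame-level labels, alignments such as "null 好 null 棒" or "好 好 null 棒", have to be constructed yourself; CTC training sums over all of them.

RNN-T: sequence to sequence as well. At each frame it keeps emitting tokens until it is satisfied with the output so far, then emits null and moves on to the next frame. [Figure: rnnt/1] It likewise has to deal with building its own training labels (alignments).

MoChA (Monotonic Chunkwise Attention): attention is computed within a window that moves over the input, and the window size changes dynamically.

HMM: the approach used before deep learning. It uses phonemes (pronunciation units) as the unit and estimates probabilities. Tri-phone: in "what do you", the pronunciation of "do" is influenced by "what" and "you", so each phoneme is modeled together with its neighbors when predicting the probability of what comes next. [Figures: hmm1, ctc, hmm]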
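The CTC output rule described above can be sketched as a greedy (best-path) decoder: take the most likely symbol per frame, merge repeats, then drop nulls. Using the string "null" as the blank symbol and the toy probabilities are illustrative assumptions; this is neither beam search nor the CTC training loss.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of CTC greedy decoding. "null" is the blank symbol, as in the text above.
class CtcCollapse {
    static final String BLANK = "null";

    // Merge consecutive repeats, then remove blanks.
    static String collapse(List<String> frames) {
        StringBuilder out = new StringBuilder();
        String prev = null;
        for (String tok : frames) {
            // Emit only non-blank tokens that differ from the previous frame,
            // so a blank between two identical tokens keeps both.
            if (!tok.equals(BLANK) && !tok.equals(prev)) {
                out.append(tok);
            }
            prev = tok;
        }
        return out.toString();
    }

    // Best path: pick the most likely vocabulary entry per frame, then collapse.
    static String greedyDecode(double[][] probs, String[] vocab) {
        List<String> frames = new ArrayList<>();
        for (double[] p : probs) {
            int best = 0;
            for (int k = 1; k < p.length; k++) {
                if (p[k] > p[best]) {
                    best = k;
                }
            }
            frames.add(vocab[best]);
        }
        return collapse(frames);
    }

    public static void main(String[] args) {
        System.out.println(collapse(List.of("好", "好", "null", "棒", "棒")));               // 好棒
        System.out.println(collapse(List.of("null", "好", "null", "null", "棒", "null"))); // 好棒
        System.out.println(collapse(List.of("好", "null", "好", "棒")));                   // 好好棒
    }
}
```

Because each frame's decision only depends on that frame's probabilities, the decoder can emit text as audio arrives, which is why CTC suits real-time output.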

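6. Applying deep learning to the HMM approach

Tandem: ubiquitous around 2009. A DNN is trained to output phoneme probabilities, which are then fed into the HMM model as features.

DNN-HMM Hybrid: the mainstream approach; by 2019 Google and IBM reached about a 5% error rate. A single DNN (one model) is trained to provide the state probabilities.

[Figure: comparison of the models] ("not gen" means no path can reach that output.)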
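One common way a hybrid system plugs a DNN into an HMM, shown as a general illustration rather than the specific Google or IBM systems: the DNN outputs state posteriors P(s|x), while the HMM decoder needs emission likelihoods p(x|s). By Bayes' rule p(x|s) = P(s|x) p(x) / P(s), and since p(x) is the same for every state, the posterior divided by the state prior (the "scaled likelihood") is used instead, usually in the log domain. The numbers below are hypothetical.

```java
// Sketch: turn DNN state posteriors into HMM emission scores for one frame.
class HybridScores {
    // log p(x|s) + const = log P(s|x) - log P(s)
    static double[] scaledLogLikelihoods(double[] posteriors, double[] priors) {
        if (posteriors.length != priors.length) {
            throw new IllegalArgumentException("posteriors and priors must have the same length");
        }
        double[] out = new double[posteriors.length];
        for (int s = 0; s < posteriors.length; s++) {
            out[s] = Math.log(posteriors[s]) - Math.log(priors[s]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] post = {0.5, 0.3, 0.2};    // hypothetical softmax output of the DNN for one frame
        double[] prior = {0.25, 0.5, 0.25}; // hypothetical state frequencies from training alignments
        for (double v : scaledLogLikelihoods(post, prior)) {
            System.out.printf("%.4f%n", v);
        }
    }
}
```

Dividing by the prior keeps frequent states (such as silence) from dominating the decoder just because the DNN saw them often during training.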

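7. JavaScript can also do speech recognition (it calls Google's API, which is blocked in mainland China, so a proxy/VPN is needed). //Great, but since it needs a proxy plus a separate Node server, I don't know whether using it at a company would lead to disputes.

Speech recognition demo

8. Using SpeechRecognition: there is no Chinese language pack, and everything I said in English was recognized as "oh".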

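9. Baidu Cloud speech recognition: it does recognize speech, but it produces strange sentences when nobody is speaking. The half-year free tier is quite nice; Tencent Cloud only offers a 5,000-call trial.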

https://console.bce.baidu.com/ai/#/ai/speech/app/list

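//[Figure: baidu] //Recognizing an audio file: a controller only needs to read the incoming IO stream into a byte array for it to be recognized. I think generating a separate PCM file each time should avoid the misrecognition shown in the figure below.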

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;

import com.baidu.aip.speech.AipSpeech;
import org.json.JSONObject;

public class test01 {

    // Obtained after creating an application on the Baidu AI platform
    private static final String APP_ID = "xxxx";
    private static final String API_KEY = "xxxx";
    private static final String SECRET_KEY = "xxxxx";

    public static void main(String[] args) throws Exception {
        // Initialize the AipSpeech client
        AipSpeech client = new AipSpeech(APP_ID, API_KEY, SECRET_KEY);

        // Request options
        HashMap<String, Object> options = new HashMap<>();
        options.put("dev_pid", 1537); // Mandarin; see Baidu's dev_pid table for the exact model and other languages

        // Read the whole audio file at once (raw 16 kHz, 16-bit, mono PCM)
        byte[] data = Files.readAllBytes(Paths.get("path/to/audio/file.pcm"));

        // Call the speech recognition API
        JSONObject result = client.asr(data, "pcm", 16000, options);
        if (result.getInt("err_no") == 0) {
            String text = result.getJSONArray("result").getString(0);
            System.out.println("Recognition result: " + text);
        } else {
            System.out.println("Recognition failed: " + result.getString("err_msg"));
        }
    }
}
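The short-speech API is meant for short clips, so a longer recording would have to be sent in pieces. A minimal sketch, assuming 16 kHz 16-bit mono PCM and a 60-second per-request limit (an assumption here; check Baidu's current documentation for the actual limit), that splits a byte array into chunks aligned to whole samples. Each chunk could then be passed to `client.asr(chunk, "pcm", 16000, options)`. The class name `PcmChunker` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split raw 16 kHz / 16-bit / mono PCM into chunks of at most maxSeconds.
class PcmChunker {
    static final int BYTES_PER_SECOND = 16000 * 2; // 16000 samples/s * 2 bytes/sample * 1 channel

    static List<byte[]> split(byte[] pcm, int maxSeconds) {
        int chunkBytes = maxSeconds * BYTES_PER_SECOND; // even, so no 16-bit sample is cut in half
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < pcm.length; start += chunkBytes) {
            int end = Math.min(pcm.length, start + chunkBytes);
            chunks.add(Arrays.copyOfRange(pcm, start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] pcm = new byte[150 * BYTES_PER_SECOND]; // 150 s of silence as stand-in data
        for (byte[] chunk : split(pcm, 60)) {
            System.out.println(chunk.length / (double) BYTES_PER_SECOND + " s");
        }
    }
}
```

A fixed cut can land in the middle of a word; splitting at pauses instead would avoid that.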

//Real-time recording test [Figure: baidu]

//To optimize this, upload a complete file (as with image recognition) instead of a stream

import java.util.Arrays;
import java.util.HashMap;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;

import com.baidu.aip.speech.AipSpeech;
import org.json.JSONObject;

public class test01 {

    // Obtained after creating an application on the Baidu AI platform
    private static final String APP_ID = "xxxxxxx";
    private static final String API_KEY = "xxxxxx";
    private static final String SECRET_KEY = "xxxxxx";

    public static void main(String[] args) throws Exception {
        // Initialize the AipSpeech client
        AipSpeech client = new AipSpeech(APP_ID, API_KEY, SECRET_KEY);

        // Request options
        HashMap<String, Object> options = new HashMap<>();
        options.put("dev_pid", 1537); // Mandarin

        // Capture microphone audio: 16 kHz, 16-bit, mono, signed, little-endian
        AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
        TargetDataLine line = AudioSystem.getTargetDataLine(format);
        line.open(format);
        line.start();

        // Buffer for one second of audio: 16000 frames * 2 bytes per frame = 32000 bytes
        int bufferSize = (int) format.getSampleRate() * format.getFrameSize();
        byte[] buffer = new byte[bufferSize];

        try {
            // Read and recognize one-second chunks in a loop (stop with Ctrl+C).
            // Note: fixed one-second chunks can cut a word in half.
            while (true) {
                int count = line.read(buffer, 0, buffer.length);
                if (count > 0) {
                    // Send only the bytes actually read in this iteration
                    byte[] chunk = Arrays.copyOf(buffer, count);
                    JSONObject result = client.asr(chunk, "pcm", 16000, options);
                    if (result.getInt("err_no") == 0) {
                        String text = result.getJSONArray("result").getString(0);
                        System.out.println("Recognition result: " + text);
                    } else {
                        System.out.println("Recognition failed: " + result.getString("err_msg"));
                    }
                }
            }
        } finally {
            line.stop();
            line.close();
        }
    }
}
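One possible mitigation for the odd sentences that appear while nobody is talking: measure the energy of each buffer and skip near-silent ones before calling the API. This is a crude energy gate, not a real voice activity detector; the threshold of 500 and the class name `SilenceGate` are hypothetical and would need tuning for your microphone.

```java
// Crude energy gate for 16-bit little-endian mono PCM (not a real VAD).
class SilenceGate {
    // Root-mean-square level of the first `length` bytes of the buffer.
    static double rms(byte[] pcm, int length) {
        int samples = length / 2;
        if (samples == 0) {
            return 0.0;
        }
        double sumSq = 0.0;
        for (int i = 0; i < samples; i++) {
            // Little-endian: low byte first; the sign-extended high byte restores negative values
            int s = (pcm[2 * i] & 0xFF) | (pcm[2 * i + 1] << 8);
            sumSq += (double) s * s;
        }
        return Math.sqrt(sumSq / samples);
    }

    // Hypothetical threshold: tune it against recordings from your own microphone.
    static boolean isSpeech(byte[] pcm, int length, double threshold) {
        return rms(pcm, length) >= threshold;
    }

    public static void main(String[] args) {
        byte[] silence = new byte[3200];             // 0.1 s of zeros at 16 kHz
        byte[] tone = new byte[3200];
        for (int i = 0; i < tone.length / 2; i++) {  // square wave of +/-1000
            short s = (short) (i % 32 < 16 ? 1000 : -1000);
            tone[2 * i] = (byte) (s & 0xFF);
            tone[2 * i + 1] = (byte) ((s >> 8) & 0xFF);
        }
        System.out.println(isSpeech(silence, silence.length, 500)); // false
        System.out.println(isSpeech(tone, tone.length, 500));       // true
    }
}
```

In the real-time loop, the API call could be guarded with `if (count > 0 && SilenceGate.isSpeech(buffer, count, 500))` so silent seconds are never sent.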

10. Tencent Cloud speech recognition: 5,000 free requests; readers can download the project and try it themselves.

//Console

https://console.cloud.tencent.com/asr#

//Project

https://github.com/TencentCloud/tencentcloud-speech-sdk-java

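11. Using Whisper. OpenAI open-sourced it on September 21, 2022, which is remarkably generous. Tencent Cloud's real-time recognition costs about 2 yuan per hour, which is not expensive, but most companies want to cut costs, and for embedded devices there is also a tiny model.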

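1. Install Python 3.10, then PyTorch and Whisper:

pip3 install torch torchvision torchaudio

pip3 install -U openai-whisper

2. Install Chocolatey and ffmpeg from PowerShell (run as administrator):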

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

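//Switch to the Aliyun mirror (ffmpeg could not be found otherwise). ffmpeg is the tool that handles audio decoding; if it is not installed, Whisper reports that it cannot find the path or file.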

choco source add --name=aliyun-choco-source --source=https://mirrors.aliyun.com/chocolatey/

choco source set --name="'aliyun-choco-source'"

choco source list

choco install ffmpeg
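Because Whisper shells out to ffmpeg to decode audio, a quick sanity check is whether ffmpeg is reachable through PATH (a new terminal may be needed after `choco install` for the updated PATH to apply). A sketch using only the JDK; the helper name `findOnPath` is hypothetical.

```java
import java.io.File;
import java.util.Optional;

// Sketch: look for an executable (e.g. ffmpeg) in the directories listed in PATH.
class PathCheck {
    static Optional<File> findOnPath(String exe, String pathEnv, boolean windows) {
        if (pathEnv == null) {
            return Optional.empty();
        }
        // On Windows the file on disk is usually "<name>.exe"
        String[] names = windows ? new String[]{exe + ".exe", exe} : new String[]{exe};
        for (String dir : pathEnv.split(File.pathSeparator)) {
            for (String name : names) {
                File candidate = new File(dir, name);
                if (candidate.isFile() && candidate.canExecute()) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        boolean windows = System.getProperty("os.name").toLowerCase().contains("win");
        Optional<File> ffmpeg = findOnPath("ffmpeg", System.getenv("PATH"), windows);
        System.out.println(ffmpeg
                .map(f -> "ffmpeg found: " + f)
                .orElse("ffmpeg not on PATH; install it first (e.g. choco install ffmpeg)"));
    }
}
```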

3. Test it. It is quite fast; with a smaller model, accurate and fast near-real-time speech recognition should be achievable!

whisper test1.mp4

Result: [Figure: Whisper transcription output]
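If the transcription should be triggered from a Java backend like the Baidu examples, one option is to run the same `whisper` CLI as a subprocess. A sketch assuming `whisper` (installed with `pip install openai-whisper`) and ffmpeg are on PATH; `--model` and `--language` are real whisper CLI options, while the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: build and run a whisper CLI command from Java.
class WhisperRunner {
    static List<String> buildCommand(String audioFile, String model, String language) {
        List<String> cmd = new ArrayList<>();
        cmd.add("whisper");
        cmd.add(audioFile);
        cmd.add("--model");
        cmd.add(model);            // e.g. tiny / base / small / medium / large
        if (language != null) {
            cmd.add("--language");
            cmd.add(language);     // e.g. "zh"; omit it to let whisper detect the language
        }
        return cmd;
    }

    static int run(List<String> cmd) throws IOException, InterruptedException {
        // inheritIO forwards whisper's console output (the recognized segments) to this process
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand("test1.mp4", "small", "zh");
        System.out.println(String.join(" ", cmd));
        System.out.println("exit code: " + run(cmd));
    }
}
```

Passing the arguments as a list rather than one string avoids quoting problems with file names that contain spaces.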
