机器人语音交互之ros集成科大讯飞中文语音库,实现语音控制机器人小车

1 背景和资料2 正文2.1 下载科大讯飞语音库2.2 robot_voice 之语音控制机器人小车移动样例

3 总结

1 背景和资料

从本文开始,我们将用两篇文章学习机器人语音交互。本文作为第一篇,将在ros上集成科大讯飞的中文语音库,实现语音控制机器人小车运动。至于语音识别和语音合成的原理,本文并不深究,读者可以自行搜索相关的文章介绍。这里提醒,本文的测试环境是ubuntu20.04 + ros noetic。 本文参考资料如下: (1)《ROS机器人开发实践》胡春旭 第8章 (2)讯飞语音识别和唤醒开发示例 (3)讯飞语音听写 Linux SDK 文档 (4)ROS高效入门第二章 – 基本概念和常用命令学习,基于小乌龟样例 (5)ROS高效进阶第三章 – 以差速轮式机器人为例,使用Gazebo构建机器人仿真平台

2 正文

2.1 下载科大讯飞语音库

(1)首先登陆讯飞开放平台:讯飞开放平台 ,注册后,点击控制台进入。 (2)然后创建应用并下载linux sdk,更具体的操作可以参考:讯飞语音识别和唤醒开发示例 (3)最后得到自己专属的sdk,如我本人的:Linux_iat1227_tts_online1227_bb839ccf.zip,其中 bb839ccf 是专属bb839ccf。下面我们将把这套 sdk 集成到 robot_voice 样例中,这里不对这个 zip 包内容进行展开讲解。

2.2 robot_voice 之语音控制机器人小车移动样例

(1)robot_voice 样例,我们将实现两个应用,第一个就是本文的语音控制机器人小车移动,拓扑图如下:

#mermaid-svg-ZkROx2iiCZ7632LG {font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}#mermaid-svg-ZkROx2iiCZ7632LG .error-icon{fill:#552222;}#mermaid-svg-ZkROx2iiCZ7632LG .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-ZkROx2iiCZ7632LG .edge-thickness-normal{stroke-width:2px;}#mermaid-svg-ZkROx2iiCZ7632LG .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-ZkROx2iiCZ7632LG .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-ZkROx2iiCZ7632LG .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-ZkROx2iiCZ7632LG .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-ZkROx2iiCZ7632LG .marker{fill:#333333;stroke:#333333;}#mermaid-svg-ZkROx2iiCZ7632LG .marker.cross{stroke:#333333;}#mermaid-svg-ZkROx2iiCZ7632LG svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-ZkROx2iiCZ7632LG .label{font-family:"trebuchet ms",verdana,arial,sans-serif;color:#333;}#mermaid-svg-ZkROx2iiCZ7632LG .cluster-label text{fill:#333;}#mermaid-svg-ZkROx2iiCZ7632LG .cluster-label span{color:#333;}#mermaid-svg-ZkROx2iiCZ7632LG .label text,#mermaid-svg-ZkROx2iiCZ7632LG span{fill:#333;color:#333;}#mermaid-svg-ZkROx2iiCZ7632LG .node rect,#mermaid-svg-ZkROx2iiCZ7632LG .node circle,#mermaid-svg-ZkROx2iiCZ7632LG .node ellipse,#mermaid-svg-ZkROx2iiCZ7632LG .node polygon,#mermaid-svg-ZkROx2iiCZ7632LG .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-ZkROx2iiCZ7632LG .node .label{text-align:center;}#mermaid-svg-ZkROx2iiCZ7632LG .node.clickable{cursor:pointer;}#mermaid-svg-ZkROx2iiCZ7632LG .arrowheadPath{fill:#333333;}#mermaid-svg-ZkROx2iiCZ7632LG .edgePath .path{stroke:#333333;stroke-width:2.0px;}#mermaid-svg-ZkROx2iiCZ7632LG .flowchart-link{stroke:#333333;fill:none;}#mermaid-svg-ZkROx2iiCZ7632LG .edgeLabel{background-color:#e8e8e8;text-align:center;}#mermaid-svg-ZkROx2iiCZ7632LG .edgeLabel rect{opacity:0.5;background-color:#e8e8e8;fill:#e8e8e8;}#mermaid-svg-ZkROx2iiCZ7632LG .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-ZkROx2iiCZ7632LG .cluster text{fill:#333;}#mermaid-svg-ZkROx2iiCZ7632LG .cluster span{color:#333;}#mermaid-svg-ZkROx2iiCZ7632LG div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg-ZkROx2iiCZ7632LG :root{--mermaid-font-family:"trebuchet ms",verdana,arial,sans-serif;}

voice_detector

robot_controller

voice_creator

mbot_gazebo

voice_detector:负责语音识别,将语音转换为文字,并作为 client,通过 human_chatter 服务,发给 robot_controller 。 robot_controller:作为 human_chatter 服务 server,接收 voice_detector 发来的文字化的指令,并生成对应的语音播报文字和控车命令。前者通过 str2voice 服务,发给 voice_creator,后者通过 /cmd_vel topic,发给 mbot_gazebo。 voice_creator:作为 str2voice 服务server,接收 robot_controller 发来的语音播报文字,合成语音文件并播放。 mbot_gazebo:作为机器人小车,接收 /cmd_vel topic,并调整运动状态。 补充:关于ros的服务机制,可以参考本人ROS高效入门博客第二章的2.6节: ROS高效入门第二章 – 基本概念和常用命令学习,基于小乌龟样例 (2)安装环境:

unzip Linux_iat1227_tts_online1227_bb839ccf.zip

sudo cp libs/x64/libmsc.so /usr/lib/

sudo apt update

sudo apt install sox

sudo apt install libsox-fmt-all

SoX, 全称 Sound eXchange,被官方称为 “the Swiss Army knife of audio manipulation”。 它是一个强大的用于转换和处理声音文件的库。因其操作简单且功能强大,广泛应用在音频数据的处理和分析中。 除此之外,本人在编译讯飞样例时,遇到了:

linuxrec.c:12:10: fatal error: alsa/asoundlib.h: No such file or directory

解决方式是:

sudo apt-get install libasound2-dev

ALSA,全称Advanced Linux Sound Architecture (ALSA) 库,用于处理音频设备。 (3)创建 robot_voice 及相关文件

cd ~/catkin_ws/src

catkin_create_pkg robot_voice roscpp rospy std_msgs geometry_msgs message_generation message_runtime

cd robot_voice

mkdir srv launch

touch srv/StringToVoice.srv launch/voice_control_robot.launch

touch src/voice_detector.cpp src/robot_controller.cpp src/voice_creator.cpp

mkdir ifly_voice include/ifly_voice

请将 Linux_iat1227_tts_online1227_bb839ccf.zip 中的相关文件,分别移入相关目录,供编译使用,如下图: (4)voice_detector.cpp

#include

#include

#include

#include

#include

#include

#include "ifly_voice/qisr.h"

#include "ifly_voice/qtts.h"

#include "ifly_voice/msp_cmn.h"

#include "ifly_voice/formats.h"

#include "ifly_voice/msp_errors.h"

#include "ifly_voice/speech_recognizer.h"

#include

#include

class Helper {

public:

static void SignalHandler(int signal) {

ROS_INFO("\nCaught signal %d. Exiting gracefully...\n", signal);

exit(0);

}

};

class VoiceDetector {

public:

VoiceDetector() {

ROS_INFO("VoiceDetector Constructor");

}

~VoiceDetector() {

ROS_INFO("VoiceDetector Destructor");

}

int Init() {

int ret = MSP_SUCCESS;

ret = MSPLogin(NULL, NULL, login_params_.c_str());

if (MSP_SUCCESS != ret) {

ROS_ERROR("MSPLogin failed , Error code %d", ret);

MSPLogout(); // Logout...

return -1;

}

ROS_INFO("VoiceDetector MSP Login for update, waiting for seconds...");

return 0;

}

static void JoinTxt(const char *result, char is_last) {

if (result) {

std::string slice_txt = result;

VoiceDetector::voice_txt_ += slice_txt;

}

if (is_last) {

printf("voice txt : %s\n", VoiceDetector::voice_txt_.c_str());

}

}

static void InitSpeech() {

VoiceDetector::voice_txt_ = "";

printf("clear cache, start Listening...\n");

}

static void EndSpeech(int reason) {

if (reason == END_REASON_VAD_DETECT) {

printf("\nSpeaking done \n");

} else {

printf("\nRecognizer error %d\n", reason);

}

}

int SpeechOnce() {

int ret;

int i = 0;

struct speech_rec iat;

struct speech_rec_notifier recnotifier = {

JoinTxt,

InitSpeech,

EndSpeech

};

ret = sr_init(&iat, session_begin_params_.c_str(), SR_MIC, &recnotifier);

if (ret) {

ROS_ERROR("speech recognizer init failed");

return -1;

}

ret = sr_start_listening(&iat);

if (ret) {

printf("start listen failed %d\n", ret);

}

/* demo 15 seconds recording */

sleep(10);

ret = sr_stop_listening(&iat);

if (ret) {

printf("stop listening failed %d\n", ret);

}

sr_uninit(&iat);

return 0;

}

static std::string get_voice_txt_() {

return voice_txt_;

}

private:

const std::string login_params_ = "appid = bb839ccf, work_dir = .";

const std::string session_begin_params_ =

"sub = iat, domain = iat, language = zh_cn, "

"accent = mandarin, sample_rate = 16000, "

"result_type = plain, result_encoding = utf8";

const uint32_t BufferSize = 4096;

uint64_t g_buffersize = BufferSize;

static std::string voice_txt_;

};

std::string VoiceDetector::voice_txt_ = "";

int main(int argc, char* argv[]) {

int ret = 0;

ros::init(argc, argv, "voice_detector");

ros::NodeHandle nh;

// 创建 human_chatter 服务client

ros::ServiceClient client_ = nh.serviceClient("human_chatter");

if (signal(SIGINT, Helper::SignalHandler) == SIG_ERR) {

return -1;

}

VoiceDetector vd;

ret = vd.Init();

if (ret < 0) {

return -1;

}

while (1) {

// 一次聆听

ret = vd.SpeechOnce();

if (ret < 0) {

return -1;

}

// 获取当次聆听得到的内容

std::string voice_txt = VoiceDetector::get_voice_txt_();

if (voice_txt == "") {

printf("voice_txt is empty, do not send chatter\n");

continue;

} else if (voice_txt.find("结束") != std::string::npos) {

break;

}

// 通过 human_chatter 服务,发给robot_controller,处理成功后,进入下一轮

robot_voice::StringToVoice::Request req;

robot_voice::StringToVoice::Response resp;

req.data = voice_txt;

bool ok = client_.call(req, resp);

if (ok) {

printf("send human_chatter service success: %s\n", req.data.c_str());

} else {

printf("failed to send human_chatter service\n");

}

}

ros::spin();

return 0;

}

(5)robot_controller.cpp

#include

#include

#include

#include

#include

#include

#include

#include

#include

class RobotController {

public:

RobotController() {

ROS_INFO("RobotController Constructor");

}

~RobotController() {

ROS_INFO("RobotController Destructor");

}

int Init(ros::NodeHandle& nh) {

cmd_pub_ = nh.advertise("/cmd_vel", 1000);

client_ = nh.serviceClient("str2voice");

return 0;

}

void ToDownstream(const std::string& answer_txt, float linear_x, float angular_z) {

// 通过 str2voice 服务和 /cmd_vel topic向下游 voice_creator 和 mbot_gazebo 发送

robot_voice::StringToVoice::Request req;

robot_voice::StringToVoice::Response resp;

req.data = answer_txt;

bool ok = client_.call(req, resp);

if (ok) {

printf("send str2voice service success: %s, and pub cmd_vel\n", req.data.c_str());

geometry_msgs::Twist msg;

msg.linear.x = linear_x;

msg.angular.z = angular_z;

cmd_pub_.publish(msg);

} else {

ROS_ERROR("failed to send str2voice service");

}

}

bool ChatterCallbback(robot_voice::StringToVoice::Request &req, robot_voice::StringToVoice::Response &resp) {

printf("i received: %s\n", req.data.c_str());

std::string voice_txt = req.data;

// 根据指令关键字,发送对应的语音播包文字和 cmd_vel 命令

if (voice_txt.find("前") != std::string::npos) {

ToDownstream("小车请向前跑", 0.3, 0);

} else if (voice_txt.find("后") != std::string::npos) {

ToDownstream("小车请向后倒", -0.3, 0);

} else if (voice_txt.find("左") != std::string::npos) {

ToDownstream("小车请向左转", 0, 0.3);

} else if (voice_txt.find("右") != std::string::npos) {

ToDownstream("小车请向右转", 0, -0.3);

} else if (voice_txt.find("转") != std::string::npos) {

ToDownstream("小车请打转", 0.3, -0.3);

}

resp.success = true;

return resp.success;

}

void Start(ros::NodeHandle& nh) {

// 申明 human_chatter 服务,ChatterCallbback是回调函数

chatter_server_ = nh.advertiseService("human_chatter", &RobotController::ChatterCallbback, this);

}

private:

ros::ServiceServer chatter_server_;

ros::Publisher cmd_pub_;

ros::ServiceClient client_;

};

int main(int argc, char* argv[]) {

int ret = 0;

ros::init(argc, argv, "voice_controller");

ros::NodeHandle nh;

RobotController rc;

rc.Init(nh);

printf("this is a voice controller app for robot, you can say: 向前, 向后, 向左, 向右, 转圈, 结束\n");

rc.Start(nh);

ros::spin();

return 0;

}

(6)voice_creator.cpp

#include

#include

#include

#include

#include

#include

#include "ifly_voice/qisr.h"

#include "ifly_voice/qtts.h"

#include "ifly_voice/msp_cmn.h"

#include "ifly_voice/formats.h"

#include "ifly_voice/msp_errors.h"

#include "ifly_voice/speech_recognizer.h"

#include

class Helper {

public:

static void SignalHandler(int signal) {

ROS_INFO("\nCaught signal %d. Exiting gracefully...\n", signal);

exit(0);

}

};

class VoiceCreator {

public:

VoiceCreator() {

ROS_INFO("VoiceCreator Constructor");

}

~VoiceCreator() {

ROS_INFO("VoiceCreator Destructor ");

}

int Init() {

int ret = MSP_SUCCESS;

ret = MSPLogin(NULL, NULL, login_params_.c_str());

if (MSP_SUCCESS != ret) {

ROS_ERROR("MSPLogin failed , Error code %d", ret);

MSPLogout(); // Logout...

return -1;

}

ROS_INFO("VoiceCreator MSP Login for update, waiting for seconds...");

return 0;

}

int ProcessTxt(std::string& txt) {

int ret = -1;

FILE* fp = NULL;

const char* sessionID = NULL;

unsigned int audio_len = 0;

int synth_status = MSP_TTS_FLAG_STILL_HAVE_DATA;

WavePcmHdr_t wav_hdr = {

{ 'R', 'I', 'F', 'F' },

0,

{'W', 'A', 'V', 'E'},

{'f', 'm', 't', ' '},

16,

1,

1,

16000,

32000,

2,

16,

{'d', 'a', 't', 'a'},

0

};

const char* src_text = txt.c_str();

const char* des_path = filename_.c_str();

const char* params = session_begin_params_.c_str();

if (NULL == src_text || NULL == des_path) {

ROS_ERROR("params is error!");

return ret;

}

fp = fopen(des_path, "wb");

if (NULL == fp) {

ROS_ERROR("open %s error", des_path);

return ret;

}

/* 开始合成 */

sessionID = QTTSSessionBegin(params, &ret);

if (MSP_SUCCESS != ret) {

ROS_ERROR("QTTSSessionBegin failed, error code: %d", ret);

fclose(fp);

return ret;

}

ret = QTTSTextPut(sessionID, src_text, (unsigned int)strlen(src_text), NULL);

if (MSP_SUCCESS != ret) {

ROS_ERROR("QTTSTextPut failed, error code: %d",ret);

QTTSSessionEnd(sessionID, "TextPutError");

fclose(fp);

return ret;

}

printf("正在合成 ...\n");

fwrite(&wav_hdr, sizeof(wav_hdr) ,1, fp); //添加wav音频头,使用采样率为16000

while (1) {

/* 获取合成音频 */

const void* data = QTTSAudioGet(sessionID, &audio_len, &synth_status, &ret);

if (MSP_SUCCESS != ret) {

break;

}

if (NULL != data) {

fwrite(data, audio_len, 1, fp);

wav_hdr.data_size += audio_len; //计算data_size大小

}

if (MSP_TTS_FLAG_DATA_END == synth_status) {

break;

}

printf(">");

usleep(150*1000); //防止频繁占用CPU

}

printf("\n");

if (MSP_SUCCESS != ret) {

ROS_ERROR("QTTSAudioGet failed, error code: %d",ret);

QTTSSessionEnd(sessionID, "AudioGetError");

fclose(fp);

return ret;

}

/* 修正wav文件头数据的大小 */

wav_hdr.size_8 += wav_hdr.data_size + (sizeof(wav_hdr) - 8);

/* 将修正过的数据写回文件头部,音频文件为wav格式 */

fseek(fp, 4, 0);

fwrite(&wav_hdr.size_8,sizeof(wav_hdr.size_8), 1, fp); //写入size_8的值

fseek(fp, 40, 0); //将文件指针偏移到存储data_size值的位置

fwrite(&wav_hdr.data_size,sizeof(wav_hdr.data_size), 1, fp); //写入data_size的值

fclose(fp);

fp = NULL;

/* 合成完毕 */

ret = QTTSSessionEnd(sessionID, "Normal");

if (MSP_SUCCESS != ret) {

ROS_ERROR("QTTSSessionEnd failed, error code: %d", ret);

return ret;

}

// 播放语音文件

fp = popen(play_cmd_.c_str(),"r");

if (fp == NULL) {

ROS_ERROR("play /tmp/tts_sample.wav failed");

return -1;

}

sleep(1);

pclose(fp);

return 0;

}

bool Speeking(robot_voice::StringToVoice::Request &req, robot_voice::StringToVoice::Response &resp) {

int ret = -1;

ret = ProcessTxt(req.data);

if (MSP_SUCCESS != ret) {

ROS_ERROR("AnswerVoice failed, error code: %d", ret);

resp.success = false;

return false;

} else {

resp.success = true;

}

return resp.success;

}

void Start(ros::NodeHandle& nh) {

// 申明 str2voice 服务

server_ = nh.advertiseService("str2voice", &VoiceCreator::Speeking, this);

}

private:

ros::ServiceServer server_;

const std::string login_params_ = "appid = bb839ccf, work_dir = .";

const std::string session_begin_params_ =

"voice_name = xiaoyan, text_encoding = utf8, "

"sample_rate = 16000, speed = 50, volume = 50, "

"pitch = 50, rdn = 2";

//合成的语音文件名称

const std::string filename_ = "/tmp/tts_sample.wav";

//语音播放命令

const std::string play_cmd_ = "play /tmp/tts_sample.wav";

/* wav音频头部格式 */

typedef struct WavePcmHdr {

char riff[4]; // = "RIFF"

int size_8; // = FileSize - 8

char wave[4]; // = "WAVE"

char fmt[4]; // = "fmt "

int fmt_size; // = 下一个结构体的大小 : 16

short int format_tag; // = PCM : 1

short int channels; // = 通道数 : 1

int samples_per_sec; // = 采样率 : 8000 | 6000 | 11025 | 16000

int avg_bytes_per_sec; // = 每秒字节数 : samples_per_sec * bits_per_sample / 8

short int block_align; // = 每采样点字节数 : wBitsPerSample / 8

short int bits_per_sample; // = 量化比特数: 8 | 16

char data[4]; // = "data";

int data_size; // = 纯数据长度 : FileSize - 44

} WavePcmHdr_t;

};

int main(int argc, char ** argv) {

int ret = 0;

ros::init(argc, argv, "voice_creator");

ros::NodeHandle nh;

if (signal(SIGINT, Helper::SignalHandler) == SIG_ERR) {

return -1;

}

VoiceCreator vc;

ret = vc.Init();

if (ret < 0) {

return -1;

}

vc.Start(nh);

ros::spin();

return 0;

}

(6)StringToVoice.srv , voice_control_robot.launch 和 CMakeLists.txt StringToVoice.srv

string data

---

bool success

voice_control_robot.launch

pkg="robot_voice"

type="voice_creator"

name="voice_creator"

output="screen"

/>

pkg="robot_voice"

type="robot_controller"

name="robot_controller"

output="screen"

/>

pkg="robot_voice"

type="voice_detector"

name="voice_detector"

launch-prefix="bash -c 'sleep 5; $0 $@'"

output="screen"

/>

CMakeLists.txt

cmake_minimum_required(VERSION 3.0.2)

project(robot_voice)

add_compile_options(-std=c++11)

find_package(catkin REQUIRED COMPONENTS

roscpp

rospy

std_msgs

geometry_msgs

message_generation

)

add_service_files(

FILES

StringToVoice.srv

)

generate_messages(

DEPENDENCIES

std_msgs

)

catkin_package(

CATKIN_DEPENDS message_runtime roscpp rospy std_msgs

)

include_directories(

include

${catkin_INCLUDE_DIRS}

)

add_executable(voice_detector

src/voice_detector.cpp

ifly_voice/speech_recognizer.c

ifly_voice/linuxrec.c)

add_executable(robot_controller src/robot_controller.cpp)

add_executable(voice_creator src/voice_creator.cpp)

add_dependencies(voice_detector ${PROJECT_NAME}_generate_messages_cpp)

target_link_libraries(voice_detector

${catkin_LIBRARIES}

libmsc.so -ldl -lpthread -lm -lrt -lasound

)

add_dependencies(robot_controller ${PROJECT_NAME}_generate_messages_cpp)

target_link_libraries(robot_controller

${catkin_LIBRARIES}

)

add_dependencies(voice_creator ${PROJECT_NAME}_generate_messages_cpp)

target_link_libraries(voice_creator

${catkin_LIBRARIES}

libmsc.so -ldl -pthread

)

(7)编译并运行(运行时请注意电脑网络通畅!)

cd ~/catkin_ws/

catkin_make -DCATKIN_WHITELIST_PACKAGES="robot_voice;mbot_gazebo"

source devel/setup.bash

roslaunch mbot_gazebo view_mbot_gazebo.launch

// 再开一个窗口

source devel/setup.bash

roslaunch robot_voice voice_control_robot.launch

语音控制机器人

(8)在开发调试过程中,出现了如下编译报错:

internal compiler error: Illegal instruction

不得已,更新了gcc版本,问题解决

sudo apt-get install gcc-10

sudo apt-get install g++-10

cd /usr/bin

sudo rm gcc g++

sudo ln -s gcc-10 gcc

sudo ln -s g++-10 g++

3 总结

本文的样例托管在本人的github上:robot_voice

文章来源

评论可见,请评论后查看内容,谢谢!!!
 您阅读本篇文章共花了: