MRCP in Outbound Call Systems: It Turns Out to Be Just a Wrapped Abstraction Layer

  • Introduction to MRCP
  • Main Content
  • Summary


Introduction to MRCP

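The Media Resource Control Protocol (MRCP) is a communication protocol that lets a speech server provide speech services, such as speech recognition and speech synthesis, to its clients.
This article explains how to connect an outbound call system to MRCP (for example, iFlytek's MRCP). First, a short introduction to MRCP.
In the basic MRCP architecture, the server side hosts a number of media resources, each of a particular type. MRCP defines six media resource types:
basicsynth: basic speech synthesis
speechsynth: standard speech synthesis
dtmfrecog: DTMF recognition
speechrecog: speech recognition
recorder: audio recording
speakverify: speaker verification (voiceprint matching)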
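
The resource type is also visible on the wire: in the server logs shown later in this article, each channel is identified as <channel-id>@<resource-type>, for example ede2ac36452811ec@speechsynth. A minimal sketch of splitting and validating such an identifier follows. The helper name and the lookup table are illustrative and not part of any MRCP library.

```python
# Descriptions follow the list above; names here are hypothetical helpers.
MRCP_RESOURCE_TYPES = {
    "basicsynth": "basic speech synthesis",
    "speechsynth": "standard speech synthesis",
    "dtmfrecog": "DTMF recognition",
    "speechrecog": "speech recognition",
    "recorder": "audio recording",
    "speakverify": "speaker verification (voiceprint matching)",
}


def parse_channel_identifier(value):
    """Split '<channel-id>@<resource-type>' and validate the resource type."""
    channel_id, sep, resource = value.strip().partition("@")
    if not sep or not channel_id:
        raise ValueError("expected '<channel-id>@<resource-type>': %r" % value)
    if resource not in MRCP_RESOURCE_TYPES:
        raise ValueError("unknown MRCP resource type: %r" % resource)
    return channel_id, resource


if __name__ == "__main__":
    cid, res = parse_channel_identifier("ede2ac36452811ec@speechsynth")
    print(cid, res, "->", MRCP_RESOURCE_TYPES[res])
```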

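The resource types alone make MRCP's role clear: it builds on SIP to handle session control and media management. From SIP's point of view, the session attributes it manages are not the main point; what matters more is locating media resources and tying them together. Because SIP lets a client query a media resource server, an MRCP client can find out which capabilities the media resources support.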

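Put another way, mrcp-server implements SIP and RTP and acts as a SIP server. FreeSWITCH connects to it as the client and streams its RTP audio to mrcp-server in real time.
In plain words, MRCP just adds one more piece of middleware that does the speech recognition (ASR) and speech synthesis (TTS) work. mrcp-server talks to FreeSWITCH, and FreeSWITCH talks to the actual caller.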

Main Content

This article uses a media bug to tap the media stream, and sets up an integration between the iFlytek SDK, unimrcp-server, and FreeSWITCH.

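We start from an existing project found online (MRCP-Plugin-Demo, judging by the paths used below). It is currently the most-starred Chinese MRCP-to-FreeSWITCH integration project on GitHub, but it has a few pitfalls that still need to be worked around.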

1. Deploy mrcp-server. After building the dependencies, run the following command:

./configure --with-apr=/opt/mrcp/MRCP-Plugin-Demo-master/unimrcp-deps-1.5.0/libs/apr --with-apr-util=/opt/mrcp/MRCP-Plugin-Demo-master/unimrcp-deps-1.5.0/libs/apr-util

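The paths after --with-apr and --with-apr-util are the directories where I installed the dependencies, matching the layout of the source tree. Without them, the build can keep failing for a long time.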
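
Since a wrong path is the usual cause of a failed build here, it can help to check the two directories before running configure. The following is a small sketch under the assumption that the dependencies are unpacked with the libs/apr and libs/apr-util layout used in the command above. The function name is hypothetical.

```python
import os


def build_configure_command(deps_root):
    """Return the ./configure argument list, or raise if a directory is missing."""
    apr = os.path.join(deps_root, "libs", "apr")
    apr_util = os.path.join(deps_root, "libs", "apr-util")
    missing = [path for path in (apr, apr_util) if not os.path.isdir(path)]
    if missing:
        raise FileNotFoundError("missing dependency directories: " + ", ".join(missing))
    return ["./configure", "--with-apr=" + apr, "--with-apr-util=" + apr_util]


if __name__ == "__main__":
    # Path taken from the command above; adjust it to your own install location.
    deps = "/opt/mrcp/MRCP-Plugin-Demo-master/unimrcp-deps-1.5.0"
    try:
        print(" ".join(build_configure_command(deps)))
    except FileNotFoundError as exc:
        print(exc)
```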

2. Configure mrcp-server and FreeSWITCH
On the FreeSWITCH side you need to configure two places.
a. Add a new file, conf/mrcp_profiles/unimrcpserver-mrcp-v2.xml:

<include>
 <profile name="unimrcpserver-mrcp-v2" version="2">
   <param name="client-ip" value="127.0.0.1"/>
   <param name="client-port" value="9060"/>
   <param name="server-ip" value="192.168.0.190"/>
   <param name="server-port" value="8060"/>
   <param name="sip-transport" value="udp"/>
   <param name="rtp-ip" value="192.168.0.190"/>
   <param name="rtp-port-min" value="4000"/>
   <param name="rtp-port-max" value="5000"/>
   <param name="codecs" value="PCMU PCMA L16/96/8000"/>
   <param name="speechsynth" value="speechsynthesizer"/>
   <param name="speechrecog" value="speechrecognizer"/>
   <synthparams>
   </synthparams>
   <recogparams>
       <param name="start-input-timers" value="false"/>
   </recogparams>
 </profile>
</include>

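As described above, the client is FreeSWITCH, acting as the SIP user agent client (UAC), so set client-ip and client-port to your own FreeSWITCH IP and SIP port.

The server entries are the SIP address configured on your mrcp-server.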
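
To double-check which endpoints a profile actually points at, the file can be read with a few lines of Python. This is only a sketch: the XML is a trimmed copy of the profile above, embedded as a string so the example is self-contained, and read_profile is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the profile shown above.
PROFILE_XML = """
<include>
 <profile name="unimrcpserver-mrcp-v2" version="2">
   <param name="client-ip" value="127.0.0.1"/>
   <param name="client-port" value="9060"/>
   <param name="server-ip" value="192.168.0.190"/>
   <param name="server-port" value="8060"/>
   <param name="rtp-port-min" value="4000"/>
   <param name="rtp-port-max" value="5000"/>
   <recogparams>
       <param name="start-input-timers" value="false"/>
   </recogparams>
 </profile>
</include>
"""


def read_profile(xml_text):
    """Summarize the SIP endpoints and RTP port range of an MRCP profile."""
    profile = ET.fromstring(xml_text).find("profile")
    # findall("param") returns direct children only, so params nested in
    # <recogparams>/<synthparams> are not mixed in.
    params = {p.get("name"): p.get("value") for p in profile.findall("param")}
    return {
        "name": profile.get("name"),
        "client": "%s:%s" % (params["client-ip"], params["client-port"]),
        "server": "%s:%s" % (params["server-ip"], params["server-port"]),
        "rtp_ports": (int(params["rtp-port-min"]), int(params["rtp-port-max"])),
    }


if __name__ == "__main__":
    print(read_profile(PROFILE_XML))
```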

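3. Download the corresponding iFlytek (xfyun) SDK

Be sure to tick two capabilities: speech dictation (ASR) and speech synthesis (TTS).

Replace the corresponding directory, plugins/third-party/xfyun, and remember to change the appkey in the code.

Otherwise TTS synthesis runs into the following problem: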

MRCP/2.0 83 1 200 IN-PROGRESS
Channel-Identifier: ede2ac36452811ec@speechsynth

2021-11-14 16:57:43:392587 [WARN]   [xfyun] 正在合成 ...
2021-11-14 16:57:43:543463 [WARN]   [xfyun] QTTSAudioGet failed, error code: 10407.
2021-11-14 16:57:43:551022 [INFO]   Process SPEAK-COMPLETE Event <ede2ac36452811ec@speechsynth> [1]
2021-11-14 16:57:43:551051 [NOTICE] State Transition SPEAKING -> IDLE <ede2ac36452811ec@speechsynth>
2021-11-14 16:57:43:551089 [INFO]   Send MRCPv2 Data 192.168.0.190:1544 <-> 192.168.0.190:41578 [122 bytes]
MRCP/2.0 122 SPEAK-COMPLETE 1 COMPLETE
Channel-Identifier: ede2ac36452811ec@speec

Error 10407 is a permissions problem, so remember to replace the asterisks after appid= in the xfyun_login method. Nothing else needs to change.
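
Note that in the log above the MRCP messages themselves look healthy (200 IN-PROGRESS, then SPEAK-COMPLETE); the failure only shows in the plugin's WARN line. To make that easier to see, here is a sketch of parsing an MRCPv2 start line, based on the request, response, and event line layouts in the MRCPv2 specification. The function name and the returned dictionary keys are my own choices.

```python
def parse_start_line(line):
    """Classify an MRCPv2 start line as a request, response, or event."""
    parts = line.split()
    if len(parts) < 4 or parts[0] != "MRCP/2.0":
        raise ValueError("not an MRCPv2 start line: %r" % line)
    length = int(parts[1])
    if parts[2].isdigit():
        # response: version length request-id status-code request-state
        if len(parts) != 5:
            raise ValueError("malformed response line: %r" % line)
        return {"kind": "response", "length": length,
                "request_id": int(parts[2]), "status_code": int(parts[3]),
                "request_state": parts[4]}
    if len(parts) == 5:
        # event: version length event-name request-id request-state
        return {"kind": "event", "length": length, "event_name": parts[2],
                "request_id": int(parts[3]), "request_state": parts[4]}
    if len(parts) == 4:
        # request: version length method-name request-id
        return {"kind": "request", "length": length, "method": parts[2],
                "request_id": int(parts[3])}
    raise ValueError("malformed start line: %r" % line)


if __name__ == "__main__":
    # Both lines are copied from the log above.
    print(parse_start_line("MRCP/2.0 83 1 200 IN-PROGRESS"))
    print(parse_start_line("MRCP/2.0 122 SPEAK-COMPLETE 1 COMPLETE"))
```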
Right after that, the next problem shows up:

2021-11-14 16:41:52:936062 [NOTICE] Create RTP Termination Factory 192.168.0.190:[5000,6000]
2021-11-14 16:41:52:936073 [INFO]   Register RTP Termination Factory [RTP-Factory-1]
2021-11-14 16:41:52:936086 [INFO]   Load Plugin [XFyun-Recog-1] [/usr/local/unimrcp/plugin/xfyunrecog.so]
2021-11-14 16:41:52:936626 [WARN]   Failed to Load DSO: /lib/libmsc.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
2021-11-14 16:41:52:936653 [INFO]   Load Plugin [XFyun-Synth-1] [/usr/local/unimrcp/plugin/xfyunsynth.so]
2021-11-14 16:41:52:937044 [WARN]   Failed to Load DSO: /lib/libmsc.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
2021-11-14 16:41:52:937076 [INFO]   Register RTP Settings [RTP-Settings-1]

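It is a nice project. I don't know why, but even though people have reported this problem, nobody has updated the code. Maybe it's only half open source (just kidding).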

The fix is to add -lstdc++ to the Makefiles:

./third-party/xfyun/samples/sch_translate_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./third-party/xfyun/samples/iat_online_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./third-party/xfyun/samples/ise_online_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./third-party/xfyun/samples/tts_online_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./xfyun-recog/xfyunrecog.la:20:dependency_libs=' -L../../plugins/third-party/xfyun/libs/x64 -lmsc -ldl -lpthread -lrt -lstdc++'
./xfyun-recog/Makefile.in:359:                              -lmsc -ldl -lpthread -lrt -lstdc++
./xfyun-recog/Makefile:359:                              -lmsc -ldl -lpthread -lrt -lstdc++
./xfyun-recog/.libs/xfyunrecog.lai:20:dependency_libs=' -L../../plugins/third-party/xfyun/libs/x64 -lmsc -ldl -lpthread -lrt -lstdc++'
./xfyun-recog/Makefile.am:8:                              -lmsc -ldl -lpthread -lrt -lstdc++

Once this is configured, startup should no longer show the errors.
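
If the error comes back after regenerating the build files, a quick scan can show which link lines still lack the flag. The sketch below flags any line that links -lmsc without -lstdc++. It works on text, so you would feed it the contents of each Makefile. The function name is hypothetical.

```python
def lines_missing_stdcxx(makefile_text):
    """Return (line_number, line) pairs that link -lmsc without -lstdc++."""
    missing = []
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        # Strip quotes so libtool-style dependency_libs='...' lines also work.
        flags = [token.strip("'\"") for token in line.split()]
        if "-lmsc" in flags and "-lstdc++" not in flags:
            missing.append((number, line.strip()))
    return missing


if __name__ == "__main__":
    sample = (
        "LDFLAGS += -lmsc -lrt -ldl -lpthread\n"
        "LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++\n"
    )
    for number, line in lines_missing_stdcxx(sample):
        print("line %d still needs -lstdc++: %s" % (number, line))
```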

Add the dialplan route. Instead of Lua, we use Python:

<extension name="mrcq_demo">
  <condition field="destination_number" expression="^5001$">
    <action application="set" data="RECORD_TITLE=Recording ${destination_number} ${caller_id_number} ${strftime(%Y-%m-%d %H:%M)}"/>
    <action application="set" data="RECORD_COPYRIGHT=(c) 2011"/>
    <action application="set" data="RECORD_SOFTWARE=FreeSWITCH"/>
    <action application="set" data="RECORD_ARTIST=FreeSWITCH"/>
    <action application="set" data="RECORD_COMMENT=FreeSWITCH"/>
    <action application="set" data="RECORD_DATE=${strftime(%Y-%m-%d %H:%M)}"/>
    <action application="set" data="RECORD_STEREO=true"/>
    <action application="record_session" data="$${base_dir}/recordings/archive/${strftime(%Y-%m-%d-%H-%M-%S)}_${destination_number}_${caller_id_number}_${call_uuid}.wav"/>
    <action application="answer"/>
    <action application="sleep" data="2000"/>
    <action application="python" data="mrcp"/>
  </condition>
</extension>

The route calls the handler function in the Python module mrcp:

#encoding=utf-8
from freeswitch import *


def handler1(session, args):
    # Alternative entry point (not used by the route above): bridge to user 1018.
    call_addr = 'user/1018'
    session.execute("bridge", call_addr)


def handler(session, args):
    # Entry point run by <action application="python" data="mrcp"/>.
    session.answer()
    # Synthesize through the unimrcp TTS engine with the "xiaofang" voice.
    session.set_tts_params("unimrcp", "xiaofang")
    # Chinese test sentence for the TTS voice.
    session.speak("你好啊,我爱你,中国,哎你你,爱你")
    # Play a prompt, then recognize the reply through unimrcp using the
    # built-in boolean (yes/no) grammar.
    session.execute("play_and_detect_speech", "say:please say yes or no. please say no or yes. please say something! detect:unimrcp {start-input-timers=false,no-input-timeout=5000,recognition-timeout=5000}builtin:grammar/boolean?language=en-US;y=1;n=2")

    session.hangup()

That is enough to try out a simple demo.
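
The argument passed to play_and_detect_speech in the demo packs four things into one string: the prompt, the engine name after detect:, the recognition parameters in braces, and the grammar. A small sketch of assembling it from parts follows. The helper name is hypothetical, and the layout is simply the one used in the demo string above.

```python
def build_detect_speech_args(prompt, engine, params, grammar):
    """Assemble '<prompt> detect:<engine> {k=v,...}<grammar>'."""
    # params is a list of (name, value) pairs so the order stays predictable.
    option_text = ",".join("%s=%s" % (name, value) for name, value in params)
    return "%s detect:%s {%s}%s" % (prompt, engine, option_text, grammar)


if __name__ == "__main__":
    print(build_detect_speech_args(
        "say:please say yes or no.",
        "unimrcp",
        [("start-input-timers", "false"),
         ("no-input-timeout", 5000),
         ("recognition-timeout", 5000)],
        "builtin:grammar/boolean?language=en-US;y=1;n=2",
    ))
```

Building the string this way keeps the timeouts in one place when the same prompt loop is reused with different settings.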

Let's improve on it and build a simple interactive robot. The Python code is as follows:

#encoding=utf-8
import json
import tempfile
import requests
import xml.etree.ElementTree as ET
import freeswitch as fs
from freeswitch import *


# `UNI_ENGINE`: unimrcp engine
# In Python, `+` is optional for quoted string concatenation, ^_^
UNI_ENGINE = 'detect:unimrcp {start-input-timers=false,' \
        'no-input-timeout=5000,recognition-timeout=5000}'
# this will be ignored by baidu ASR, and `chat-empty` is also available
UNI_GRAMMAR = 'builtin:grammar/boolean?language=en-US;y=1;n=2'

def asr2text(result):
    """fetch recognized text from asr result (xml)"""
    root = ET.fromstring(result)
    node = root.find('.//input[@mode="speech"]')
    text = None
    if node is not None and node.text:
        # node.text is unicode
        text = node.text.encode('utf-8')
    return text

def handler1(session, args):
    call_addr='user/1018'
    session.execute("bridge", call_addr)

def handler(session, args):
    fs.consoleLog('info', '>>> start chatbot service')
    session.answer()

    # First: ask the proxy what the opening line should be (hard-coded here).
    answer_sound = Synthesizer()('你好啊,baby。')

    while session.ready():
        # Play the answer sound and detect the user's speech, in a loop.
        # The prompt and "detect:" are separated by a space, as in the demo.
        session.execute('play_and_detect_speech',
                answer_sound + ' ' + UNI_ENGINE + UNI_GRAMMAR)
        asr_result = session.getVariable('detect_speech_result')
        if asr_result is None:
            # None means the session was closed or recognition timed out.
            fs.consoleLog('CRIT', '>>> ASR NONE')
            break
        try:
            text = asr2text(asr_result)
        except Exception as e:
            fs.consoleLog('CRIT', '>>> ASR result parse failed \n%s' % e)
            continue
        fs.consoleLog('CRIT', '>>> ASR result is %s' % text)
        # Decode to unicode so len() counts characters rather than bytes.
        if text is None or len(unicode(text, encoding='utf-8')) < 2:
            fs.consoleLog('CRIT', '>>> ASR result TOO SHORT')
            # "Sorry, I didn't catch that. Please say it again."
            answer_sound = Synthesizer()('不好意思,我没有听清您的话,请再说一次。')
            continue
        # Chat with the robot. The call is disabled here, so the recognized
        # text is simply echoed back to the caller.
        # text = Robot()(text)
        fs.consoleLog('CRIT', 'Robot result is %s' % text)
        if not text:
            # "Sorry, I got lost for a moment. What else can I help you with?"
            text = '不好意思,我刚才迷失在人生的道路上了。请问您还需要什么帮助?'
        # Synthesize the reply for the next round.
        answer_sound = Synthesizer()(text)

    # Session is closing: pause briefly, then hang up with the default cause.
    fs.msleep(800)
    session.hangup()



class Synthesizer:
    """Send text to the local TTS HTTP service and save the audio to a file."""

    def __call__(self, text):
        if isinstance(text, unicode):
            text = text.encode('utf-8')
        # Post the text as a multipart form field named "text".
        audio = requests.post("http://127.0.0.1:8001/tts_text",
                              files=dict(text=(None, text))).content
        # .content is already raw bytes, so write it without decoding.
        # The .wav suffix assumes the service returns WAV audio; delete=False
        # keeps the file on disk so FreeSWITCH can play it afterwards.
        with tempfile.NamedTemporaryFile(prefix='session_', suffix='.wav',
                                         dir='/tmp', delete=False) as f:
            f.write(audio)
        return f.name

Here the TTS step uses our own trained offline speech synthesis model.
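
The asr2text step in the code above can be tried outside FreeSWITCH. The sketch below uses the same XPath lookup on a made-up recognition result. The sample XML is an assumption about what the result looks like, not output captured from the iFlytek plugin, and the code is written for a regular Python 3 interpreter rather than the Python 2 runtime used inside FreeSWITCH.

```python
import xml.etree.ElementTree as ET

# Hypothetical recognition result, for illustration only.
SAMPLE_RESULT = """<result>
  <interpretation grammar="builtin:grammar/boolean" confidence="0.9">
    <instance>true</instance>
    <input mode="speech">yes</input>
  </interpretation>
</result>"""


def asr_text(result_xml):
    """Return the recognized text from the <input mode="speech"> node, or None."""
    root = ET.fromstring(result_xml)
    node = root.find('.//input[@mode="speech"]')
    if node is None or not node.text:
        return None
    return node.text.strip()


if __name__ == "__main__":
    print(asr_text(SAMPLE_RESULT))
```

One caveat: if the server's result declares a default XML namespace, the plain input path will not match and the lookup needs the namespace prefix.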

Summary

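I hope this article gives you a better understanding of what MRCP is for and how to integrate with it. We also have a self-developed full stack of FreeSWITCH, ASR, and TTS capabilities, all available in a private-deployment version that is secure and quick to set up. In later posts I will explain outbound call centers and telephony network infrastructure in more detail. If you like this, feel free to follow me, and leave a comment or send me a message if you have questions.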