Common audio formats

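  • PCM: raw, uncompressed audio samples with no header or container.
  • WAV: a format developed by Microsoft (with IBM) for Windows; it usually stores uncompressed PCM data behind a RIFF header.
  • AIFF: Apple's counterpart on the Mac, which likewise usually stores uncompressed PCM data.
  • MP3: an ISO/IEC international standard and currently the most widespread digital audio coding and lossy compression format; almost every device and application supports it. MP3 is both a file format and an audio codec type.
  • G.711: a speech coding standard defined by the International Telecommunication Union (ITU-T). It specifies logarithmic PCM (logarithmic pulse-code modulation) sampling and is used mainly in telephony; it is also common in security and surveillance systems.
  • AAC (Advanced Audio Coding): an audio coding technology based on MPEG-2 that appeared in 1997. In Internet live streaming it is often used to carry audio over RTMP.
  • Opus: a lossy audio coding format developed by the Xiph.Org Foundation and later standardized by the IETF (Internet Engineering Task Force). It aims to cover both general audio and speech with a single format, replacing Speex and Vorbis, and is suited to low-latency real-time audio over networks. The format is defined in RFC 6716. It is commonly used in WebRTC.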
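To make the "logarithmic PCM" idea behind G.711 concrete, here is a minimal sketch of the μ-law encoding direction, which compresses one 16-bit linear sample into 8 bits. It follows a widely circulated reference-style implementation and is an illustration only, not code from the original article; the function name linear16_to_ulaw and the two constants are chosen here for the example. G.711 also defines A-law, which is not shown.

```c
#include <stdint.h>

#define ULAW_BIAS 0x84   /* 132, added to the magnitude before the segment search */
#define ULAW_CLIP 32635  /* largest magnitude that still fits after adding the bias */

/* Encode one signed 16-bit linear PCM sample as an 8-bit mu-law byte. */
static uint8_t linear16_to_ulaw(int16_t sample)
{
    int pcm = sample;                       /* widen so that -(-32768) cannot overflow */
    int sign = (pcm < 0) ? 0x80 : 0x00;     /* sign goes into the top bit */

    if (pcm < 0)
        pcm = -pcm;
    if (pcm > ULAW_CLIP)
        pcm = ULAW_CLIP;
    pcm += ULAW_BIAS;

    /* The segment (exponent) is the position of the highest set bit, 0..7. */
    int exponent = 7;
    for (int mask = 0x4000; (pcm & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    /* Four mantissa bits taken just below the leading bit of the segment. */
    int mantissa = (pcm >> (exponent + 3)) & 0x0F;

    /* mu-law bytes are stored bit-inverted. */
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}
```

Small amplitudes get fine steps and large amplitudes get coarse ones, which is why 8 bits per sample are enough for telephone-quality speech.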

Main audio parameters

  • Sample rate: for example 8000 Hz (8 kHz) or 16000 Hz (16 kHz), i.e. 8000 or 16000 samples per second.
  • Channels: mono (one channel) or stereo (two channels).

Channel layout    Number of channels
Mono              1
Stereo            2
2.1               3
4.0               4
4.1               5
5.1               6
7.1               8



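  • Sample bit depth: for example, 16 bit means each sample is stored in 16 bits (2 bytes). Common widths are 8, 16, and 32 bits, and samples may be signed or unsigned integers or floating point (float or double).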

[Figure: Adobe Audition screenshot showing how each of these parameters is displayed]


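Converting between audio duration and file size (uncompressed PCM): file size (in MiB) =

(sample rate × bit depth × channels × duration in s) / (8 × 1024 × 1024) = 16000 (Hz) × 16 (bit) × 1 (channel) × 60 (s) / (8 × 1024 × 1024) ≈ 1.83 MiB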
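The size formula can be sketched as two small C helpers. The names pcm_size_bytes and bytes_to_mib are made up for this illustration, and the calculation assumes uncompressed PCM with no container header.

```c
#include <stdint.h>

/* Size in bytes of uncompressed PCM audio (no container header). */
static uint64_t pcm_size_bytes(uint32_t sample_rate_hz, uint32_t bits_per_sample,
                               uint32_t channels, uint32_t duration_s)
{
    /* Widen first so the product cannot overflow 32 bits. */
    return (uint64_t)sample_rate_hz * bits_per_sample * channels * duration_s / 8;
}

/* Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes). */
static double bytes_to_mib(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}
```

For the example above, pcm_size_bytes(16000, 16, 1, 60) gives 1920000 bytes, and bytes_to_mib of that value is roughly 1.83.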

The corresponding fields and types in FFmpeg

  • sample_rate: the sample rate
# libavutil/frame.h
/**
 * Sample rate of the audio data.
 */
int sample_rate;
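  • nb_samples: defined in frame.h; the number of samples (per channel) contained in one frame.

Example: for 16 kHz mono audio sent as one frame every 10 ms, each frame has nb_samples = 16000 / 1000 ms × 10 ms = 160 samples.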

# libavutil/frame.h
/**
 * number of audio samples (per channel) described by this frame
 */
int nb_samples;
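As a rough sketch of the nb_samples arithmetic, the helpers below compute the per-channel sample count for a frame of a given duration and the resulting payload size. They do not call FFmpeg; samples_per_frame and frame_bytes are hypothetical names used only for this example.

```c
#include <stdint.h>

/* Samples per channel in one frame of frame_ms milliseconds. */
static int samples_per_frame(int sample_rate_hz, int frame_ms)
{
    /* Multiply before dividing so rates such as 44100 Hz stay exact. */
    return (int)((int64_t)sample_rate_hz * frame_ms / 1000);
}

/* Payload size in bytes of one frame across all channels. */
static int frame_bytes(int nb_samples, int channels, int bytes_per_sample)
{
    return nb_samples * channels * bytes_per_sample;
}
```

With 16 kHz mono S16 audio and 10 ms frames this gives 160 samples and 320 bytes per frame.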
  • nb_channels: the number of channels
* @param nb_channels   the number of channels
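  • AVSampleFormat: the sample format (bit depth). Each name encodes three things: data type, bit width, and memory layout.

Example: in AV_SAMPLE_FMT_S32P, the suffix S32P means signed, 32 bits, planar. In a planar format each channel is stored in its own buffer; formats without the P suffix are packed, with the channels interleaved in a single buffer.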

# libavutil/samplefmt.h
enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double

    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
    AV_SAMPLE_FMT_S64,         ///< signed 64 bits
    AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};

# libavutil/samplefmt.c
/** this table gives more information about formats */
static const SampleFmtInfo sample_fmt_info[AV_SAMPLE_FMT_NB] = {
    [AV_SAMPLE_FMT_U8] = {.name = "u8",
                          .bits = 8,
                          .planar = 0,
                          .altform = AV_SAMPLE_FMT_U8P},
    [AV_SAMPLE_FMT_S16] = {.name = "s16",
                           .bits = 16,
                           .planar = 0,
                           .altform = AV_SAMPLE_FMT_S16P},
    [AV_SAMPLE_FMT_S32] = {.name = "s32",
                           .bits = 32,
                           .planar = 0,
                           .altform = AV_SAMPLE_FMT_S32P},
    [AV_SAMPLE_FMT_S64] = {.name = "s64",
                           .bits = 64,
                           .planar = 0,
                           .altform = AV_SAMPLE_FMT_S64P},
    [AV_SAMPLE_FMT_FLT] = {.name = "flt",
                           .bits = 32,
                           .planar = 0,
                           .altform = AV_SAMPLE_FMT_FLTP},
    [AV_SAMPLE_FMT_DBL] = {.name = "dbl",
                           .bits = 64,
                           .planar = 0,
                           .altform = AV_SAMPLE_FMT_DBLP},
    [AV_SAMPLE_FMT_U8P] = {.name = "u8p",
                           .bits = 8,
                           .planar = 1,
                           .altform = AV_SAMPLE_FMT_U8},
    [AV_SAMPLE_FMT_S16P] = {.name = "s16p",
                            .bits = 16,
                            .planar = 1,
                            .altform = AV_SAMPLE_FMT_S16},
    [AV_SAMPLE_FMT_S32P] = {.name = "s32p",
                            .bits = 32,
                            .planar = 1,
                            .altform = AV_SAMPLE_FMT_S32},
    [AV_SAMPLE_FMT_S64P] = {.name = "s64p",
                            .bits = 64,
                            .planar = 1,
                            .altform = AV_SAMPLE_FMT_S64},
    [AV_SAMPLE_FMT_FLTP] = {.name = "fltp",
                            .bits = 32,
                            .planar = 1,
                            .altform = AV_SAMPLE_FMT_FLT},
    [AV_SAMPLE_FMT_DBLP] = {.name = "dblp",
                            .bits = 64,
                            .planar = 1,
                            .altform = AV_SAMPLE_FMT_DBL},
};
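The planar flag in the table describes memory layout only. A minimal sketch of the difference, assuming S16 samples and independent of FFmpeg: the helper below (a hypothetical name, s16_interleaved_to_planar) copies packed data laid out as L R L R ... into one buffer per channel, which is how the corresponding S16P data would be organized.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split interleaved S16 samples into one plane per channel.
 * in:      nb_samples * channels values, ordered sample by sample
 * planes:  array of `channels` pointers, each with room for nb_samples values
 */
static void s16_interleaved_to_planar(const int16_t *in, int16_t **planes,
                                      size_t nb_samples, size_t channels)
{
    for (size_t i = 0; i < nb_samples; i++)
        for (size_t ch = 0; ch < channels; ch++)
            planes[ch][i] = in[i * channels + ch];
}
```

For a stereo input {1, -1, 2, -2, 3, -3}, the first plane receives 1, 2, 3 and the second receives -1, -2, -3. The total number of bytes is the same in both layouts; only the ordering changes.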
Reference:

https://www.jianshu.com/p/85d083fb2615